SQL syntax extentions - to put postgres ahead in the race

This is a multi-part message in MIME format.

------=_NextPart_000_01B9_01C47AD6.FBF146A0
Content-Type: text/plain;
charset="windows-1255"
Content-Transfer-Encoding: quoted-printable

Hi,

I am a developer working with many databases, as a part of my job. I use he=
avy SQL queries and have become somewhat of an expert in SQL, including tri=
cks and workarounds of the limitation of the language.

I feel like a lot of the workarounds could be avoided with adding a few new=
operators to the SQL language that should be rather easy to support but wo=
uld give a vast improvement and probably a leap of performance in many comm=
on queries. I write you about this hoping that you would support these oper=
ators (even though they are not in the ANSI) and thereby position PostGres =
as a leader and not just a follower! I personaly have a great interest in s=
eeing open source software making it big time! So please - forward this to =
one of your more technical developers/executives and hopefully I will get t=
o see it in the next version of PostGres:

1) The operator "of max":
suppose I have a table "grades" of 3 fields: class/student/grade where I st=
ore many grades of many students of many classes. I want to get the name of=
the highest scoring student in each class. Note that there may be many stu=
dents with the same grade, but for starters let's say there is a primary ke=
y of class+grade.

My query would now be:
select student from grades where class+'#'+grade in
(
select class+'#'+max(grade) from grades group by class
) a

This means working the query twice - and relying on louzy conversion of the=
grade from numerical to textual.
We could also use:

select student from grades where student in
(
select student from grades group by class
having class+'#'+grade =3D max(class+'#'+grade)
) a

This is even worse!

The optimal would be to introduce a new operator "of max" that would be use=
d as follows:

select student of max(grade) from grades group by class

simillarly one should support "of min" and "of any" (brings a representativ=
e of the group)

2) aggregated concatenation:
Traditionally the SQL language has referred from supporting order dependent=
operators from taking a role in aggregated functions. This means that sinc=
e the query: "select class, grade from grades" does not ensure the order of=
the records returned, the operation sum() is supported (i.e. select class,=
sum(grade) from grades group by class) but other operations that would be =
order dependent are not supported.
I think this approach should be revised. In many cases one would want to ge=
t a list of the student names delimited with a comma. It would be great if =
one could write:
select class, list(student, ',') from grades group by class
and get

class list
----- ----
class1 john, ruth,...




This is of course an order dependent operation so the syntax can either be:
select class, list(student, ',') from grades group by class order by studen=
t, grade
in which case the list would be ordered before the list is created, or if n=
o particular order is requested the concatenation of the names should be in=
an arbitrary order.

Well - that's all for now :-)

Good luck!

Ram

------=_NextPart_000_01B9_01C47AD6.FBF146A0
Content-Type: text/html;
charset="windows-1255"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; charset=3Dwindows-125=
5">
<META content=3D"MSHTML 6.00.2600.0" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV><FONT face=3DArial size=3D2>Hi,</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>I am a developer working with many databas=
es, as a
part of my job. I use heavy SQL queries and have become somewhat of an expe=
rt in
SQL, including tricks and workarounds of the limitation of the
language.</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>I feel like a lot of the workarounds could=
be
avoided with adding a few new operators to the SQL language that should be=

rather easy to support but would give a vast improvement and probably a lea=
p of
performance in many common queries. I write you about this hoping that you =
would
support these operators (even though they are not in the ANSI) and thereby=

position PostGres as a leader and not just a follower! I personaly hav=
e a
great interest in seeing open source software making it big time! So please=
-
forward this to one of your more technical developers/executives and hopefu=
lly I
will get to see it in the next version of PostGres:</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>1) The operator  "of max":<BR>suppose=
I have a
table "grades" of 3 fields: class/student/grade where I store many grades o=
f
many students of many classes. I want to get the name of the highest scorin=
g
student in each class. Note that there may be many students with the same g=
rade,
but for starters let's say there is a primary key of class+grade.</FONT></D=
IV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>My query would now be:<BR>select student f=
rom
grades where class+'#'+grade in <BR>(<BR>   select
class+'#'+max(grade) from grades group by class<BR>) a</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>This means working the query twice - and r=
elying on
louzy conversion of the grade from numerical to textual.<BR>We could also=

use:</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>select student from grades where student i=
n
<BR>(<BR>  select student from grades group by class <BR>  having=

class+'#'+grade =3D max(class+'#'+grade)<BR>) a</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>This is even worse!</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>The optimal would be to introduce a new op=
erator
"of max" that would be used as follows:</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>select student of max(grade) from grades g=
roup by
class</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>simillarly one should support "of min" and=
"of any"
(brings a representative of the group)</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>2) aggregated concatenation:<BR>Traditiona=
lly the
SQL language has referred from supporting order dependent operators from ta=
king
a role in aggregated functions. This means that since the query: "select cl=
ass,
grade from grades" does not ensure the order of the records returned, the=

operation sum() is supported (i.e. select class, sum(grade) from grades gro=
up by
class) but other operations that would be order dependent are not
supported.<BR>I think this approach should be revised. In many cases one wo=
uld
want to get a list of the student names delimited with a comma. It would be=

great if one could write:<BR>select class, list(student, ',') from grades g=
roup
by class<BR>and get</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>class  list<BR>-----  ----<BR>cl=
ass1
john, ruth,...<BR>.<BR>.<BR>.</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>This is of course an order dependent opera=
tion so
the syntax can either be:<BR>select class, list(student, ',') from grades g=
roup
by class order by student, grade<BR>in which case the list would be ordered=

before the list is created, or if no particular order is requested the
concatenation of the names should be in an arbitrary order.</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>Well - that's all for now :-)</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>Good luck!</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT> </DIV>
<DIV><FONT face=3DArial size=3D2>Ram</FONT></DIV></BODY></HTML>

------=_NextPart_000_01B9_01C47AD6.FBF146A0--
ram_nathaniel [ Do, 05 August 2004 10:28 ] [ ID #453174 ]

Re: SQL syntax extentions - to put postgres ahead in the race

On Thu, 5 Aug 2004, Ram Nathaniel wrote:

>
> 1) The operator "of max":
> suppose I have a table "grades" of 3 fields: class/student/grade where I
> store many grades of many students of many classes. I want to get the
> name of the highest scoring student in each class. Note that there may
> be many students with the same grade, but for starters let's say there
> is a primary key of class+grade.
>
> My query would now be:
> select student from grades where class+'#'+grade in
> (
> select class+'#'+max(grade) from grades group by class
> ) a

As a side note, I'd think that something like:
select student from grades where (class,grade) in
(select class, max(grade) from grades group by class);
should avoid textual operations. I'm assuming the + above are meant to be
concatenation (||).

> The optimal would be to introduce a new operator "of max" that would be used as follows:
>
> select student of max(grade) from grades group by class

PostgreSQL provides an extension called DISTINCT ON.

Something like
select distinct on (class) student from grades order by class, grade
desc;
should get you one arbitrary student with the highest grade in his or her
class.

If you want to order by the grades, I think you need a layer around it.
If you don't care about the class order, you might consider making the
class ordering desc as well to make it easier to use a multi-column index
on (class,grade).

> 2) aggregated concatenation:

Theoretically, you should be able to do this right now in PostgreSQL with
user defined aggregates (although you can't pass a second argument
currently for the separator). I believe that an ordered subquery in FROM
will currently allow you to get an ordered aggregate, or perhaps you'd
have to turn off hash aggregation, but I think you should be able to get
it to keep the ordering.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo [at] postgresql.org
sszabo [ Fr, 06 August 2004 07:22 ] [ ID #453175 ]

Re: SQL syntax extentions - to put postgres ahead in the race

Stephan Szabo <sszabo [at] megazone.bigpanda.com> writes:
> On Thu, 5 Aug 2004, Ram Nathaniel wrote:
>> 2) aggregated concatenation:

> Theoretically, you should be able to do this right now in PostgreSQL with
> user defined aggregates (although you can't pass a second argument
> currently for the separator).

There's nothing particularly stopping us from supporting
multiple-argument aggregates, except a lack of round tuits.
(I suppose we'd want to rethink the syntax of CREATE AGGREGATE,
but otherwise it ought to be pretty straightforward.)

> I believe that an ordered subquery in FROM
> will currently allow you to get an ordered aggregate, or perhaps you'd
> have to turn off hash aggregation, but I think you should be able to get
> it to keep the ordering.

I think you would want to ORDER BY twice:

SELECT class, list(student) from
(select class, student from grades order by class, student) ss
order by class;

It looks like (at least in CVS tip) planner.c will take into account the
relative costs of doing a GroupAgg vs doing a HashAgg and re-sorting,
but I'm too tired to try it right now...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html
tgl [ Fr, 06 August 2004 09:07 ] [ ID #453176 ]

Re: SQL syntax extentions - to put postgres ahead in the race

I wrote:
> There's nothing particularly stopping us from supporting
> multiple-argument aggregates, except a lack of round tuits.

BTW, you can actually fake this pretty well in 8.0, by making an
aggregate that uses a rowtype input. For example:

regression=# create type twostrings as (s1 text, s2 text);
CREATE TYPE
regression=# create function list_concat(text, twostrings) returns text as $$
regression$# select case when $1 is null then $2.s1
regression$# when $2.s1 is null then $1
regression$# else $1 || $2.s2 || $2.s1
regression$# end$$ language sql;
CREATE FUNCTION
regression=# create aggregate concat (
regression(# basetype = twostrings,
regression(# stype = text,
regression(# sfunc = list_concat);
CREATE AGGREGATE
regression=# select * from text_tbl;
f1
-------------------
doh!
hi de ho neighbor
more stuff
and more
(4 rows)

regression=# select concat((f1, '|')) from text_tbl;
concat
--------------------------------------------
doh!|hi de ho neighbor|more stuff|and more
(1 row)


This is somewhat inefficient compared to native support for
multi-argument aggregates, but at least we have something we can point
people to until we find time to make that happen.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
tgl [ Fr, 06 August 2004 16:30 ] [ ID #453177 ]
Datenbanken » comp.databases.postgresql.sql » SQL syntax extentions - to put postgres ahead in the race

Vorheriges Thema: New PHP + PostgreSQL group on Google Groups2
Nächstes Thema: Re: contrib/tablefunc crosstab