January 6th, 2010 by depesz | Tags: , , , , , | 3 comments »
Did it help? If yes - maybe you can help me? Donate BTC to 19zPa5diT2LZqGtTi8f8bfApLn8rw9zBHx

On 15th of December Tom Lane committed patch by Andrew Gierth (aka RhodiumToad), which adds interesting capability:

Log Message:
-----------
Support ORDER BY within aggregate function calls, at long last providing a
non-kluge method for controlling the order in which values are fed to an
aggregate function. At the same time eliminate the old implementation
restriction that DISTINCT was only supported for single-argument aggregates.
 
Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
dropped null values of x unconditionally. Now, it does so only if the
agg transition function is strict; otherwise nulls are treated as DISTINCT
normally would, ie, you get one copy.
 
Andrew Gierth, reviewed by Hitoshi Harada

Description in log message is pretty clear, so let's just test it.

Let's assume we have orders table with following structure:

# \d orders
Table "public.orders"
Column | Type | Modifiers
---------+---------+-----------------------------------------------------
id | integer | not null default nextval('orders_id_seq'::regclass)
buyer | text | not null
ordered | date | not null
total | integer |
Indexes:
"orders_pkey" PRIMARY KEY, btree (id)

Within these table we have these data:

# select * from orders;
id | buyer | ordered | total
----+-------+------------+-------
1 | Alice | 2009-05-12 | 57
2 | Eve | 2009-07-07 | 131
3 | Alice | 2009-08-15 | 177
4 | Carol | 2009-04-28 | 232
...
38 | Eve | 2009-02-01 | 225
(38 rows)

Data was generated using this query:

insert into orders (buyer, ordered, total)
select n[j], now() - '1 year'::interval * random(), 50 + random() * 200
from
generate_series(1,10) i,
generate_series(1,5) j,
(select '{Alice,Bob,Carol,Dave,Eve}'::text[] as n) x
where random() < 0.6;

Now. We'd like to get, for each buyer, sum of order values, and list of dates when they ordered. This is simple:

# select buyer, sum(total), array_agg( ordered)
from orders
group by buyer
order by buyer;
buyer | sum | array_agg
-------+------+------------------------------------------------------------------------------------------------------
Alice | 1057 | {2009-05-12,2009-08-15,2009-11-22,2009-05-08,2009-02-14,2009-03-25,2009-03-06,2009-08-16}
Bob | 905 | {2009-01-29,2009-08-22,2009-05-28,2009-05-12,2009-02-10,2009-08-17}
Carol | 1118 | {2009-04-28,2009-03-12,2009-01-10,2009-03-30,2009-06-27,2009-09-19,2009-12-14,2009-09-06,2009-09-01}
Dave | 1239 | {2009-07-15,2009-07-27,2009-02-07,2009-12-19,2009-01-13,2009-05-28,2009-08-27}
Eve | 1222 | {2009-07-07,2009-04-07,2009-10-02,2009-02-05,2009-09-26,2009-04-18,2009-08-14,2009-02-01}
(5 rows)

The problem in here is that dates in array_agg are not ordered. Previously, we had to do stuff like this:

# select buyer, sum(total), array_agg( ordered)
from ( select * from orders order by buyer, ordered ) x
group by buyer
order by buyer;
buyer | sum | array_agg
-------+------+------------------------------------------------------------------------------------------------------
Alice | 1057 | {2009-02-14,2009-03-06,2009-03-25,2009-05-08,2009-05-12,2009-08-15,2009-08-16,2009-11-22}
Bob | 905 | {2009-01-29,2009-02-10,2009-05-12,2009-05-28,2009-08-17,2009-08-22}
Carol | 1118 | {2009-01-10,2009-03-12,2009-03-30,2009-04-28,2009-06-27,2009-09-01,2009-09-06,2009-09-19,2009-12-14}
Dave | 1239 | {2009-01-13,2009-02-07,2009-05-28,2009-07-15,2009-07-27,2009-08-27,2009-12-19}
Eve | 1222 | {2009-02-01,2009-02-05,2009-04-07,2009-04-18,2009-07-07,2009-08-14,2009-09-26,2009-10-02}
(5 rows)

Which works, but the problem is that the order is not guaranteed – it is ordered because that's how it works with this plan. Should PostgreSQL choose another plan to run the query – it might become unordered again.

Now, thanks to Andrew, we can force the ordering:

# select buyer, sum(total), array_agg( ordered order by ordered )
from orders
group by buyer;
buyer | sum | array_agg
-------+------+------------------------------------------------------------------------------------------------------
Alice | 1057 | {2009-02-14,2009-03-06,2009-03-25,2009-05-08,2009-05-12,2009-08-15,2009-08-16,2009-11-22}
Bob | 905 | {2009-01-29,2009-02-10,2009-05-12,2009-05-28,2009-08-17,2009-08-22}
Carol | 1118 | {2009-01-10,2009-03-12,2009-03-30,2009-04-28,2009-06-27,2009-09-01,2009-09-06,2009-09-19,2009-12-14}
Dave | 1239 | {2009-01-13,2009-02-07,2009-05-28,2009-07-15,2009-07-27,2009-08-27,2009-12-19}
Eve | 1222 | {2009-02-01,2009-02-05,2009-04-07,2009-04-18,2009-07-07,2009-08-14,2009-09-26,2009-10-02}
(5 rows)

What's more – we can sort by value in some other field – for example – sort orders by their value:

# select buyer, sum(total), array_agg( ordered order by total desc )
from orders
group by buyer;
buyer | sum | array_agg
-------+------+------------------------------------------------------------------------------------------------------
Alice | 1057 | {2009-05-08,2009-08-15,2009-03-25,2009-08-16,2009-02-14,2009-11-22,2009-03-06,2009-05-12}
Bob | 905 | {2009-02-10,2009-01-29,2009-08-17,2009-05-12,2009-08-22,2009-05-28}
Carol | 1118 | {2009-04-28,2009-09-01,2009-03-30,2009-06-27,2009-12-14,2009-09-19,2009-03-12,2009-01-10,2009-09-06}
Dave | 1239 | {2009-05-28,2009-07-27,2009-02-07,2009-07-15,2009-08-27,2009-01-13,2009-12-19}
Eve | 1222 | {2009-02-01,2009-08-14,2009-09-26,2009-04-07,2009-07-07,2009-02-05,2009-04-18,2009-10-02}
(5 rows)

So, we can check:

# select *
from orders
where buyer = 'Bob'
order by total desc;
id | buyer | ordered | total
----+-------+------------+-------
24 | Bob | 2009-02-10 | 249
11 | Bob | 2009-01-29 | 215
26 | Bob | 2009-08-17 | 156
19 | Bob | 2009-05-12 | 105
12 | Bob | 2009-08-22 | 95
17 | Bob | 2009-05-28 | 85
(6 rows)

Great. Works as described :) Of course ordering doesn't make sense for all aggregates – count(*) for example :) But the sole fact that we have it, means we can calculate median in plain SQL:

# WITH aggregated AS (
select
buyer,
array_agg(total order by total) as list,
count(*),
count(*) / 2 as half
from orders
group by buyer
)
select
buyer,
list,
case
when count % 2 = 1 then list[ half + 1 ]
else ( list[ half ] + list[ half + 1 ] ) / 2
end as median
FROM
aggregated;
buyer | list | median
-------+-----------------------------------+--------
Alice | {57,70,83,119,156,170,177,225} | 137
Bob | {85,95,105,156,215,249} | 130
Carol | {52,77,86,99,105,107,159,201,232} | 105
Dave | {98,131,154,155,219,241,241} | 155
Eve | {66,85,121,131,182,202,210,225} | 156
(5 rows)

How sweet it that?

  1. 3 comments

  2. # Marti
    Jan 6, 2010

    Note that one psql block in your post is not monospaced:
    select buyer, sum(total), array_agg( ordered order by total desc ) from orders group by buyer …

  3. Jan 6, 2010

    @Marti: thanks, fixed.

  4. Mar 12, 2010

    Great! Then ordered aggregates is another feature of Muldis D that will now be implementable over Postgres without much circumlocution. The PostgreSQL developers continue to make my job of implementing Muldis D easier with every release.

Leave a comment