Waiting for 9.4 – Implement the FILTER clause for aggregate function calls.

On 17th of July, Noah Misch committed patch:

Implement the FILTER clause for aggregate function calls.
 
This is SQL-standard with a few extensions, namely support for
subqueries and outer references in clause expressions.
 
catversion bump due to change in Aggref and WindowFunc.
 
David Fetter, reviewed by Dean Rasheed.

For very long developers had to use weird tricks to get partial sums/counts.

Something like:

SELECT SUM( CASE WHEN ... THEN 1 ELSE 0 END ) FROM ..

This did work, but was hard to read, not obvious when maintaining, and some things were much more complicated than they should be.

Luckily now, we got FILTER.

What it does? It lets you use normal GROUP BY, but make aggregate calls (avg(), min(), max(), sum(), and so on) to work only on some of the rows in the group.

For example – I loaded Pagila database, and in there, I can:

$ SELECT substr(last_name, 1,1), COUNT(*) AS ALL FROM actor GROUP BY 1 ORDER BY 1;
 substr | ALL
--------+-----
 A      |   7
 B      |  22
 C      |  15
 D      |  21
 F      |   2
 G      |  12
 H      |  19
 J      |   7
 K      |   8
 L      |   2
 M      |  15
 N      |   7
 O      |   2
 P      |  14
 R      |   2
 S      |   9
 T      |  13
 V      |   1
 W      |  19
 Z      |   3
(20 ROWS)

Now, let's assume I would want to check how many people, in each group have first name that starts with the same letter as last name.

With previous Pgs, I would do some sum(case), or perhaps outer join with subquery. Now, I can:

$ SELECT substr(last_name, 1,1),
       COUNT(*) AS ALL,
       COUNT(*) FILTER ( WHERE substr(first_name, 1,1) = substr(last_name, 1,1))
FROM actor
GROUP BY 1
ORDER BY 1;
 substr | ALL | COUNT
--------+-----+-------
 A      |   7 |     1
 B      |  22 |     0
 C      |  15 |     0
 D      |  21 |     0
 F      |   2 |     0
 G      |  12 |     1
 H      |  19 |     1
 J      |   7 |     1
 K      |   8 |     0
 L      |   2 |     0
 M      |  15 |     2
 N      |   7 |     0
 O      |   2 |     0
 P      |  14 |     1
 R      |   2 |     1
 S      |   9 |     1
 T      |  13 |     1
 V      |   1 |     0
 W      |  19 |     1
 Z      |   3 |     0
(20 ROWS)

The great thing is that you can have multiple aggregates, each with different filter, like:

$ SELECT substr(last_name, 1,1),
    COUNT(*) AS ALL,
    COUNT(*) FILTER ( WHERE LENGTH(first_name) < 4 ) AS short_first,
    COUNT(*) FILTER ( WHERE LENGTH(first_name) = 4 ) AS first4,
    COUNT(*) FILTER ( WHERE LENGTH(first_name) = 5 ) AS first5,
    COUNT(*) FILTER ( WHERE LENGTH(first_name) = 6 ) AS first6,
    COUNT(*) FILTER ( WHERE LENGTH(first_name) = 7 ) AS first7,
    COUNT(*) FILTER ( WHERE LENGTH(first_name) > 7 ) AS long_first
FROM actor
GROUP BY 1
ORDER BY 1;
 substr | ALL | short_first | first4 | first5 | first6 | first7 | long_first·
--------+-----+-------------+--------+--------+--------+--------+------------
 A      |   7 |           1 |      1 |      1 |      1 |      1 |          2
 B      |  22 |           1 |      3 |      6 |      4 |      4 |          4
 C      |  15 |           3 |      5 |      2 |      2 |      2 |          1
 D      |  21 |           0 |      6 |      6 |      0 |      4 |          5
 F      |   2 |           1 |      0 |      1 |      0 |      0 |          0
 G      |  12 |           2 |      3 |      2 |      1 |      1 |          3
 H      |  19 |           5 |      4 |      1 |      5 |      3 |          1
 J      |   7 |           1 |      2 |      1 |      2 |      1 |          0
 K      |   8 |           1 |      1 |      4 |      2 |      0 |          0
 L      |   2 |           0 |      0 |      0 |      1 |      1 |          0
 M      |  15 |           4 |      4 |      4 |      1 |      0 |          2
 N      |   7 |           0 |      0 |      4 |      2 |      0 |          1
 O      |   2 |           0 |      1 |      0 |      1 |      0 |          0
 P      |  14 |           0 |      3 |      3 |      1 |      6 |          1
 R      |   2 |           0 |      2 |      0 |      0 |      0 |          0
 S      |   9 |           2 |      3 |      2 |      0 |      2 |          0
 T      |  13 |           2 |      3 |      2 |      2 |      4 |          0
 V      |   1 |           0 |      0 |      1 |      0 |      0 |          0
 W      |  19 |           5 |      6 |      2 |      2 |      2 |          2
 Z      |   3 |           0 |      0 |      1 |      1 |      1 |          0
(20 ROWS)

Of course your filters can be much more complicated. Great stuff, thanks David and Dean.

3 thoughts on “Waiting for 9.4 – Implement the FILTER clause for aggregate function calls.”

  1. If you’re still not convinced of the power of this feature, consider aggregates where can’t use a “no-op” row to ignore values you’re not interested in:

    SELECT f1, max, array_agg(date) FROM (SELECT f1, max(f2), unnest(array_agg(CASE WHEN date >= current_date THEN date END)) AS date FROM tbl GROUP BY f1) AS j WHERE date IS NOT NULL GROUP BY f1, max;

    versus

    SELECT f1, max(f2), array_agg(date) FILTER (WHERE date >= current_date) FROM tbl GROUP BY f1;

  2. How this affects the performance. I am using agreegates with case but this tends to keep very difficult when we have to deal with many columns (like reporting).
    Is it will also help to increase performance and how?

    best of wishes for it.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.