Waiting for 8.4 – window functions – select * from depesz;

A long overdue post about new functionality. At this moment it is no longer such new, as it was committed on 28th of December (yes, I know, I should have written it earlier, Sorry).

On this day Tom Lane committed patch by Hitoshi Harada which adds support for so called window functions:

Support window functions a la SQL:2008.
 
Hitoshi Harada, with some kibitzing from Heikki and Tom.

Disclaimer: I find naming in this topic to be “a bit" confusing. I'm trying to explain it as I understand, but I might be wrong, and if I am, please comment, so others will benefit from clarifications.

What “window functions" are? What they do?

Well, window functions are function that can work on some subset of returned rows to gather some general statistics.

So, they “kind of" work like aggregates, but aggregates force you to return only 1 row per group, while in window function you can return many.

Simplest possible example:

This is aggregate:

# select avg(i) from generate_series(1,3) i;
        avg
--------------------
 2.0000000000000000
(1 row)

There is only 1 group (all rows from generate_series(1,3), and it returns only 1 row).

But with window functions we can do something like this:

# select i, avg(i) over () from generate_series(1,3) i;
 i |        avg
---+--------------------
 1 | 2.0000000000000000
 2 | 2.0000000000000000
 3 | 2.0000000000000000
(3 rows)

This is not something you could do with aggregates otherwise then using some kind of subselect.

Before I'll explain how my above query worked, let me introduce some basic concepts of window functions: partitions and windows.

Partition is generally the same as “group by" in standard aggregates. It generates groups (partitions), and all window functions calls are ran in “context" of given partition.

For example. Let's assume we have pretty simple table with people, their department and salary:

create table test (
    person text,
    dept text,
    salary int4
);

Now, let's put there some people:

insert into test (person, dept, salary) values
( 'alexander', 'it',         3700 ),
( 'benjamin',  'accounting', 2100 ),
( 'bradley',   'marketing',  1000 ),
( 'chandler',  'management', 2400 ),
( 'fernando',  'marketing',  1650 ),
( 'joey',      'management', 3250 ),
( 'lincoln',   'accounting', 2950 ),
( 'lj',        'it',         3250 ),
( 'michael',   'accounting', 1700 ),
( 'monica',    'management', 2950 ),
( 'paul',      'accounting', 1650 ),
( 'phoebe',    'accounting', 3700 ),
( 'rachel',    'accounting', 2950 ),
( 'ross',      'it',         1250 ),
( 'sara',      'it',         2600 ),
( 'theodore',  'management', 3250 )
;

So, let's use window functions to show what is the salary of any given person against average salary in his department.

To do so, we will partition the data by department, and calculate averages in these partitions:

# select person, dept, salary, avg(salary) over( partition by dept ) from test;
  person   |    dept    | salary |          avg
-----------+------------+--------+-----------------------
 lincoln   | accounting |   2950 | 2508.3333333333333333
 benjamin  | accounting |   2100 | 2508.3333333333333333
 rachel    | accounting |   2950 | 2508.3333333333333333
 phoebe    | accounting |   3700 | 2508.3333333333333333
 paul      | accounting |   1650 | 2508.3333333333333333
 michael   | accounting |   1700 | 2508.3333333333333333
 alexander | it         |   3700 | 2700.0000000000000000
 lj        | it         |   3250 | 2700.0000000000000000
 ross      | it         |   1250 | 2700.0000000000000000
 sara      | it         |   2600 | 2700.0000000000000000
 chandler  | management |   2400 | 2962.5000000000000000
 theodore  | management |   3250 | 2962.5000000000000000
 joey      | management |   3250 | 2962.5000000000000000
 monica    | management |   2950 | 2962.5000000000000000
 fernando  | marketing  |   1650 | 1325.0000000000000000
 bradley   | marketing  |   1000 | 1325.0000000000000000
(16 rows)

As you can see it automatically sorted results by dept – this is side effect, and don't count on it.

Anyway – we can add our own sorting to this:

# select person, dept, salary, avg(salary) over( partition by dept ) from test order by person;
  person   |    dept    | salary |          avg
-----------+------------+--------+-----------------------
 alexander | it         |   3700 | 2700.0000000000000000
 benjamin  | accounting |   2100 | 2508.3333333333333333
 bradley   | marketing  |   1000 | 1325.0000000000000000
 chandler  | management |   2400 | 2962.5000000000000000
 fernando  | marketing  |   1650 | 1325.0000000000000000
 joey      | management |   3250 | 2962.5000000000000000
 lincoln   | accounting |   2950 | 2508.3333333333333333
 lj        | it         |   3250 | 2700.0000000000000000
 michael   | accounting |   1700 | 2508.3333333333333333
 monica    | management |   2950 | 2962.5000000000000000
 paul      | accounting |   1650 | 2508.3333333333333333
 phoebe    | accounting |   3700 | 2508.3333333333333333
 rachel    | accounting |   2950 | 2508.3333333333333333
 ross      | it         |   1250 | 2700.0000000000000000
 sara      | it         |   2600 | 2700.0000000000000000
 theodore  | management |   3250 | 2962.5000000000000000
(16 rows)

Pretty nifty, isn't it?

Especially, since you can:

# select
    person,
    dept,
    cast(
        salary * 100 / avg(salary) over( partition by dept )
        as numeric(5,2)
    ) as percent_of_average
from test
order by percent_of_average desc;
  person   |    dept    | percent_of_average
-----------+------------+--------------------
 phoebe    | accounting |             147.51
 alexander | it         |             137.04
 fernando  | marketing  |             124.53
 lj        | it         |             120.37
 rachel    | accounting |             117.61
 lincoln   | accounting |             117.61
 theodore  | management |             109.70
 joey      | management |             109.70
 monica    | management |              99.58
 sara      | it         |              96.30
 benjamin  | accounting |              83.72
 chandler  | management |              81.01
 bradley   | marketing  |              75.47
 michael   | accounting |              67.77
 paul      | accounting |              65.78
 ross      | it         |              46.30
(16 rows)

To quickly check who (in given department) is the most overpaid. Or looks like the most overpaid 🙂

So, I hope you understand what partition is. Now to the windows.

As you saw, all window functions that I called till now, worked on all rows in given partition.

It means that given “average" salary was average to whole department. Windows let you say that you want to calculate something for only part of the partition.

The 2 most common examples of such calculations are “rownum" and cumulative sum. I wrote about them long time ago, but with 8.4 you can write it much simpler:

rownum using window functions:

# select person, dept, salary, rank() over ( partition by dept order by salary ), row_number() over (partition by dept order by salary) from test;
  person   |    dept    | salary | rank | row_number
-----------+------------+--------+------+------------
 paul      | accounting |   1650 |    1 |          1
 michael   | accounting |   1700 |    2 |          2
 benjamin  | accounting |   2100 |    3 |          3
 lincoln   | accounting |   2950 |    4 |          4
 rachel    | accounting |   2950 |    4 |          5
 phoebe    | accounting |   3700 |    6 |          6
 ross      | it         |   1250 |    1 |          1
 sara      | it         |   2600 |    2 |          2
 lj        | it         |   3250 |    3 |          3
 alexander | it         |   3700 |    4 |          4
 chandler  | management |   2400 |    1 |          1
 monica    | management |   2950 |    2 |          2
 joey      | management |   3250 |    3 |          3
 theodore  | management |   3250 |    3 |          4
 bradley   | marketing  |   1000 |    1 |          1
 fernando  | marketing  |   1650 |    2 |          2
(16 rows)

Again – it just so happens that rows are ordered, but don't count on it! If you want them ordered, add proper order by clause.

rank() is kind-of like rownum. It returns which position given row gets when ordering by (whatever we put in “order by" clause inside “over ()".

Please notice, that we don't have to partition the data:

# select person, dept, salary, rank() over ( order by salary ) from test;
  person   |    dept    | salary | rank
-----------+------------+--------+------
 bradley   | marketing  |   1000 |    1
 ross      | it         |   1250 |    2
 paul      | accounting |   1650 |    3
 fernando  | marketing  |   1650 |    3
 michael   | accounting |   1700 |    5
 benjamin  | accounting |   2100 |    6
 chandler  | management |   2400 |    7
 sara      | it         |   2600 |    8
 lincoln   | accounting |   2950 |    9
 monica    | management |   2950 |    9
 rachel    | accounting |   2950 |    9
 theodore  | management |   3250 |   12
 joey      | management |   3250 |   12
 lj        | it         |   3250 |   12
 phoebe    | accounting |   3700 |   15
 alexander | it         |   3700 |   15
(16 rows)

In this situation there is only 1 partition – i.e. whole resultset.

This example also shows why rank() is not really rownum – it's not unique (it was also visible in previous example, but I think in this one it's clearer).

There are 2 rows with rank = 3, 3 rows with 9 or 12 and so on.

The reason is very simple – given the ordering we shows (i.e. only salary) these rows are indistinguishable. But if I'd change the order by (within over()) to make them different:

# select person, dept, salary, rank() over ( order by salary, person ) from test;
  person   |    dept    | salary | rank
-----------+------------+--------+------
 bradley   | marketing  |   1000 |    1
 ross      | it         |   1250 |    2
 fernando  | marketing  |   1650 |    3
 paul      | accounting |   1650 |    4
 michael   | accounting |   1700 |    5
 benjamin  | accounting |   2100 |    6
 chandler  | management |   2400 |    7
 sara      | it         |   2600 |    8
 lincoln   | accounting |   2950 |    9
 monica    | management |   2950 |   10
 rachel    | accounting |   2950 |   11
 joey      | management |   3250 |   12
 lj        | it         |   3250 |   13
 theodore  | management |   3250 |   14
 alexander | it         |   3700 |   15
 phoebe    | accounting |   3700 |   16
(16 rows)

rank() is simple, but what about cumulative sum? Let's try:

# select person, dept, salary, sum(salary) over ( partition by dept order by person ) from test;
  person   |    dept    | salary |  sum
-----------+------------+--------+-------
 benjamin  | accounting |   2100 |  2100
 lincoln   | accounting |   2950 |  5050
 michael   | accounting |   1700 |  6750
 paul      | accounting |   1650 |  8400
 phoebe    | accounting |   3700 | 12100
 rachel    | accounting |   2950 | 15050
 alexander | it         |   3700 |  3700
 lj        | it         |   3250 |  6950
 ross      | it         |   1250 |  8200
 sara      | it         |   2600 | 10800
 chandler  | management |   2400 |  2400
 joey      | management |   3250 |  5650
 monica    | management |   2950 |  8600
 theodore  | management |   3250 | 11850
 bradley   | marketing  |   1000 |  1000
 fernando  | marketing  |   1650 |  2650
(16 rows)

Generally – it runs sum(salary) on every row and previous rows (in given partition).

Now, since it's quite sensible to use the same “OVER" multiple times in given query (to get average and sum of salaries for example) you can define them separately, like this:

select
    dept,
    rank() over ( w_depts order by person ),
    person,
    salary,
    sum(salary) over ( w_depts ),
    avg(salary) over (w_depts),
    count(*) over ( w_depts )
from
    test
window
    w_depts as ( partition by dept )
order by
    dept, person;
    dept    | rank |  person   | salary |  sum  |          avg          | count
------------+------+-----------+--------+-------+-----------------------+-------
 accounting |    1 | benjamin  |   2100 | 15050 | 2508.3333333333333333 |     6
 accounting |    2 | lincoln   |   2950 | 15050 | 2508.3333333333333333 |     6
 accounting |    3 | michael   |   1700 | 15050 | 2508.3333333333333333 |     6
 accounting |    4 | paul      |   1650 | 15050 | 2508.3333333333333333 |     6
 accounting |    5 | phoebe    |   3700 | 15050 | 2508.3333333333333333 |     6
 accounting |    6 | rachel    |   2950 | 15050 | 2508.3333333333333333 |     6
 it         |    1 | alexander |   3700 | 10800 | 2700.0000000000000000 |     4
 it         |    2 | lj        |   3250 | 10800 | 2700.0000000000000000 |     4
 it         |    3 | ross      |   1250 | 10800 | 2700.0000000000000000 |     4
 it         |    4 | sara      |   2600 | 10800 | 2700.0000000000000000 |     4
 management |    1 | chandler  |   2400 | 11850 | 2962.5000000000000000 |     4
 management |    2 | joey      |   3250 | 11850 | 2962.5000000000000000 |     4
 management |    3 | monica    |   2950 | 11850 | 2962.5000000000000000 |     4
 management |    4 | theodore  |   3250 | 11850 | 2962.5000000000000000 |     4
 marketing  |    1 | bradley   |   1000 |  2650 | 1325.0000000000000000 |     2
 marketing  |    2 | fernando  |   1650 |  2650 | 1325.0000000000000000 |     2
(16 rows)

You can also use window functions to show you parts of partitions. For example, let's assume you want list of people who earn the most, or second to most in their departments.

Doing this without window functions would be really tedious, but window functions, give as simple way:

# select dept, person, salary from (
    select dept, person, salary, rank() over (partition by dept order by salary desc) from test
) as x where rank in (1,2);
    dept    |  person   | salary
------------+-----------+--------
 accounting | phoebe    |   3700
 accounting | lincoln   |   2950
 accounting | rachel    |   2950
 it         | alexander |   3700
 it         | lj        |   3250
 management | theodore  |   3250
 management | joey      |   3250
 marketing  | fernando  |   1650
 marketing  | bradley   |   1000
(9 rows)

You might ask yourself if it is not possible to do without subselect – unfortunately – window functions are evaluated after result was generated, i.e. after all conditions have been passed. But the subselect shouldn't be big issue.

Now, after all these great things, there is one thing that bugs me. Citation from manual:

For each row, the window function is computed across the rows that fall into the same partition as the current row

This rings some bad bells for me. Let's write a simple aggregate that will generally do count(), but will also raise notice every time it will get called:

CREATE OR REPLACE FUNCTION test_count(a INT4, b INT4) RETURNs INT4 as $$
BEGIN
    raise notice 'test_count(%, %)', a, b;
    RETURN a + 1;
END;
$$ language plpgsql;
CREATE aggregate test_count (INT4) (
    sfunc = test_count,
    stype = INT4,
    initcond = '0'
);

It's pretty simple, and we can easily validate how it works:

# select test_count(1), count(*) from test;
NOTICE:  test_count(0, 1)
NOTICE:  test_count(1, 1)
NOTICE:  test_count(2, 1)
NOTICE:  test_count(3, 1)
NOTICE:  test_count(4, 1)
NOTICE:  test_count(5, 1)
NOTICE:  test_count(6, 1)
NOTICE:  test_count(7, 1)
NOTICE:  test_count(8, 1)
NOTICE:  test_count(9, 1)
NOTICE:  test_count(10, 1)
NOTICE:  test_count(11, 1)
NOTICE:  test_count(12, 1)
NOTICE:  test_count(13, 1)
NOTICE:  test_count(14, 1)
NOTICE:  test_count(15, 1)
 test_count | count
------------+-------
         16 |    16
(1 row)

As you can see the function has been called 16 times – one for every row in data.

But let's see how it will go when calling it as window function:

# select person, test_count(1) over (order by person) from test where dept = 'it';
NOTICE:  test_count(0, 1)
NOTICE:  test_count(1, 1)
NOTICE:  test_count(2, 1)
NOTICE:  test_count(3, 1)
  person   | test_count
-----------+------------
 alexander |          1
 lj        |          2
 ross      |          3
 sara      |          4
(4 rows)

Great. I was afraid that it will be:

NOTICE:  test_count(0, 1)
NOTICE:  test_count(0, 1)
NOTICE:  test_count(1, 1)
NOTICE:  test_count(0, 1)
NOTICE:  test_count(1, 1)
NOTICE:  test_count(2, 1)
NOTICE:  test_count(0, 1)
NOTICE:  test_count(1, 1)
NOTICE:  test_count(2, 1)
NOTICE:  test_count(3, 1)

i.e. – one time for every row in every window it has to take into consideration.

But luckily it's smarter, and realizes that the value can be reused.

So, to finish this long post – another great feature of PostgreSQL – have fun playing with it.

2 thoughts on “Waiting for 8.4 – window functions”

depesz says:

2009-01-21 at 19:34

breinbaas on irc suggested using row_number in my example. I checked it and yes – it looks great. I added it to one example, but then I realized that adding it, and commenting everywhere would take too much time.

To put it simply – while writing the post I didn’t know row_number() exists. In http://developer.postgresql.org/pgdocs/postgres/tutorial-window.html it never links to http://developer.postgresql.org/pgdocs/postgres/functions-window.html and I somehow missed it.

I’m sorry – will check the docs better next time.
Hitoshi Harada says:

2009-01-22 at 03:09

Thanks for mentioning it. Oh, I shoud’ve had to add to tutorial row_number(), which is the most basic window function :P).
I believe this is a great feature and will help many poeple to solve complex problems such as above easily.
Enjoy!

Comments are closed.