November 20th, 2012 by depesz | Tags: , , , , , , | 9 comments »
Did it help? If yes - maybe you can help me? Donate BTC to 19zPa5diT2LZqGtTi8f8bfApLn8rw9zBHx

And maybe not on the wall, but instead in your SQLz, eating your data.

But a bit more seriously. Ever since PostgreSQL 8.4 we have window functions, but still I see people which do not know it or are wary to use it.

That's why I decided to write a piece on window functions. How they work and what they can be used for.

Before we start even thinking about window functions, we need some test data. So I made a simple table:

create table people (
    id          serial  primary  key,
    username    text,
    department  text,
    salary      int4
);

and filled it with 250 rows based on Polish, random, names, department names from very cool movie, and some random() salaries. Looks like this:

select * from people limit 10;
 id |    username     |     department     | salary
----+-----------------+--------------------+--------
  1 | Blanka Pawlak   | Psychology         |  25200
  2 | Fabian Krawczyk | Chem               |  47900
  3 | Tomasz Górski   | Zoology            |  39600
  4 | Oliwier Woźniak | Data Archives      |  93500
  5 | Rafał Olszewski | Maintennance       |  77500
  6 | Maciej Mazur    | Distribution       |  57700
  7 | Justyna Wróbel  | Internal Logistics |  82800
  8 | Nikodem Wójcik  | Wranglers          |  90500
  9 | Lena Szewczyk   | Electrical         |  66400
 10 | Filip Kamiński  | Engineering        |  42300
(10 rows)

( full table dump, for your own testing purposes, can be downloaded. )

To write more I have to assume some things:

  • You know what grouping is
  • You know how grouping works

Window functions, are functions which work on windows. That's a surprise. And what is window? Well, for starters we can assume that windows are more or less the same as “GROUP BY" groups, but defined in such a way that each row can have it's own set of groups.

Let's see simplest possible example, and let's work from there:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
SELECT
    department,
    username,
    salary,
    count(*) over ()
FROM
    people
order by department, username
limit 20;
   department   |        username        | salary | count
----------------+------------------------+--------+-------
 Accounting     | Alan Jaworski          | 102900 |   250
 Accounting     | Aleksander Nowicki     |  78900 |   250
 Accounting     | Anna Nowakowska        |  77800 |   250
 Accounting     | Hubert Zieliński       |  27600 |   250
 Accounting     | Konrad Nowak           |  85400 |   250
 Accounting     | Mateusz Zając          |  87600 |   250
 Accounting     | Michał Olszewski       |  86100 |   250
 Accounting     | Paweł Olszewski        |  46700 |   250
 Accounting     | Rafał Dąbrowski        |  46600 |   250
 Accounting     | Rafał Kowalczyk        |  48600 |   250
 Accounting     | Zofia Szymańska        |  95200 |   250
 Administration | Antonina Zalewska      |  72700 |   250
 Administration | Dominik Kowalczyk      |  96000 |   250
 Administration | Fabian Wieczorek       |  28500 |   250
 Administration | Jan Walczak            |  27400 |   250
 Administration | Maksymilian Piotrowski |  89600 |   250
 Administration | Nadia Nowak            | 109800 |   250
 Administration | Stanisław Mazur        |  74800 |   250
 Archives       | Adam Wojciechowski     |  51200 |   250
 Archives       | Aleksandra Zając       |  31200 |   250
(20 rows)

As you can see I got count of all people in separate column. Of course – this example is not very interesting, as I could have done:

select *, (select count(*) from people) from people;

and would get more or less the same thing. That's true. Lots of stuff that I'll show with window functions can be done also differently. The point of window functions is that they are easier to write, and they tend to avoid additional table scans.

Anyway. Let's see what exactly was happening. The interesting thing is line 5:

count(*) over ()

This is basic window function syntax. There is function call (count(*) in this case), and then there is keyword “OVER" and so called “window definition" in parentheses. In this case window definition is empty, which basically means: treat whole resultset as your “virtual group".

You might notice that I used count(*), which is commonly used aggregate. The great thing is that while there specific window functions, you can also use any aggregate functions (like: min, max, avg, count, sum, array_agg, string_agg and so on) as window functions!

So the expression “count(*) over ()" means – get count of all rows. Simple. But we can do better. Let's say, we'd want to compare given person salary to average salary in her department. Query:

select
    department,
    username,
    salary,
    avg( salary ) over ( partition by department )
from
    people
order by department, username
limit 20;
   department   |        username        | salary |        avg
----------------+------------------------+--------+--------------------
 Accounting     | Alan Jaworski          | 102900 | 71218.181818181818
 Accounting     | Aleksander Nowicki     |  78900 | 71218.181818181818
 Accounting     | Anna Nowakowska        |  77800 | 71218.181818181818
 Accounting     | Hubert Zieliński       |  27600 | 71218.181818181818
 Accounting     | Konrad Nowak           |  85400 | 71218.181818181818
 Accounting     | Mateusz Zając          |  87600 | 71218.181818181818
 Accounting     | Michał Olszewski       |  86100 | 71218.181818181818
 Accounting     | Paweł Olszewski        |  46700 | 71218.181818181818
 Accounting     | Rafał Dąbrowski        |  46600 | 71218.181818181818
 Accounting     | Rafał Kowalczyk        |  48600 | 71218.181818181818
 Accounting     | Zofia Szymańska        |  95200 | 71218.181818181818
 Administration | Antonina Zalewska      |  72700 | 71257.142857142857
 Administration | Dominik Kowalczyk      |  96000 | 71257.142857142857
 Administration | Fabian Wieczorek       |  28500 | 71257.142857142857
 Administration | Jan Walczak            |  27400 | 71257.142857142857
 Administration | Maksymilian Piotrowski |  89600 | 71257.142857142857
 Administration | Nadia Nowak            | 109800 | 71257.142857142857
 Administration | Stanisław Mazur        |  74800 | 71257.142857142857
 Archives       | Adam Wojciechowski     |  51200 | 69228.571428571429
 Archives       | Aleksandra Zając       |  31200 | 69228.571428571429
(20 rows)

The window function call:

avg( salary ) over ( partition by department )

means that we want average salary for windows, where window, for each row is taken by “grouping" all rows with department being the same as department in row that is being processed.

So, to write it using group by, I could:

select
    p.department,
    p.username,
    p.salary,
    d.avg
from
    people p
    join (
        select department, avg(salary)
        from people
        group by department
    ) as d using (department)
order by p.department, p.username
limit 20;

The thing is that PostgreSQL does the “join" automatically based on values of columns in row that uses window function (not sure if that's clear).

Within window definition, you can use multiple columns (or expressions) for “PARTITION BY". For example, we could do something like this:

select
    department,
    username,
    salary,
    count(*) over (partition by department, salary < 50000 )
from
    people
order by department, username
limit 20;
   department   |        username        | salary | count
----------------+------------------------+--------+-------
 Accounting     | Alan Jaworski          | 102900 |     7
 Accounting     | Aleksander Nowicki     |  78900 |     7
 Accounting     | Anna Nowakowska        |  77800 |     7
 Accounting     | Hubert Zieliński       |  27600 |     4
 Accounting     | Konrad Nowak           |  85400 |     7
 Accounting     | Mateusz Zając          |  87600 |     7
 Accounting     | Michał Olszewski       |  86100 |     7
 Accounting     | Paweł Olszewski        |  46700 |     4
 Accounting     | Rafał Dąbrowski        |  46600 |     4
 Accounting     | Rafał Kowalczyk        |  48600 |     4
 Accounting     | Zofia Szymańska        |  95200 |     7
 Administration | Antonina Zalewska      |  72700 |     5
 Administration | Dominik Kowalczyk      |  96000 |     5
 Administration | Fabian Wieczorek       |  28500 |     2
 Administration | Jan Walczak            |  27400 |     2
 Administration | Maksymilian Piotrowski |  89600 |     5
 Administration | Nadia Nowak            | 109800 |     5
 Administration | Stanisław Mazur        |  74800 |     5
 Archives       | Adam Wojciechowski     |  51200 |    11
 Archives       | Aleksandra Zając       |  31200 |     3
(20 rows)

In here, the count is number of people, including current person, that are in the same salary group, where group is defined as < 50000, or >= 50000, but all within the same department.

The other, very important, and commonly used, part of window definition, is sorting. This is something absolutely great.

When dealing with groups, values within the group where basically tossed together in a bag, without any order.

But within windows – rows can be sorted. And the sorting can be used for interesting things:

select
    department,
    username,
    count(*) over (partition by department),
    count(*) over (partition by department order by username)
from people
order by department, username limit 20;
   department   |        username        | count | count
----------------+------------------------+-------+-------
 Accounting     | Alan Jaworski          |    11 |     1
 Accounting     | Aleksander Nowicki     |    11 |     2
 Accounting     | Anna Nowakowska        |    11 |     3
 Accounting     | Hubert Zieliński       |    11 |     4
 Accounting     | Konrad Nowak           |    11 |     5
 Accounting     | Mateusz Zając          |    11 |     6
 Accounting     | Michał Olszewski       |    11 |     7
 Accounting     | Paweł Olszewski        |    11 |     8
 Accounting     | Rafał Dąbrowski        |    11 |     9
 Accounting     | Rafał Kowalczyk        |    11 |    10
 Accounting     | Zofia Szymańska        |    11 |    11
 Administration | Antonina Zalewska      |     7 |     1
 Administration | Dominik Kowalczyk      |     7 |     2
 Administration | Fabian Wieczorek       |     7 |     3
 Administration | Jan Walczak            |     7 |     4
 Administration | Maksymilian Piotrowski |     7 |     5
 Administration | Nadia Nowak            |     7 |     6
 Administration | Stanisław Mazur        |     7 |     7
 Archives       | Adam Wojciechowski     |    14 |     1
 Archives       | Aleksandra Zając       |    14 |     2
(20 rows)

Please note that adding “order by" to window definition changes scope of window function – it doesn't work on all rows in given partition – it works just on rows from start to given row, based on ordering.

This means that doing cumulative sum gets trivial:

select
    department,
    username,
    salary,
    sum(salary) over (partition by department order by username)
from people
order by department, username
limit 20;
   department   |        username        | salary |  sum
----------------+------------------------+--------+--------
 Accounting     | Alan Jaworski          | 102900 | 102900
 Accounting     | Aleksander Nowicki     |  78900 | 181800
 Accounting     | Anna Nowakowska        |  77800 | 259600
 Accounting     | Hubert Zieliński       |  27600 | 287200
 Accounting     | Konrad Nowak           |  85400 | 372600
 Accounting     | Mateusz Zając          |  87600 | 460200
 Accounting     | Michał Olszewski       |  86100 | 546300
 Accounting     | Paweł Olszewski        |  46700 | 593000
 Accounting     | Rafał Dąbrowski        |  46600 | 639600
 Accounting     | Rafał Kowalczyk        |  48600 | 688200
 Accounting     | Zofia Szymańska        |  95200 | 783400
 Administration | Antonina Zalewska      |  72700 |  72700
 Administration | Dominik Kowalczyk      |  96000 | 168700
 Administration | Fabian Wieczorek       |  28500 | 197200
 Administration | Jan Walczak            |  27400 | 224600
 Administration | Maksymilian Piotrowski |  89600 | 314200
 Administration | Nadia Nowak            | 109800 | 424000
 Administration | Stanisław Mazur        |  74800 | 498800
 Archives       | Adam Wojciechowski     |  51200 |  51200
 Archives       | Aleksandra Zając       |  31200 |  82400
(20 rows)

As you can see – adding “ORDER BY" not only makes the rows ordered, but it also, significantly changes what the function is called on – no longer on “all" rows, but just on a subset defined by order, and location of current row.

One important exception is: row_number() – when it is being run over empty window (row_number() over ()), it just works:

select
    row_number() over (),
    *
from people
limit 20;
 row_number | id |      username       |     department     | salary
------------+----+---------------------+--------------------+--------
          1 |  1 | Blanka Pawlak       | Psychology         |  25200
          2 |  2 | Fabian Krawczyk     | Chem               |  47900
          3 |  3 | Tomasz Górski       | Zoology            |  39600
          4 |  4 | Oliwier Woźniak     | Data Archives      |  93500
          5 |  5 | Rafał Olszewski     | Maintennance       |  77500
          6 |  6 | Maciej Mazur        | Distribution       |  57700
          7 |  7 | Justyna Wróbel      | Internal Logistics |  82800
          8 |  8 | Nikodem Wójcik      | Wranglers          |  90500
          9 |  9 | Lena Szewczyk       | Electrical         |  66400
         10 | 10 | Filip Kamiński      | Engineering        |  42300
         11 | 11 | Maciej Kowalczyk    | Bio Med            | 106700
         12 | 12 | Mateusz Zając       | Accounting         |  87600
         13 | 13 | Michał Kozłowski    | R + D              |  52300
         14 | 14 | Rafał Dąbrowski     | Accounting         |  46600
         15 | 15 | Łukasz Kowalski     | Distribution       | 106200
         16 | 16 | Wojciech Kowalski   | Operations         |  66100
         17 | 17 | Paweł Olszewski     | Accounting         |  46700
         18 | 18 | Wojciech Stępień    | Communications     |  34200
         19 | 19 | Nikola Mazur        | Demolition         |  23200
         20 | 20 | Aleksander Zalewski | Communications     | 100000
(20 rows)

So. Now we know that we can either have functions that run on all rows in given partition (which can be whole result set), or a subset, based on ordering.

In all previous examples ordering within window definition was more or less the same as ordering of resulting recordset. But this doesn't have to be always true. For example, let's check this query:

select
    department,
    username,
    salary,
    row_number() over (partition by department order by salary desc)
from
    people
order by department, username
limit 20;
   department   |        username        | salary | row_number
----------------+------------------------+--------+------------
 Accounting     | Alan Jaworski          | 102900 |          1
 Accounting     | Aleksander Nowicki     |  78900 |          6
 Accounting     | Anna Nowakowska        |  77800 |          7
 Accounting     | Hubert Zieliński       |  27600 |         11
 Accounting     | Konrad Nowak           |  85400 |          5
 Accounting     | Mateusz Zając          |  87600 |          3
 Accounting     | Michał Olszewski       |  86100 |          4
 Accounting     | Paweł Olszewski        |  46700 |          9
 Accounting     | Rafał Dąbrowski        |  46600 |         10
 Accounting     | Rafał Kowalczyk        |  48600 |          8
 Accounting     | Zofia Szymańska        |  95200 |          2
 Administration | Antonina Zalewska      |  72700 |          5
 Administration | Dominik Kowalczyk      |  96000 |          2
 Administration | Fabian Wieczorek       |  28500 |          6
 Administration | Jan Walczak            |  27400 |          7
 Administration | Maksymilian Piotrowski |  89600 |          3
 Administration | Nadia Nowak            | 109800 |          1
 Administration | Stanisław Mazur        |  74800 |          4
 Archives       | Adam Wojciechowski     |  51200 |         11
 Archives       | Aleksandra Zając       |  31200 |         14
(20 rows)

One last thing that is important to know, though it is rarely used, is modification of which rows are visible to the function.

As I showed you – we basically have two modes: all rows within partition (which can be – all rows), and rows within partition that, based on some order, are before given row.

But that's not all. Window definition can also contain so called “frame definition“. If you will ever need to use it – congratulations, that doesn't happen often.

Because of this (how rarely it's needed, at least in the use cases I've seen so far), I want to show you only one specific case of frame definition. In case you're interested in more details – docs will be much better choice.

Let's assume we'd want to have average that is based on values around current row. For example – 2 rows behind to 2 rows ahead.

It is possible to do it using standard window definitions and functions like lag() and lead(), but you can also:

select
    department,
    username,
    salary,
    avg(salary) over (
        partition by department
        order by salary
        rows between 2 preceding and 2 following
    ),
    array_agg( salary ) over (
        partition by department
        order by salary
        rows between 2 preceding and 2 following
    )
from people
order by department, salary
limit 20;
   department   |        username        | salary |        avg         |            array_agg
----------------+------------------------+--------+--------------------+----------------------------------
 Accounting     | Hubert Zieliński       |  27600 | 40300.000000000000 | {27600,46600,46700}
 Accounting     | Rafał Dąbrowski        |  46600 | 42375.000000000000 | {27600,46600,46700,48600}
 Accounting     | Paweł Olszewski        |  46700 | 49460.000000000000 | {27600,46600,46700,48600,77800}
 Accounting     | Rafał Kowalczyk        |  48600 | 59720.000000000000 | {46600,46700,48600,77800,78900}
 Accounting     | Anna Nowakowska        |  77800 | 67480.000000000000 | {46700,48600,77800,78900,85400}
 Accounting     | Aleksander Nowicki     |  78900 | 75360.000000000000 | {48600,77800,78900,85400,86100}
 Accounting     | Konrad Nowak           |  85400 | 83160.000000000000 | {77800,78900,85400,86100,87600}
 Accounting     | Michał Olszewski       |  86100 | 86640.000000000000 | {78900,85400,86100,87600,95200}
 Accounting     | Mateusz Zając          |  87600 | 91440.000000000000 | {85400,86100,87600,95200,102900}
 Accounting     | Zofia Szymańska        |  95200 | 92950.000000000000 | {86100,87600,95200,102900}
 Accounting     | Alan Jaworski          | 102900 | 95233.333333333333 | {87600,95200,102900}
 Administration | Jan Walczak            |  27400 | 42866.666666666667 | {27400,28500,72700}
 Administration | Fabian Wieczorek       |  28500 | 50850.000000000000 | {27400,28500,72700,74800}
 Administration | Antonina Zalewska      |  72700 | 58600.000000000000 | {27400,28500,72700,74800,89600}
 Administration | Stanisław Mazur        |  74800 | 72320.000000000000 | {28500,72700,74800,89600,96000}
 Administration | Maksymilian Piotrowski |  89600 | 88580.000000000000 | {72700,74800,89600,96000,109800}
 Administration | Dominik Kowalczyk      |  96000 | 92550.000000000000 | {74800,89600,96000,109800}
 Administration | Nadia Nowak            | 109800 | 98466.666666666667 | {89600,96000,109800}
 Archives       | Aleksandra Zając       |  31200 | 36033.333333333333 | {31200,35300,41600}
 Archives       | Paweł Stępień          |  35300 | 39825.000000000000 | {31200,35300,41600,51200}
(20 rows)

Last column shows what values are used for the avg() call in avg column. As you can see it can either have 3, 4, or (usually) 5 rows to work with.

This is because in some cases (at the beginning or end) there are no “preceding" or “following" rows within given partition.

When it comes to syntax, there is one last bit, very simple, to show. In case you're having queries with multiple function calls over the same windows (just like above), you can move window definition to separate location, name it, and use the name instead of whole definition. Like this:

select
    department,
    username,
    salary,
    avg(salary) over my_test_window,
    array_agg( salary ) over my_test_window
from people
window my_test_window as (
        partition by department
        order by salary
        rows between 2 preceding and 2 following
    )
order by department, salary
limit 20;

That's about it when it comes to general syntax. What about real-life situation, and what can be a surprise?

Let's see:

select
    department,
    username,
    avg( salary ) / 1000 over (partition by department)
from
    people
order by department, username
limit 20;

Do you see error in above example?

If you'd run it, you would get:

ERROR:  syntax error at or near "over"
LINE 4:     avg( salary ) / 1000 over (partition by department)
                                 ^

Do you know/see/understand why?

What exactly is divided by 1000? Remember that the whole expression is: function() over (window_definition). So if you want to divide it by 1000, you have to put division at the end of expression, not in the middle. So, the proper query would have:

select
    department,
    username,
    avg( salary ) over (partition by department) / 1000
from
    people
order by department, username
limit 20;

Sometimes, people on irc or on mailing lists ask why they can't use window functions in where. Like:

select
    *
FROM
    people
where
    row_number() over (partition by department order by salary desc) < 6;

To get the top-5 most earning people per department. Reason is very simple – window functions are calculated after where, group by and having clauses (but before order by/limit). So if you want such query, you have to use subselect:

select
    *
from
    (
        select
            *,
            row_number() over (partition by department order by salary desc)
        from
            people
    ) as x
where row_number < 6
order by department, salary desc;

All this means that any developer who is using SQL and database engine that supports window functions – should at the very least play a bit with them to see what is possible, and how.

In my previous posts you can find some examples on how to use them, and what they can be used for, and if anything is not clear – please say so in comments, I'll try to update the post to cover all basic and intermediate corners.

  1. 9 comments

  2. # Rudi
    Nov 20, 2012

    In my experience window functions become problematic as soon as the are used in the context of views. The planner does not seem to be able to figure out the optimum query scheme and as a result extreme long running queries occur

  3. # pg_lizard
    Nov 20, 2012

    Can never get enough of Windowing queries. Great post again Depesz. I am a heavy user of these and yet learned a few more tricks by reading your article.

  4. Nov 21, 2012

    Thank you for pointing out the `FROM tablename WINDOW window_name AS ()` syntax. I’d missed that in my reading of the documentation and it would’ve been handy to tidy up a couple of queries I’ve done recently.

    Window functions have quickly been moving into my “how did I ever do without them” tool-kit. I suspect 9.3’s LATERAL will soon follow.

  5. Nov 21, 2012

    It’s quite unusual that frames are not more known. I know no better way of writing, for example, cumulative sums (which are used so often in analysis, financial software…). Here’s a good example:

    http://stackoverflow.com/questions/11268398/sum-every-3-rows-of-a-table/13140979#13140979

    They know window functions well (lag, lead and what not), but in my oppinion it’s the wrong tool for the job. Using frames in this case is a solution to that whole class of problems.

    Best regards from me. I follow your blog religiously.

  6. Nov 21, 2012

    @Milos:
    I think the correct term is rolling average. as cumultive sum is simply sum(x) over (order by …).

    Sure – in such case frame clauses are the best solution.

  7. # Johnf
    Nov 25, 2012

    Is the avg for Archives correct?

  8. Nov 26, 2012

    @johnf:
    yes, it is – if you think it’s wrong because shown salaries don’t give this average – it’s because only 2 rows from archives are shown, but as you can see in next query, total number of rows for archives is higher, but “limit 20″ is hiding them.

    Please note – you can download the data that I ran those queries on, and check for yourself.

  9. # Anonymous
    Dec 14, 2012

    Great article I learned a lot. A couple of hints for improving it would to offer more explanation of the the window frame that the count(*) applies too in the example where you introduce the ORDER BY caluse.

  10. Dec 14, 2012

    @Anonymous:
    I thought that this example:

    select
    department,
    username,
    count(*) over (partition by department),
    count(*) over (partition by department order by username)
    from people
    order by department, username limit 20;

    showed the difference good enough. I mean – I can’t figure out what I could say, that this example doesn’t show.

Leave a comment