February 17th, 2010 by depesz | Tags: , , , , | No comments »
Did it help? If yes - maybe you can help me?

On 12th of February Tom Lane committed patch by Hitoshi Harada:

Log Message:
-----------
Extend the set of frame options supported for window functions.
 
This patch allows the frame to start from CURRENT ROW (in either RANGE or
ROWS mode), and it also adds support for ROWS n PRECEDING and ROWS n FOLLOWING
start and end points.  (RANGE value PRECEDING/FOLLOWING isn't there yet ---
the grammar works, but that's all.)
 
Hitoshi Harada, reviewed by Pavel Stehule

The description is pretty clear, but let's look at what exactly it is.

First, let's create simple test table:

create table test (
    id serial primary key,
    some_grouping int4,
    some_value int4
);

Now, let's insert some random data:

insert into test (some_grouping, some_value)
    select i, random() * 100 + 50 as j
    from generate_series(1,2) i, generate_series(1,10) k;

The data in it doesn't look spectacular, but it serves the purpose:

select * from test order by id asc;
 id | some_grouping | some_value
----+---------------+------------
  1 |             1 |         94
  2 |             1 |         86
  3 |             1 |         77
  4 |             1 |         86
  5 |             1 |        146
  6 |             1 |        145
  7 |             1 |         68
  8 |             1 |         71
  9 |             1 |         84
 10 |             1 |        105
 11 |             2 |        126
 12 |             2 |         88
 13 |             2 |         90
 14 |             2 |         50
 15 |             2 |        104
 16 |             2 |         68
 17 |             2 |         61
 18 |             2 |        107
 19 |             2 |        127
 20 |             2 |        112
(20 rows)

Now. If you're not familiar with window functions, and specyfically with framing, let me show you simple example:

SELECT
    id,
    some_grouping,
    some_value,
    sum(some_value) OVER (
        PARTITION BY some_grouping
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as sum_per_group,
    sum(some_value) OVER (
        PARTITION BY some_grouping
        ORDER BY id
        RANGE UNBOUNDED PRECEDING
    ) as cumulative_sum_per_group
FROM
    test
ORDER BY some_grouping, id;
 id | some_grouping | some_value | sum_per_group | cumulative_sum_per_group
----+---------------+------------+---------------+--------------------------
  1 |             1 |         94 |           962 |                       94
  2 |             1 |         86 |           962 |                      180
  3 |             1 |         77 |           962 |                      257
  4 |             1 |         86 |           962 |                      343
  5 |             1 |        146 |           962 |                      489
  6 |             1 |        145 |           962 |                      634
  7 |             1 |         68 |           962 |                      702
  8 |             1 |         71 |           962 |                      773
  9 |             1 |         84 |           962 |                      857
 10 |             1 |        105 |           962 |                      962
 11 |             2 |        126 |           933 |                      126
 12 |             2 |         88 |           933 |                      214
 13 |             2 |         90 |           933 |                      304
 14 |             2 |         50 |           933 |                      354
 15 |             2 |        104 |           933 |                      458
 16 |             2 |         68 |           933 |                      526
 17 |             2 |         61 |           933 |                      587
 18 |             2 |        107 |           933 |                      694
 19 |             2 |        127 |           933 |                      821
 20 |             2 |        112 |           933 |                      933
(20 rows)

Now. Previously it would be difficult to calculate rolling average. For example something like this:

SELECT
    id,
    some_grouping,
    some_value,
    (
        lag(some_value, 1) over (grouping)
        +
        some_value
        +
        lead(some_value, 1) over (grouping)
    ) / 3 as rolling_average
FROM
    test
WINDOW grouping AS (
        PARTITION BY some_grouping
        ORDER BY id
)
ORDER BY some_grouping, id;

It returns, more or less as expected:

 id | some_grouping | some_value | rolling_average
----+---------------+------------+-----------------
  1 |             1 |         94 |          [null]
  2 |             1 |         86 |              85
  3 |             1 |         77 |              83
  4 |             1 |         86 |             103
  5 |             1 |        146 |             125
  6 |             1 |        145 |             119
  7 |             1 |         68 |              94
  8 |             1 |         71 |              74
  9 |             1 |         84 |              86
 10 |             1 |        105 |          [null]
 11 |             2 |        126 |          [null]
 12 |             2 |         88 |             101
 13 |             2 |         90 |              76
 14 |             2 |         50 |              81
 15 |             2 |        104 |              74
 16 |             2 |         68 |              77
 17 |             2 |         61 |              78
 18 |             2 |        107 |              98
 19 |             2 |        127 |             115
 20 |             2 |        112 |          [null]
(20 rows)

As you can see first and last row for any given window don't have rolling average – i.e. they have nulls. That's because for first row lag(some_value) is null, and for last row – lead(some_value) is null.

Now, let's see how it works with this new patch by Hitoshi Harada:

SELECT
    id,
    some_grouping,
    some_value,
    cast(
        avg(some_value) over (
            grouping
            rows between 1 preceding and 1 following
        )
        as int4
    ) as rolling_average
FROM
    test
WINDOW grouping AS (
        PARTITION BY some_grouping
        ORDER BY id
)
ORDER BY some_grouping, id;

and result is:

 id | some_grouping | some_value | rolling_average
----+---------------+------------+-----------------
  1 |             1 |         94 |              90
  2 |             1 |         86 |              86
  3 |             1 |         77 |              83
  4 |             1 |         86 |             103
  5 |             1 |        146 |             126
  6 |             1 |        145 |             120
  7 |             1 |         68 |              95
  8 |             1 |         71 |              74
  9 |             1 |         84 |              87
 10 |             1 |        105 |              95
 11 |             2 |        126 |             107
 12 |             2 |         88 |             101
 13 |             2 |         90 |              76
 14 |             2 |         50 |              81
 15 |             2 |        104 |              74
 16 |             2 |         68 |              78
 17 |             2 |         61 |              79
 18 |             2 |        107 |              98
 19 |             2 |        127 |             115
 20 |             2 |        112 |             120
(20 rows)

This is very cool.

While I can write query without using this new frames that will calculate correct values for first and last rows:

SELECT
    id,
    some_grouping,
    some_value,
    (
        coalesce( lag(some_value, 1) over (grouping), 0 )
        +
        some_value
        +
        coalesce( lead(some_value, 1) over (grouping), 0 )
    ) / (
        1
        +
        case when lag(some_value, 1) over (grouping) is null then 0 else 1 end
        +
        case when lead(some_value, 1) over (grouping) is null then 0 else 1 end
    ) as rolling_average
FROM
    test
WINDOW grouping AS (
        PARTITION BY some_grouping
        ORDER BY id
)
ORDER BY some_grouping, id;

It definitely isn't nice, and would very quickly become nightmarish for ranges larger than 3 rows.

Of course the patch also added other options besides “BETWEEN n PRECEDING AND n FOLLOWING" – you can check full list of possible framings in dev docs.

Leave a comment