Filling the gaps with window functions

Couple of days ago I had a problem that I couldn't solve after ~ 2 hours, and decided to ask on IRC. Almost immediately after asking, I figured out the solution, but David asked me to write about the solution, even though it's now (for me) completely obvious.

The problem was like this:

I had two tables, with very simple structure: event_when timestamptz, event_count int4, and wanted to show it as a single recordset with columns: event_when, event_count_a, event_count_b, but the problem was that event_when usually didn't match. Here is an example:

$ CREATE TABLE events_a (event_when timestamptz, event_count int4);
CREATE
 
$ CREATE TABLE events_b (event_when timestamptz, event_count int4);
CREATE
 
$ INSERT INTO events_a (event_when, event_count) SELECT now() - '10 minutes'::INTERVAL * random(), random() * 10 FROM generate_series(1,20);
INSERT 0 20
 
$ INSERT INTO events_b (event_when, event_count) SELECT now() - '10 minutes'::INTERVAL * random(), random() * 10 FROM generate_series(1,10);
INSERT 0 10

Of course tables that I had problem with were much larger, but this example is good enough to show the problem.

Data might (depending on when you'll run it, and your random number generator) look like this:

$ SELECT * FROM events_a;
          event_when           │ event_count
───────────────────────────────┼─────────────
 2012-08-27 23:52:03.252017+02 │           9
 2012-08-27 23:59:16.886855+02 │           8
 2012-08-27 23:52:58.926927+02 │           6
 2012-08-27 23:53:36.57738+02  │           6
 2012-08-27 23:54:58.458541+02 │           1
 2012-08-27 23:51:30.54373+02  │           3
 2012-08-27 23:52:53.981979+02 │          10
 2012-08-27 23:51:56.947008+02 │           9
 2012-08-27 23:53:42.443812+02 │           9
 2012-08-28 00:00:13.417665+02 │           2
 2012-08-27 23:57:51.333133+02 │           4
 2012-08-27 23:56:15.77672+02  │           9
 2012-08-27 23:56:22.940718+02 │           3
 2012-08-28 00:00:50.668566+02 │           5
 2012-08-27 23:59:56.927112+02 │           9
 2012-08-27 23:58:32.724896+02 │           1
 2012-08-27 23:52:54.833586+02 │           5
 2012-08-27 23:52:29.364645+02 │           7
 2012-08-28 00:00:37.243976+02 │           6
 2012-08-27 23:58:55.074473+02 │           7
(20 ROWS)
 
$ SELECT * FROM events_b;
          event_when           │ event_count
───────────────────────────────┼─────────────
 2012-08-27 23:54:14.985946+02 │           2
 2012-08-28 00:01:22.990129+02 │           6
 2012-08-27 23:59:23.374828+02 │           9
 2012-08-27 23:56:24.995275+02 │          10
 2012-08-27 23:52:59.19678+02  │           6
 2012-08-27 23:59:30.517138+02 │           2
 2012-08-27 23:51:43.406024+02 │           7
 2012-08-28 00:00:06.504544+02 │           5
 2012-08-27 23:51:34.098968+02 │           2
 2012-08-28 00:01:22.487454+02 │           1
(10 ROWS)

New records are inserted in events tables when count changes, and due to how it happens the changes are usually pretty random.

It is possible that we will have records with the same event_when in both tables, but it's not really likely, but within single table, event_when is unique.

So. Simplest possible approach:

$ SELECT
    COALESCE(a.event_when, b.event_when) AS event_when,
    a.event_count AS count_a,
    b.event_count AS count_b
FROM
    events_a a
    FULL OUTER JOIN
    events_b b USING ( event_when )
ORDER BY
    event_when;

Produces sane, but not satisfactory, result:

          event_when           │ count_a │ count_b
───────────────────────────────┼─────────┼─────────
 2012-08-27 23:51:30.54373+02  │       3[NULL]
 2012-08-27 23:51:34.098968+02 │  [NULL]2
 2012-08-27 23:51:43.406024+02 │  [NULL]7
 2012-08-27 23:51:56.947008+02 │       9[NULL]
 2012-08-27 23:52:03.252017+02 │       9[NULL]
 2012-08-27 23:52:29.364645+02 │       7[NULL]
 2012-08-27 23:52:53.981979+02 │      10[NULL]
 2012-08-27 23:52:54.833586+02 │       5[NULL]
 2012-08-27 23:52:58.926927+02 │       6[NULL]
 2012-08-27 23:52:59.19678+02  │  [NULL]6
 2012-08-27 23:53:36.57738+02  │       6[NULL]
 2012-08-27 23:53:42.443812+02 │       9[NULL]
 2012-08-27 23:54:14.985946+02 │  [NULL]2
 2012-08-27 23:54:58.458541+02 │       1[NULL]
 2012-08-27 23:56:15.77672+02  │       9[NULL]
 2012-08-27 23:56:22.940718+02 │       3[NULL]
 2012-08-27 23:56:24.995275+02 │  [NULL]10
 2012-08-27 23:57:51.333133+02 │       4[NULL]
 2012-08-27 23:58:32.724896+02 │       1[NULL]
 2012-08-27 23:58:55.074473+02 │       7[NULL]
 2012-08-27 23:59:16.886855+02 │       8[NULL]
 2012-08-27 23:59:23.374828+02 │  [NULL]9
 2012-08-27 23:59:30.517138+02 │  [NULL]2
 2012-08-27 23:59:56.927112+02 │       9[NULL]
 2012-08-28 00:00:06.504544+02 │  [NULL]5
 2012-08-28 00:00:13.417665+02 │       2[NULL]
 2012-08-28 00:00:37.243976+02 │       6[NULL]
 2012-08-28 00:00:50.668566+02 │       5[NULL]
 2012-08-28 00:01:22.487454+02 │  [NULL]1
 2012-08-28 00:01:22.990129+02 │  [NULL]6
(30 ROWS)

(my psql shows NULL values as [null] string, and not empty one).

Since records happen on change, for the whole time between records value is as in last record before.

For example – let's look at count_a – in records #2 and #3 (23:51:34.098968+02 and 23:51:43.406024+02) it's value should be 3 because it is last known event_count.

Of course we don't know value for count_b in the first record, but that's OK – we just should fill whatever is posible.

So. I spent some time on this problem, and couldn't find solution. But then, after describing it on IRC, I realized that it was closely related to this problem.

With this realization, solution became obvious. First, I'll add new columns to data, that shows event_when if there are data for given count_ column. Easier to show than explain:

$ WITH base_data AS (
    SELECT
        COALESCE(a.event_when, b.event_when) AS event_when,
        a.event_count AS count_a,
        b.event_count AS count_b
    FROM
        events_a a
        FULL OUTER JOIN
        events_b b USING ( event_when )
    )
, with_group_starts AS (
    SELECT
        *,
        CASE WHEN count_a IS NULL THEN NULL ELSE event_when END AS group_a,
        CASE WHEN count_b IS NULL THEN NULL ELSE event_when END AS group_b
    FROM
        base_data
)
SELECT *
FROM with_group_starts
ORDER BY event_when;
          event_when           │ count_a │ count_b │            group_a            │            group_b
───────────────────────────────┼─────────┼─────────┼───────────────────────────────┼───────────────────────────────
 2012-08-27 23:51:30.54373+02  │       3[NULL]2012-08-27 23:51:30.54373+02  │ [NULL]
 2012-08-27 23:51:34.098968+02 │  [NULL]2[NULL]2012-08-27 23:51:34.098968+02
 2012-08-27 23:51:43.406024+02 │  [NULL]7[NULL]2012-08-27 23:51:43.406024+02
 2012-08-27 23:51:56.947008+02 │       9[NULL]2012-08-27 23:51:56.947008+02 │ [NULL]
 2012-08-27 23:52:03.252017+02 │       9[NULL]2012-08-27 23:52:03.252017+02 │ [NULL]
 2012-08-27 23:52:29.364645+02 │       7[NULL]2012-08-27 23:52:29.364645+02 │ [NULL]
 2012-08-27 23:52:53.981979+02 │      10[NULL]2012-08-27 23:52:53.981979+02 │ [NULL]
 2012-08-27 23:52:54.833586+02 │       5[NULL]2012-08-27 23:52:54.833586+02 │ [NULL]
 2012-08-27 23:52:58.926927+02 │       6[NULL]2012-08-27 23:52:58.926927+02 │ [NULL]
 2012-08-27 23:52:59.19678+02  │  [NULL]6[NULL]2012-08-27 23:52:59.19678+02
 2012-08-27 23:53:36.57738+02  │       6[NULL]2012-08-27 23:53:36.57738+02  │ [NULL]
 2012-08-27 23:53:42.443812+02 │       9[NULL]2012-08-27 23:53:42.443812+02 │ [NULL]
 2012-08-27 23:54:14.985946+02 │  [NULL]2[NULL]2012-08-27 23:54:14.985946+02
 2012-08-27 23:54:58.458541+02 │       1[NULL]2012-08-27 23:54:58.458541+02 │ [NULL]
 2012-08-27 23:56:15.77672+02  │       9[NULL]2012-08-27 23:56:15.77672+02  │ [NULL]
 2012-08-27 23:56:22.940718+02 │       3[NULL]2012-08-27 23:56:22.940718+02 │ [NULL]
 2012-08-27 23:56:24.995275+02 │  [NULL]10[NULL]2012-08-27 23:56:24.995275+02
 2012-08-27 23:57:51.333133+02 │       4[NULL]2012-08-27 23:57:51.333133+02 │ [NULL]
 2012-08-27 23:58:32.724896+02 │       1[NULL]2012-08-27 23:58:32.724896+02 │ [NULL]
 2012-08-27 23:58:55.074473+02 │       7[NULL]2012-08-27 23:58:55.074473+02 │ [NULL]
 2012-08-27 23:59:16.886855+02 │       8[NULL]2012-08-27 23:59:16.886855+02 │ [NULL]
 2012-08-27 23:59:23.374828+02 │  [NULL]9[NULL]2012-08-27 23:59:23.374828+02
 2012-08-27 23:59:30.517138+02 │  [NULL]2[NULL]2012-08-27 23:59:30.517138+02
 2012-08-27 23:59:56.927112+02 │       9[NULL]2012-08-27 23:59:56.927112+02 │ [NULL]
 2012-08-28 00:00:06.504544+02 │  [NULL]5[NULL]2012-08-28 00:00:06.504544+02
 2012-08-28 00:00:13.417665+02 │       2[NULL]2012-08-28 00:00:13.417665+02 │ [NULL]
 2012-08-28 00:00:37.243976+02 │       6[NULL]2012-08-28 00:00:37.243976+02 │ [NULL]
 2012-08-28 00:00:50.668566+02 │       5[NULL]2012-08-28 00:00:50.668566+02 │ [NULL]
 2012-08-28 00:01:22.487454+02 │  [NULL]1[NULL]2012-08-28 00:01:22.487454+02
 2012-08-28 00:01:22.990129+02 │  [NULL]6[NULL]2012-08-28 00:01:22.990129+02
(30 ROWS)

Benefit of these columns is very simple – values in them are rising monotonically, so I can use now max() to produce actual groups:

$ WITH base_data AS (
    SELECT
        COALESCE(a.event_when, b.event_when) AS event_when,
        a.event_count AS count_a,
        b.event_count AS count_b
    FROM
        events_a a
        FULL OUTER JOIN
        events_b b USING ( event_when )
    )
, with_group_starts AS (
    SELECT
        *,
        MAX(CASE WHEN count_a IS NULL THEN NULL ELSE event_when END) OVER (ORDER BY event_when) AS group_a,
        MAX(CASE WHEN count_b IS NULL THEN NULL ELSE event_when END) OVER (ORDER BY event_when) AS group_b
    FROM
        base_data
)
SELECT *
FROM with_group_starts
ORDER BY event_when;
          event_when           │ count_a │ count_b │            group_a            │            group_b
───────────────────────────────┼─────────┼─────────┼───────────────────────────────┼───────────────────────────────
 2012-08-27 23:51:30.54373+02  │       3[NULL]2012-08-27 23:51:30.54373+02  │ [NULL]
 2012-08-27 23:51:34.098968+02 │  [NULL]22012-08-27 23:51:30.54373+02  │ 2012-08-27 23:51:34.098968+02
 2012-08-27 23:51:43.406024+02 │  [NULL]72012-08-27 23:51:30.54373+02  │ 2012-08-27 23:51:43.406024+02
 2012-08-27 23:51:56.947008+02 │       9[NULL]2012-08-27 23:51:56.947008+02 │ 2012-08-27 23:51:43.406024+02
 2012-08-27 23:52:03.252017+02 │       9[NULL]2012-08-27 23:52:03.252017+02 │ 2012-08-27 23:51:43.406024+02
 2012-08-27 23:52:29.364645+02 │       7[NULL]2012-08-27 23:52:29.364645+02 │ 2012-08-27 23:51:43.406024+02
 2012-08-27 23:52:53.981979+02 │      10[NULL]2012-08-27 23:52:53.981979+02 │ 2012-08-27 23:51:43.406024+02
 2012-08-27 23:52:54.833586+02 │       5[NULL]2012-08-27 23:52:54.833586+02 │ 2012-08-27 23:51:43.406024+02
 2012-08-27 23:52:58.926927+02 │       6[NULL]2012-08-27 23:52:58.926927+02 │ 2012-08-27 23:51:43.406024+02
 2012-08-27 23:52:59.19678+02  │  [NULL]62012-08-27 23:52:58.926927+02 │ 2012-08-27 23:52:59.19678+02
 2012-08-27 23:53:36.57738+02  │       6[NULL]2012-08-27 23:53:36.57738+02  │ 2012-08-27 23:52:59.19678+02
 2012-08-27 23:53:42.443812+02 │       9[NULL]2012-08-27 23:53:42.443812+02 │ 2012-08-27 23:52:59.19678+02
 2012-08-27 23:54:14.985946+02 │  [NULL]22012-08-27 23:53:42.443812+02 │ 2012-08-27 23:54:14.985946+02
 2012-08-27 23:54:58.458541+02 │       1[NULL]2012-08-27 23:54:58.458541+02 │ 2012-08-27 23:54:14.985946+02
 2012-08-27 23:56:15.77672+02  │       9[NULL]2012-08-27 23:56:15.77672+02  │ 2012-08-27 23:54:14.985946+02
 2012-08-27 23:56:22.940718+02 │       3[NULL]2012-08-27 23:56:22.940718+02 │ 2012-08-27 23:54:14.985946+02
 2012-08-27 23:56:24.995275+02 │  [NULL]102012-08-27 23:56:22.940718+02 │ 2012-08-27 23:56:24.995275+02
 2012-08-27 23:57:51.333133+02 │       4[NULL]2012-08-27 23:57:51.333133+02 │ 2012-08-27 23:56:24.995275+02
 2012-08-27 23:58:32.724896+02 │       1[NULL]2012-08-27 23:58:32.724896+02 │ 2012-08-27 23:56:24.995275+02
 2012-08-27 23:58:55.074473+02 │       7[NULL]2012-08-27 23:58:55.074473+02 │ 2012-08-27 23:56:24.995275+02
 2012-08-27 23:59:16.886855+02 │       8[NULL]2012-08-27 23:59:16.886855+02 │ 2012-08-27 23:56:24.995275+02
 2012-08-27 23:59:23.374828+02 │  [NULL]92012-08-27 23:59:16.886855+02 │ 2012-08-27 23:59:23.374828+02
 2012-08-27 23:59:30.517138+02 │  [NULL]22012-08-27 23:59:16.886855+02 │ 2012-08-27 23:59:30.517138+02
 2012-08-27 23:59:56.927112+02 │       9[NULL]2012-08-27 23:59:56.927112+02 │ 2012-08-27 23:59:30.517138+02
 2012-08-28 00:00:06.504544+02 │  [NULL]52012-08-27 23:59:56.927112+02 │ 2012-08-28 00:00:06.504544+02
 2012-08-28 00:00:13.417665+02 │       2[NULL]2012-08-28 00:00:13.417665+02 │ 2012-08-28 00:00:06.504544+02
 2012-08-28 00:00:37.243976+02 │       6[NULL]2012-08-28 00:00:37.243976+02 │ 2012-08-28 00:00:06.504544+02
 2012-08-28 00:00:50.668566+02 │       5[NULL]2012-08-28 00:00:50.668566+02 │ 2012-08-28 00:00:06.504544+02
 2012-08-28 00:01:22.487454+02 │  [NULL]12012-08-28 00:00:50.668566+02 │ 2012-08-28 00:01:22.487454+02
 2012-08-28 00:01:22.990129+02 │  [NULL]62012-08-28 00:00:50.668566+02 │ 2012-08-28 00:01:22.990129+02
(30 ROWS)

Please note that values in group_a and group_b partition data in the resultset into parts which always have only 1 row with value, and possibly multiple rows with null.

For example – all rows with group_a = ‘2012-08-27 23:59:16.886855+02':

 2012-08-27 23:59:16.886855+02 │       8[NULL]2012-08-27 23:59:16.886855+02 │ 2012-08-27 23:56:24.995275+02
 2012-08-27 23:59:23.374828+02 │  [NULL]92012-08-27 23:59:16.886855+02 │ 2012-08-27 23:59:23.374828+02
 2012-08-27 23:59:30.517138+02 │  [NULL]22012-08-27 23:59:16.886855+02 │ 2012-08-27 23:59:30.517138+02

Only first of these rows has value in count_a.

So, now we can, using this group/partition info, fill-in the blanks:

$ WITH base_data AS (
    SELECT
        COALESCE(a.event_when, b.event_when) AS event_when,
        a.event_count AS count_a,
        b.event_count AS count_b
    FROM
        events_a a
        FULL OUTER JOIN
        events_b b USING ( event_when )
    )
, with_group_starts AS (
    SELECT
        *,
        MAX(CASE WHEN count_a IS NULL THEN NULL ELSE event_when END) OVER (ORDER BY event_when) AS group_a,
        MAX(CASE WHEN count_b IS NULL THEN NULL ELSE event_when END) OVER (ORDER BY event_when) AS group_b
    FROM
        base_data
), filled_in AS (
    SELECT
        *,
        MAX(count_a) OVER (partition BY group_a) AS real_count_a,
        MAX(count_b) OVER (partition BY group_b) AS real_count_b
    FROM
        with_group_starts
)
SELECT *
FROM filled_in
ORDER BY event_when;
          event_when           │ count_a │ count_b │            group_a            │            group_b            │ real_count_a │ real_count_b
───────────────────────────────┼─────────┼─────────┼───────────────────────────────┼───────────────────────────────┼──────────────┼──────────────
 2012-08-27 23:51:30.54373+02  │       3[NULL]2012-08-27 23:51:30.54373+02  │ [NULL]3[NULL]
 2012-08-27 23:51:34.098968+02 │  [NULL]22012-08-27 23:51:30.54373+02  │ 2012-08-27 23:51:34.098968+02 │            32
 2012-08-27 23:51:43.406024+02 │  [NULL]72012-08-27 23:51:30.54373+02  │ 2012-08-27 23:51:43.406024+02 │            37
 2012-08-27 23:51:56.947008+02 │       9[NULL]2012-08-27 23:51:56.947008+02 │ 2012-08-27 23:51:43.406024+02 │            97
 2012-08-27 23:52:03.252017+02 │       9[NULL]2012-08-27 23:52:03.252017+02 │ 2012-08-27 23:51:43.406024+02 │            97
 2012-08-27 23:52:29.364645+02 │       7[NULL]2012-08-27 23:52:29.364645+02 │ 2012-08-27 23:51:43.406024+02 │            77
 2012-08-27 23:52:53.981979+02 │      10[NULL]2012-08-27 23:52:53.981979+02 │ 2012-08-27 23:51:43.406024+02 │           107
 2012-08-27 23:52:54.833586+02 │       5[NULL]2012-08-27 23:52:54.833586+02 │ 2012-08-27 23:51:43.406024+02 │            57
 2012-08-27 23:52:58.926927+02 │       6[NULL]2012-08-27 23:52:58.926927+02 │ 2012-08-27 23:51:43.406024+02 │            67
 2012-08-27 23:52:59.19678+02  │  [NULL]62012-08-27 23:52:58.926927+02 │ 2012-08-27 23:52:59.19678+02  │            66
 2012-08-27 23:53:36.57738+02  │       6[NULL]2012-08-27 23:53:36.57738+02  │ 2012-08-27 23:52:59.19678+02  │            66
 2012-08-27 23:53:42.443812+02 │       9[NULL]2012-08-27 23:53:42.443812+02 │ 2012-08-27 23:52:59.19678+02  │            96
 2012-08-27 23:54:14.985946+02 │  [NULL]22012-08-27 23:53:42.443812+02 │ 2012-08-27 23:54:14.985946+02 │            92
 2012-08-27 23:54:58.458541+02 │       1[NULL]2012-08-27 23:54:58.458541+02 │ 2012-08-27 23:54:14.985946+02 │            12
 2012-08-27 23:56:15.77672+02  │       9[NULL]2012-08-27 23:56:15.77672+02  │ 2012-08-27 23:54:14.985946+02 │            92
 2012-08-27 23:56:22.940718+02 │       3[NULL]2012-08-27 23:56:22.940718+02 │ 2012-08-27 23:54:14.985946+02 │            32
 2012-08-27 23:56:24.995275+02 │  [NULL]102012-08-27 23:56:22.940718+02 │ 2012-08-27 23:56:24.995275+02 │            310
 2012-08-27 23:57:51.333133+02 │       4[NULL]2012-08-27 23:57:51.333133+02 │ 2012-08-27 23:56:24.995275+02 │            410
 2012-08-27 23:58:32.724896+02 │       1[NULL]2012-08-27 23:58:32.724896+02 │ 2012-08-27 23:56:24.995275+02 │            110
 2012-08-27 23:58:55.074473+02 │       7[NULL]2012-08-27 23:58:55.074473+02 │ 2012-08-27 23:56:24.995275+02 │            710
 2012-08-27 23:59:16.886855+02 │       8[NULL]2012-08-27 23:59:16.886855+02 │ 2012-08-27 23:56:24.995275+02 │            810
 2012-08-27 23:59:23.374828+02 │  [NULL]92012-08-27 23:59:16.886855+02 │ 2012-08-27 23:59:23.374828+02 │            89
 2012-08-27 23:59:30.517138+02 │  [NULL]22012-08-27 23:59:16.886855+02 │ 2012-08-27 23:59:30.517138+02 │            82
 2012-08-27 23:59:56.927112+02 │       9[NULL]2012-08-27 23:59:56.927112+02 │ 2012-08-27 23:59:30.517138+02 │            92
 2012-08-28 00:00:06.504544+02 │  [NULL]52012-08-27 23:59:56.927112+02 │ 2012-08-28 00:00:06.504544+02 │            95
 2012-08-28 00:00:13.417665+02 │       2[NULL]2012-08-28 00:00:13.417665+02 │ 2012-08-28 00:00:06.504544+02 │            25
 2012-08-28 00:00:37.243976+02 │       6[NULL]2012-08-28 00:00:37.243976+02 │ 2012-08-28 00:00:06.504544+02 │            65
 2012-08-28 00:00:50.668566+02 │       5[NULL]2012-08-28 00:00:50.668566+02 │ 2012-08-28 00:00:06.504544+02 │            55
 2012-08-28 00:01:22.487454+02 │  [NULL]12012-08-28 00:00:50.668566+02 │ 2012-08-28 00:01:22.487454+02 │            51
 2012-08-28 00:01:22.990129+02 │  [NULL]62012-08-28 00:00:50.668566+02 │ 2012-08-28 00:01:22.990129+02 │            56
(30 ROWS)

Of course this is too much information, so I can simplify the last bits of query to:

$ WITH base_data AS (
    SELECT
        COALESCE(a.event_when, b.event_when) AS event_when,
        a.event_count AS count_a,
        b.event_count AS count_b
    FROM
        events_a a
        FULL OUTER JOIN
        events_b b USING ( event_when )
    )
, with_group_starts AS (
    SELECT
        *,
        MAX(CASE WHEN count_a IS NULL THEN NULL ELSE event_when END) OVER (ORDER BY event_when) AS group_a,
        MAX(CASE WHEN count_b IS NULL THEN NULL ELSE event_when END) OVER (ORDER BY event_when) AS group_b
    FROM
        base_data
)
SELECT
    event_when,
    MAX(count_a) OVER (partition BY group_a) AS real_count_a,
    MAX(count_b) OVER (partition BY group_b) AS real_count_b
FROM
    with_group_starts
ORDER BY event_when;
          event_when           │ real_count_a │ real_count_b
───────────────────────────────┼──────────────┼──────────────
 2012-08-27 23:51:30.54373+02  │            3[NULL]
 2012-08-27 23:51:34.098968+02 │            32
 2012-08-27 23:51:43.406024+02 │            37
 2012-08-27 23:51:56.947008+02 │            97
 2012-08-27 23:52:03.252017+02 │            97
 2012-08-27 23:52:29.364645+02 │            77
 2012-08-27 23:52:53.981979+02 │           107
 2012-08-27 23:52:54.833586+02 │            57
 2012-08-27 23:52:58.926927+02 │            67
 2012-08-27 23:52:59.19678+02  │            66
 2012-08-27 23:53:36.57738+02  │            66
 2012-08-27 23:53:42.443812+02 │            96
 2012-08-27 23:54:14.985946+02 │            92
 2012-08-27 23:54:58.458541+02 │            12
 2012-08-27 23:56:15.77672+02  │            92
 2012-08-27 23:56:22.940718+02 │            32
 2012-08-27 23:56:24.995275+02 │            310
 2012-08-27 23:57:51.333133+02 │            410
 2012-08-27 23:58:32.724896+02 │            110
 2012-08-27 23:58:55.074473+02 │            710
 2012-08-27 23:59:16.886855+02 │            810
 2012-08-27 23:59:23.374828+02 │            89
 2012-08-27 23:59:30.517138+02 │            82
 2012-08-27 23:59:56.927112+02 │            92
 2012-08-28 00:00:06.504544+02 │            95
 2012-08-28 00:00:13.417665+02 │            25
 2012-08-28 00:00:37.243976+02 │            65
 2012-08-28 00:00:50.668566+02 │            55
 2012-08-28 00:01:22.487454+02 │            51
 2012-08-28 00:01:22.990129+02 │            56
(30 ROWS)

And that's it. It would be simpler if we had last() window function (returning last, not null, value for given column in given window), but as you can see, it's doable with what we have available.

3 thoughts on “Filling the gaps with window functions”

  1. You can simplify a little bit: both case expression try to reconstruct the “event_when” before the “full outer join”. Simply add both “event_when” in “base_data” and refer to that instead of the “cases” expressions:

    with base_data as (
    select
    coalesce(a.event_when, b.event_when) as event_when,
    a.event_count as count_a,
    b.event_count as count_b,
    a.event_when as event_a,
    b.event_when as event_b
    from
    events_a a full outer join events_b b using ( event_when )
    ),
    group_starts as (
    select
    *,
    max(event_a) over (ORDER BY event_when) as group_a,
    max(event_b) over (ORDER BY event_when) as group_b
    from base_data
    )
    select
    event_when,
    max(count_a) over (partition by group_a) as real_count_a,
    max(count_b) over (partition by group_b) as real_count_b
    from group_starts

  2. I usually try to avoid subselects, but in this case I think the speed might not be bad especially if event_when is unique in each table. So I’d just do:

    WITH e AS (SELECT event_when
    FROM events_a UNION SELECT event_when FROM events_b)
    SELECT
    e.event_when
    ,(SELECT c.event_count FROM events_a As c
    WHERE c.event_when <= e.event_when
    ORDER BY c.event_when DESC LIMIT 1) As count_a
    ,(SELECT c.event_count FROM events_b As c
    WHERE c.event_when <= e.event_when
    ORDER BY c.event_when DESC LIMIT 1) As count_b
    FROM
    e
    ORDER BY
    e.event_when;

Comments are closed.