October 29th, 2009 by depesz | Tags: , , , , , | 4 comments »
Did it help? If yes - maybe you can help me?

Yesterday on my favorite IRC channel fooqux asked interesting question. I took some more questions, and here is problem description:

We have a system which, every 5 minutes, takes a number of tasks to be done. Tasks are uniform. Within 5 minutes we can handle at most 100 tasks. Given the history of number of tasks added every 5 minutes, calculate backlog at any given moment.

Did you understand the problem? Well – I didn't. So, let's see the data, and expected output.

So. Let's see the data. I generated some random data, and here is how it looks:

```# select * from test order by stamp limit 20; stamp │ new_events ─────────────────────┼──────────── 2000-01-01 00:05:00 │ 15 2000-01-01 00:10:00 │ 46 2000-01-01 00:15:00 │ 57 2000-01-01 00:20:00 │ 19 2000-01-01 00:25:00 │ 132 2000-01-01 00:30:00 │ 14 2000-01-01 00:35:00 │ 19 2000-01-01 00:40:00 │ 137 2000-01-01 00:45:00 │ 14 2000-01-01 00:50:00 │ 146 2000-01-01 00:55:00 │ 83 2000-01-01 01:00:00 │ 72 2000-01-01 01:05:00 │ 43 2000-01-01 01:10:00 │ 75 2000-01-01 01:15:00 │ 74 2000-01-01 01:20:00 │ 62 2000-01-01 01:25:00 │ 134 2000-01-01 01:30:00 │ 87 2000-01-01 01:35:00 │ 59 2000-01-01 01:40:00 │ 103 (20 rows)```

Nothing really fancy here (side info: check those nice, Unicode frames in new psql 🙂 will be writing about them some time in future).

So, the idea is that every 5 minutes (every, there are no gaps) new number arrives. And this is number of items to be processed. And every 5 minutes we can process 100 tasks, which is fine when we get 15 new tasks, but when system gets 150 tasks, and it will do only 100 of them in its 5 minute period of tim – it will backlog.

Every backlog calculation has to start from some specific point in time, where we know it was 0. In here – we know that on ‘2000-01-01 00:05:00' it was 0 (because there were no events before).

Also, on every time frame we have to calculate how many tasks could be processed, and how many will have to be added or subtracted from backlog.

So, let's add first layer of calculation:

```# select *, new_events - 100 as backlog_change from test order by stamp limit 20; stamp │ new_events │ backlog_change ─────────────────────┼────────────┼──────────────── 2000-01-01 00:05:00 │ 15 │ -85 2000-01-01 00:10:00 │ 46 │ -54 2000-01-01 00:15:00 │ 57 │ -43 2000-01-01 00:20:00 │ 19 │ -81 2000-01-01 00:25:00 │ 132 │ 32 2000-01-01 00:30:00 │ 14 │ -86 2000-01-01 00:35:00 │ 19 │ -81 2000-01-01 00:40:00 │ 137 │ 37 2000-01-01 00:45:00 │ 14 │ -86 2000-01-01 00:50:00 │ 146 │ 46 2000-01-01 00:55:00 │ 83 │ -17 2000-01-01 01:00:00 │ 72 │ -28 2000-01-01 01:05:00 │ 43 │ -57 2000-01-01 01:10:00 │ 75 │ -25 2000-01-01 01:15:00 │ 74 │ -26 2000-01-01 01:20:00 │ 62 │ -38 2000-01-01 01:25:00 │ 134 │ 34 2000-01-01 01:30:00 │ 87 │ -13 2000-01-01 01:35:00 │ 59 │ -41 2000-01-01 01:40:00 │ 103 │ 3 (20 rows)```

Of course backlog cannot be less than zero – you can't have -1 tasks to be taken care of.

So, how can we deal with it? With help of Common Table Expressions, which were added in PostgreSQL 8.4.

Now we know what to use. But how?

Recursive CTE is not easy to comprehend at first, but with some trial and error we can grasp it.

First of all – Recursive CTE references itself. So, we need to start with something that it will be able to reference to.

First query is trivial:

```# WITH backlog_info AS ( SELECT *, new_events - 100 as backlog FROM test WHERE stamp = '2000-01-01 00:05:00' ) SELECT * FROM backlog_info ORDER BY stamp ASC LIMIT 20; stamp │ new_events │ backlog ─────────────────────┼────────────┼───────── 2000-01-01 00:05:00 │ 15 │ -85 (1 row)```

I hope the code is clear. Couple of important points:

• WHERE stamp = ‘…' is to select single row that we will start from. It should be that before it backlog is 0
• ORDER BY and LIMIT – it's obviously not necessary now, since we have only 1 row to return, but let's have it from start – in next queries we will change only the WITH() part.
• There is obvious bug – as I wrote above backlog cannot be < 0. It's shouldn't be possible.

So, given the bug, we have to fix it. It's fortunately, relatively simple:

```# WITH backlog_info AS ( SELECT *, greatest( new_events - 100, 0 ) as backlog FROM test WHERE stamp = '2000-01-01 00:05:00' ) SELECT * FROM backlog_info ORDER BY stamp ASC LIMIT 20; stamp │ new_events │ backlog ─────────────────────┼────────────┼───────── 2000-01-01 00:05:00 │ 15 │ 0 (1 row)```

Just to remind: greatest( X, Y ) will return greater of these numbers. So if calculated backlog will be > 0 – it will be returned. If it will be < 0 – 0 will be greater, and backlog will be 0. Hope it's clear 🙂

Now. We have our first row ready. What about others?

In next rows we should reference previous one. Luckily it's very easy:

```WITH RECURSIVE backlog_info AS ( SELECT *, greatest( new_events - 100, 0 ) as backlog FROM test WHERE stamp = '2000-01-01 00:05:00' UNION ALL SELECT t.*, greatest( pt.backlog + t.new_events - 100, 0 ) as backlog FROM test t, backlog_info pt WHERE t.stamp = pt.stamp + '5 minutes'::INTERVAL ) SELECT * FROM backlog_info ORDER BY stamp ASC LIMIT 20; stamp │ new_events │ backlog ─────────────────────┼────────────┼───────── 2000-01-01 00:05:00 │ 15 │ 0 2000-01-01 00:10:00 │ 46 │ 0 2000-01-01 00:15:00 │ 57 │ 0 2000-01-01 00:20:00 │ 19 │ 0 2000-01-01 00:25:00 │ 132 │ 32 2000-01-01 00:30:00 │ 14 │ 0 2000-01-01 00:35:00 │ 19 │ 0 2000-01-01 00:40:00 │ 137 │ 37 2000-01-01 00:45:00 │ 14 │ 0 2000-01-01 00:50:00 │ 146 │ 46 2000-01-01 00:55:00 │ 83 │ 29 2000-01-01 01:00:00 │ 72 │ 1 2000-01-01 01:05:00 │ 43 │ 0 2000-01-01 01:10:00 │ 75 │ 0 2000-01-01 01:15:00 │ 74 │ 0 2000-01-01 01:20:00 │ 62 │ 0 2000-01-01 01:25:00 │ 134 │ 34 2000-01-01 01:30:00 │ 87 │ 21 2000-01-01 01:35:00 │ 59 │ 0 2000-01-01 01:40:00 │ 103 │ 3 (20 rows)```

Nice. It works. But am I happy with it? well. Not really.

There are still 2 problems.

First is more psychological: what will happen if there will be no row for given every-5-minute point? Let's see:

```# delete from test where stamp = '2000-01-01 00:55:00';   # WITH RECURSIVE backlog_info AS ( SELECT *, greatest( new_events - 100, 0 ) as backlog FROM test WHERE stamp = '2000-01-01 00:05:00' UNION ALL SELECT t.*, greatest( pt.backlog + t.new_events - 100, 0 ) as backlog FROM test t, backlog_info pt WHERE t.stamp = pt.stamp + '5 minutes'::INTERVAL ) SELECT * FROM backlog_info ORDER BY stamp ASC LIMIT 20; stamp │ new_events │ backlog ─────────────────────┼────────────┼───────── 2000-01-01 00:05:00 │ 15 │ 0 2000-01-01 00:10:00 │ 46 │ 0 2000-01-01 00:15:00 │ 57 │ 0 2000-01-01 00:20:00 │ 19 │ 0 2000-01-01 00:25:00 │ 132 │ 32 2000-01-01 00:30:00 │ 14 │ 0 2000-01-01 00:35:00 │ 19 │ 0 2000-01-01 00:40:00 │ 137 │ 37 2000-01-01 00:45:00 │ 14 │ 0 2000-01-01 00:50:00 │ 146 │ 46 (10 rows)```

Quite bad, isn't it?

And 2nd. problem. Let's take a look at explain analyze output:

``` QUERY PLAN ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Limit (cost=219.75..219.80 rows=20 width=16) (actual time=2765.042..2765.115 rows=20 loops=1) CTE backlog_info -> Recursive Union (cost=0.00..215.04 rows=101 width=16) (actual time=0.190..2760.145 rows=1000 loops=1) -> Index Scan using some_idx on test (cost=0.00..8.27 rows=1 width=12) (actual time=0.183..0.187 rows=1 loops=1) Index Cond: (stamp = '2000-01-01 00:05:00'::timestamp without time zone) -> Hash Join (cost=0.33..20.48 rows=10 width=16) (actual time=1.367..2.754 rows=1 loops=1000) Hash Cond: (t.stamp = (pt.stamp + '00:05:00'::interval)) -> Seq Scan on test t (cost=0.00..15.00 rows=1000 width=12) (actual time=0.006..1.353 rows=1000 loops=1000) -> Hash (cost=0.20..0.20 rows=10 width=12) (actual time=0.008..0.008 rows=1 loops=1000) -> WorkTable Scan on backlog_info pt (cost=0.00..0.20 rows=10 width=12) (actual time=0.002..0.003 rows=1 loops=1000) -> Sort (cost=4.71..4.96 rows=101 width=16) (actual time=2765.037..2765.058 rows=20 loops=1) Sort Key: backlog_info.stamp Sort Method: top-N heapsort Memory: 17kB -> CTE Scan on backlog_info (cost=0.00..2.02 rows=101 width=16) (actual time=0.198..2763.362 rows=1000 loops=1) Total runtime: 2765.290 ms (15 rows)```

Do you see the problem? No? Check loops counts on scans. This is simply wrong.

(number 1000 comes from the fact that I have 1000 rows in there).

So, what should be the solution? It's quite simple actually – don't sweat it, write a stored procedure 🙂

```CREATE OR REPLACE FUNCTION get_events_data_with_backlog( IN from_when TIMESTAMP, OUT stamp TIMESTAMP, OUT new_events INT4, OUT backlog INT4 ) RETURNS SETOF RECORD LANGUAGE plpgsql AS \$\$ DECLARE temprec RECORD; BEGIN backlog := 0; FOR temprec IN SELECT * FROM test t WHERE t.stamp >= coalesce( from_when, '-infinity' ) ORDER BY t.stamp ASC LOOP stamp := temprec.stamp; new_events := temprec.new_events; backlog := greatest( backlog + ( new_events - 100 ), 0 ); RETURN NEXT; END LOOP; RETURN; END; \$\$;```

How does it work? Sanely:

```# select * from get_events_data_with_backlog(NULL) limit 20; stamp │ new_events │ backlog ─────────────────────┼────────────┼───────── 2000-01-01 00:05:00 │ 15 │ 0 2000-01-01 00:10:00 │ 46 │ 0 2000-01-01 00:15:00 │ 57 │ 0 2000-01-01 00:20:00 │ 19 │ 0 2000-01-01 00:25:00 │ 132 │ 32 2000-01-01 00:30:00 │ 14 │ 0 2000-01-01 00:35:00 │ 19 │ 0 2000-01-01 00:40:00 │ 137 │ 37 2000-01-01 00:45:00 │ 14 │ 0 2000-01-01 00:50:00 │ 146 │ 46 2000-01-01 00:55:00 │ 83 │ 29 2000-01-01 01:00:00 │ 72 │ 1 2000-01-01 01:05:00 │ 43 │ 0 2000-01-01 01:10:00 │ 75 │ 0 2000-01-01 01:15:00 │ 74 │ 0 2000-01-01 01:20:00 │ 62 │ 0 2000-01-01 01:25:00 │ 134 │ 34 2000-01-01 01:30:00 │ 87 │ 21 2000-01-01 01:35:00 │ 59 │ 0 2000-01-01 01:40:00 │ 103 │ 3 (20 rows)```

And what happens if I remove any row from within?

```# delete from test where stamp = '2000-01-01 00:55:00';   # select * from get_events_data_with_backlog(NULL) limit 20; stamp │ new_events │ backlog ─────────────────────┼────────────┼───────── 2000-01-01 00:05:00 │ 15 │ 0 2000-01-01 00:10:00 │ 46 │ 0 2000-01-01 00:15:00 │ 57 │ 0 2000-01-01 00:20:00 │ 19 │ 0 2000-01-01 00:25:00 │ 132 │ 32 2000-01-01 00:30:00 │ 14 │ 0 2000-01-01 00:35:00 │ 19 │ 0 2000-01-01 00:40:00 │ 137 │ 37 2000-01-01 00:45:00 │ 14 │ 0 2000-01-01 00:50:00 │ 146 │ 46 2000-01-01 01:00:00 │ 72 │ 18 2000-01-01 01:05:00 │ 43 │ 0 2000-01-01 01:10:00 │ 75 │ 0 2000-01-01 01:15:00 │ 74 │ 0 2000-01-01 01:20:00 │ 62 │ 0 2000-01-01 01:25:00 │ 134 │ 34 2000-01-01 01:30:00 │ 87 │ 21 2000-01-01 01:35:00 │ 59 │ 0 2000-01-01 01:40:00 │ 103 │ 3 2000-01-01 01:45:00 │ 107 │ 10 (20 rows)```

OK. So, there are 2 good points, and 1 bad. Good: it's faster – only one seq scan over table, and that's all. Second: it doesn't break if there is no data in middle of dataset.

Bad: Well, it returns 20 rows, but these are wrong. Backlog for ‘2000-01-01 01:00:00' should be definitely 0. System had enough time to process data from ‘2000-01-01 00:50:00' – it had 10 minutes.

So, let's fix the last bug:

```CREATE OR REPLACE FUNCTION get_events_data_with_backlog( IN from_when TIMESTAMP, OUT stamp TIMESTAMP, OUT new_events INT4, OUT backlog INT4 ) RETURNS SETOF RECORD LANGUAGE plpgsql AS \$\$ DECLARE temprec RECORD; previous_stamp TIMESTAMP; time_passed INTERVAL; intervals_passed INT4; could_process INT4; BEGIN backlog := 0; FOR temprec IN SELECT * FROM test t WHERE t.stamp >= coalesce( from_when, '-infinity' ) ORDER BY t.stamp ASC LOOP IF previous_stamp IS NULL THEN previous_stamp := temprec.stamp - '5 minutes'::INTERVAL; END IF; stamp := temprec.stamp; new_events := temprec.new_events; time_passed := temprec.stamp - previous_stamp; intervals_passed := extract(epoch FROM time_passed) / 300; could_process := intervals_passed * 100; backlog := greatest( backlog + ( new_events - could_process), 0 ); RETURN NEXT; previous_stamp := temprec.stamp; END LOOP; RETURN; END; \$\$;```

( of course calculations within function could be written in much terser format, but I wanted it to be clear what it does ).

And how does it work?

```# delete from test where stamp = '2000-01-01 00:55:00';   # select * from get_events_data_with_backlog(NULL) limit 20; stamp │ new_events │ backlog ─────────────────────┼────────────┼───────── 2000-01-01 00:05:00 │ 15 │ 0 2000-01-01 00:10:00 │ 46 │ 0 2000-01-01 00:15:00 │ 57 │ 0 2000-01-01 00:20:00 │ 19 │ 0 2000-01-01 00:25:00 │ 132 │ 32 2000-01-01 00:30:00 │ 14 │ 0 2000-01-01 00:35:00 │ 19 │ 0 2000-01-01 00:40:00 │ 137 │ 37 2000-01-01 00:45:00 │ 14 │ 0 2000-01-01 00:50:00 │ 146 │ 46 2000-01-01 01:00:00 │ 72 │ 0 2000-01-01 01:05:00 │ 43 │ 0 2000-01-01 01:10:00 │ 75 │ 0 2000-01-01 01:15:00 │ 74 │ 0 2000-01-01 01:20:00 │ 62 │ 0 2000-01-01 01:25:00 │ 134 │ 34 2000-01-01 01:30:00 │ 87 │ 21 2000-01-01 01:35:00 │ 59 │ 0 2000-01-01 01:40:00 │ 103 │ 3 2000-01-01 01:45:00 │ 107 │ 10 (20 rows)```

Nice. Everything calculated properly.

1. ### 4 comments

2. Oct 29, 2009

It’s just too bad that those “nice Unicode frames” are not stacked together in the HTML code block.

A line-height:0.5em; instead of the current 1em in style.css:75 would take care of that…

3. Oct 29, 2009

@Steve:
good point. Will look into it, but since layout is (as you can definitely see) my priority – it might take a while to get to it 🙁

4. Nov 1, 2009

To me this looks like a perfect job for a windowing function…

5. Nov 2, 2009

@Efgé:
That’s what fooqux wanted, and that’s what I originally tried to write, but it seems that I can’t write it with window functions. If you can – great. Show us – I mean – it’s perfectly possible that I overlooked something.