Tips n’ tricks – rank on changes – select * from depesz;

I got asked this: having this table:

# select * from a order by d;
 t | d  
---+----
 O |  1
 O |  2
 O |  3
 M |  4
 M |  5
 M |  6
 M |  7
 O |  8
 O |  9
 O | 10
 I | 11
 I | 12
 I | 13
(13 rows)

Is it possible to add “rank" column, that will increment whenever t changed?

Expected output looks like this:

 t | d  | magic 
---+----+-------
 O |  1 |     1
 O |  2 |     1
 O |  3 |     1
 M |  4 |     2
 M |  5 |     2
 M |  6 |     2
 M |  7 |     2
 O |  8 |     3
 O |  9 |     3
 O | 10 |     3
 I | 11 |     4
 I | 12 |     4
 I | 13 |     4
(13 rows)

Please note that value ‘O' in “t" column has 2 different values, so we can't simply rank() over (order by t).

Solution proved to be relatively simple.

First of all – I need to find out whether current row has t different from previous one. This is trivially done using:

# select
    t,
    d,
    t is distinct from lag(t) over ( order by d ) as diff
from a order by d;
 t | d  | diff 
---+----+------
 O |  1 | t
 O |  2 | f
 O |  3 | f
 M |  4 | t
 M |  5 | f
 M |  6 | f
 M |  7 | f
 O |  8 | t
 O |  9 | f
 O | 10 | f
 I | 11 | t
 I | 12 | f
 I | 13 | f
(13 rows)

As you can see column ‘diff' has value true when there is change in column ‘t', and false when no change happened.

Now, all we need to do, is count true values in diff column:

# select
    t,
    d,
    count(nullif(diff, false)) over (order by d)
from
    (
        select
            t,
            d,
            t is distinct from lag(t) over ( order by d ) as diff
        from a
    ) as x
order by d;
 t | d  | count 
---+----+-------
 O |  1 |     1
 O |  2 |     1
 O |  3 |     1
 M |  4 |     2
 M |  5 |     2
 M |  6 |     2
 M |  7 |     2
 O |  8 |     3
 O |  9 |     3
 O | 10 |     3
 I | 11 |     4
 I | 12 |     4
 I | 13 |     4
(13 rows)

Simple, and to the point. Alternative way would be using plpgsql function, which would return setof(t, d, count) and would increment count for every change, but if you can't use PL/pgSQL, or don't want to – pure SQL version is possible.

3 thoughts on “Tips n’ tricks – rank on changes”

You could roll this with a custom aggregate, too. We did something similar to count the number of consecutive t’s and reset to 0 when t changed.

Since you’re using 8.4 features, you could also throw in a CTE 🙂

WITH t1 AS (
SELECT t,d, t is distinct from lag(t) OVER (order by d) AS diff
FROM a
)
SELECT t, d, count(nullif(diff, false)) over (order by d)
FROM t1
ORDER BY d;

@David:
While I do use CTE when necessary, I am far from being entirely thrilled about them. These are performance issues with them, that basically make me discard the idea of using CTE unless I know I will benefit from it.

Comments are closed.