Converting list of integers into list of ranges

Yesterday someone on irc asked:

i've a query that returns sequential numbers with gaps (generate_series + join) and my question is: can is somehow construct ranges out of the returned values? sort of range_agg or something?

There was no further discussion, aside from me saying

sure you can. not trivial task, but possible.
you'd need window functions.

but it got me thinking …

First, it would be good to have test data. To avoid retyping it for every query let's make a table with some integers:

=$ CREATE TABLE test (i int4);
CREATE TABLE
 
=$ INSERT INTO test (i) VALUES (1), (2), (5), (6), (7), (8), (9), (10), (20), (30);
INSERT 0 10

This gives us nice table input data:

=$ SELECT i FROM test ORDER BY i;
 i
----
  1
  2
  5
  6
  7
  8
  9
 10
 20
 30
(10 ROWS)

Based on data above, we'd like to get ranges:

1 to 2
5 to 10
20
30

First, we need a way to tell if given row starts new range. This can be trivially done by checking whether previous value of i smaller by 1:

=$ SELECT
    i,
    i - lag(i) OVER (ORDER BY i) AS diff
FROM test
ORDER BY i;
 i  |  diff
----+--------
  1 | [NULL]
  2 |      1
  5 |      3
  6 |      1
  7 |      1
  8 |      1
  9 |      1
 10 |      1
 20 |     10
 30 |     10
(10 ROWS)

Nice. If diff is null or > 1 then current i starts new range.

Now, let's make it so that all rows in given range will have the same group id. To do so, I'll assign group_id (being just copy of value from i) for first rows in range, and in other cases, I'll make it null:

=$ WITH with_group_id AS ( SELECT i, CASE WHEN i - lag(i) OVER (ORDER BY 1) < 2 THEN NULL ELSE i END AS group_id FROM test ) SELECT * FROM with_group_id ORDER BY i; i | group_id ----+---------- 1 | 1 2 | [NULL] 5 | 5 6 | [NULL] 7 | [NULL] 8 | [NULL] 9 | [NULL] 10 | [NULL] 20 | 20 30 | 30 (10 ROWS)

Great. Now, let's fill group_id for non-first rows:

=$ WITH with_group_id AS ( SELECT i, CASE WHEN i - lag(i) OVER (ORDER BY 1) < 2 THEN NULL ELSE i END AS group_id FROM test ) SELECT i, MAX(group_id) OVER (ORDER BY i) AS group_id FROM with_group_id ORDER BY i; i | group_id ----+---------- 1 | 1 2 | 1 5 | 5 6 | 5 7 | 5 8 | 5 9 | 5 10 | 5 20 | 20 30 | 30 (10 ROWS)

Great. Now, all I need is to get ranges using normal grouping:

=$ WITH with_group_id AS ( SELECT i, CASE WHEN i - lag(i) OVER (ORDER BY 1) < 2 THEN NULL ELSE i END AS group_id FROM test ), full_group_id AS ( SELECT i, MAX(group_id) OVER (ORDER BY i) AS group_id FROM with_group_id ) SELECT MIN(i) AS range_from, MAX(i) AS range_to FROM full_group_id GROUP BY group_id ORDER BY group_id; range_from | range_to ------------+---------- 1 | 2 5 | 10 20 | 20 30 | 30 (4 ROWS)

This can be also used to generate proper PostgreSQL ranges:

=$ WITH with_group_id AS ( SELECT i, CASE WHEN i - lag(i) OVER (ORDER BY 1) < 2 THEN NULL ELSE i END AS group_id FROM test ), full_group_id AS ( SELECT i, MAX(group_id) OVER (ORDER BY i) AS group_id FROM with_group_id ) SELECT int4range( MIN(i), MAX(i), '[]' ) AS range FROM full_group_id GROUP BY group_id ORDER BY group_id; range --------- [1,3) [5,11) [20,21) [30,31) (4 ROWS)

Pretty cool, isn't it?

5 thoughts on “Converting list of integers into list of ranges”

Long time no write.

I had to solve exactly this problem in a work related task. At the time, the approach I took was direct SQL, no CTEs at all. I must admit the CTEs definitely make it easier to explain the solution here.

The approach I took was (using your table):

SELECT
  int4range(
    i,
    CASE WHEN LAST THEN i ELSE j END,
    '[]'
  ) AS range
FROM (
  SELECT
    i,
    FIRST,
    LAST,
    LEAD(i) OVER (ORDER BY i) AS j
  FROM (
    SELECT
      i,
      COALESCE(i - LAG(i) OVER(ORDER BY i), 0)  1 AS FIRST,
      COALESCE(LEAD(i) OVER (ORDER BY i) - i, 0)  1 AS LAST
    FROM test
  ) a
  WHERE FIRST OR LAST
) b
WHERE FIRST
ORDER BY i;

This has the nice side effect of reducing the amount of work to be done when there are large contiguous ranges.

I did a similar thing a while back, but for dates. The var `dates` was a `date[]` for this code:

SELECT  daterange(MIN(d), MAX(d)+1) AS date_ranges
FROM    (   SELECT d, SUM(adj_num) OVER(ORDER BY d) AS range_group
            FROM (  SELECT  e.d,
                            CASE
                                WHEN lag(d) OVER(ORDER BY d) = d-1 THEN 0
                                ELSE 1
                            END AS adj_num
                    FROM (  SELECT  DISTINCT d.d
                            FROM    unnest(dates) AS d(d)
                            ) e
                    ) f
            ) g
GROUP BY g.range_group;

I believe that this solution is simplest (sorry, no idea where you are getting your nice formatting):

SELECT MIN(i) AS range_from, MAX(i) AS range_to
FROM
(SELECT i, i-ROW_NUMBER() OVER (ORDER BY i) AS group_num FROM test ORDER BY i) AS t
GROUP BY group_num
ORDER BY group_num

@Dan:

This is beautiful. Abuses certain facts about data, but works, and is much simpler. Thanks.

Thanks. 🙂
As I look at it again, it appears that I really could have dropped the second “ORDER BY i” inside the subselect. After all, the aggregate functions don’t care what order the rows of t appear in. I probably put it there to aid with verifying the approach, as I was working on it. So, if I am correct, it could be made slightly simpler, still.

Comments are closed.