Converting list of integers into list of ranges

Yesterday someone on irc asked:

i've a query that returns sequential numbers with gaps (generate_series + join) and my question is: can is somehow construct ranges out of the returned values? sort of range_agg or something?

There was no further discussion, aside from me saying

sure you can. not trivial task, but possible.
you'd need window functions.

but it got me thinking …

First, it would be good to have test data. To avoid retyping it for every query let's make a table with some integers:

=$ CREATE TABLE test (i int4);
CREATE TABLE
 
=$ INSERT INTO test (i) VALUES (1), (2), (5), (6), (7), (8), (9), (10), (20), (30);
INSERT 0 10

This gives us nice table input data:

=$ SELECT i FROM test ORDER BY i;
 i
----
  1
  2
  5
  6
  7
  8
  9
 10
 20
 30
(10 ROWS)

Based on data above, we'd like to get ranges:

  • 1 to 2
  • 5 to 10
  • 20
  • 30

First, we need a way to tell if given row starts new range. This can be trivially done by checking whether previous value of i smaller by 1:

=$ SELECT
    i,
    i - lag(i) OVER (ORDER BY i) AS diff
FROM test
ORDER BY i;
 i  |  diff
----+--------
  1 | [NULL]
  2 |      1
  5 |      3
  6 |      1
  7 |      1
  8 |      1
  9 |      1
 10 |      1
 20 |     10
 30 |     10
(10 ROWS)

Nice. If diff is null or > 1 then current i starts new range.

Now, let's make it so that all rows in given range will have the same group id. To do so, I'll assign group_id (being just copy of value from i) for first rows in range, and in other cases, I'll make it null:

=$ WITH with_group_id AS (
    SELECT
        i,
        CASE
            WHEN i - lag(i) OVER (ORDER BY 1) < 2 THEN NULL
            ELSE i
        END AS group_id
    FROM test
)
SELECT * FROM with_group_id
ORDER BY i;
 i  | group_id
----+----------
  1 |        1
  2 |   [NULL]
  5 |        5
  6 |   [NULL]
  7 |   [NULL]
  8 |   [NULL]
  9 |   [NULL]
 10 |   [NULL]
 20 |       20
 30 |       30
(10 ROWS)

Great. Now, let's fill group_id for non-first rows:

=$ WITH with_group_id AS (
    SELECT
        i,
        CASE
            WHEN i - lag(i) OVER (ORDER BY 1) < 2 THEN NULL
            ELSE i
        END AS group_id
    FROM test
)
SELECT
     i,
     MAX(group_id) OVER (ORDER BY i) AS group_id
FROM with_group_id
ORDER BY i;
 i  | group_id
----+----------
  1 |        1
  2 |        1
  5 |        5
  6 |        5
  7 |        5
  8 |        5
  9 |        5
 10 |        5
 20 |       20
 30 |       30
(10 ROWS)

Great. Now, all I need is to get ranges using normal grouping:

=$ WITH with_group_id AS (
    SELECT
        i,
        CASE
            WHEN i - lag(i) OVER (ORDER BY 1) < 2 THEN NULL
            ELSE i
        END AS group_id
    FROM test
), full_group_id AS (
    SELECT
        i,
        MAX(group_id) OVER (ORDER BY i) AS group_id
    FROM with_group_id
)
SELECT
    MIN(i) AS range_from,
    MAX(i) AS range_to
FROM
    full_group_id
GROUP BY group_id
ORDER BY group_id;
 range_from | range_to
------------+----------
          1 |        2
          5 |       10
         20 |       20
         30 |       30
(4 ROWS)

This can be also used to generate proper PostgreSQL ranges:

=$ WITH with_group_id AS (
    SELECT
        i,
        CASE
            WHEN i - lag(i) OVER (ORDER BY 1) < 2 THEN NULL
            ELSE i
        END AS group_id
    FROM test
), full_group_id AS (
    SELECT
        i,
        MAX(group_id) OVER (ORDER BY i) AS group_id
    FROM with_group_id
)
SELECT
    int4range(
        MIN(i),
        MAX(i),
        '[]'
    ) AS range
FROM
    full_group_id
GROUP BY group_id
ORDER BY group_id;
  range
---------
 [1,3)
 [5,11)
 [20,21)
 [30,31)
(4 ROWS)

Pretty cool, isn't it?

5 thoughts on “Converting list of integers into list of ranges”

  1. Long time no write.

    I had to solve exactly this problem in a work related task. At the time, the approach I took was direct SQL, no CTEs at all. I must admit the CTEs definitely make it easier to explain the solution here.

    The approach I took was (using your table):

    SELECT
      int4range(
        i,
        CASE WHEN LAST THEN i ELSE j END,
        '[]'
      ) AS range
    FROM (
      SELECT
        i,
        FIRST,
        LAST,
        LEAD(i) OVER (ORDER BY i) AS j
      FROM (
        SELECT
          i,
          COALESCE(i - LAG(i) OVER(ORDER BY i), 0)  1 AS FIRST,
          COALESCE(LEAD(i) OVER (ORDER BY i) - i, 0)  1 AS LAST
        FROM test
      ) a
      WHERE FIRST OR LAST
    ) b
    WHERE FIRST
    ORDER BY i;

    This has the nice side effect of reducing the amount of work to be done when there are large contiguous ranges.

  2. I did a similar thing a while back, but for dates. The var `dates` was a `date[]` for this code:

    SELECT  daterange(MIN(d), MAX(d)+1) AS date_ranges
    FROM    (   SELECT d, SUM(adj_num) OVER(ORDER BY d) AS range_group
                FROM (  SELECT  e.d,
                                CASE
                                    WHEN lag(d) OVER(ORDER BY d) = d-1 THEN 0
                                    ELSE 1
                                END AS adj_num
                        FROM (  SELECT  DISTINCT d.d
                                FROM    unnest(dates) AS d(d)
                                ) e
                        ) f
                ) g
    GROUP BY g.range_group;
  3. I believe that this solution is simplest (sorry, no idea where you are getting your nice formatting):

    SELECT MIN(i) AS range_from, MAX(i) AS range_to
    FROM
    (SELECT i, i-ROW_NUMBER() OVER (ORDER BY i) AS group_num FROM test ORDER BY i) AS t
    GROUP BY group_num
    ORDER BY group_num
  4. Thanks. 🙂
    As I look at it again, it appears that I really could have dropped the second “ORDER BY i” inside the subselect. After all, the aggregate functions don’t care what order the rows of t appear in. I probably put it there to aid with verifying the approach, as I was working on it. So, if I am correct, it could be made slightly simpler, still.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.