# Converting list of integers into list of ranges

i've a query that returns sequential numbers with gaps (generate_series + join) and my question is: can is somehow construct ranges out of the returned values? sort of range_agg or something?

There was no further discussion, aside from me saying

sure you can. not trivial task, but possible.
you'd need window functions.

but it got me thinking …

First, it would be good to have test data. To avoid retyping it for every query let's make a table with some integers:

```=\$ CREATE TABLE test (i int4);
CREATE TABLE

=\$ INSERT INTO test (i) VALUES (1), (2), (5), (6), (7), (8), (9), (10), (20), (30);
INSERT 0 10```

This gives us nice table input data:

```=\$ SELECT i FROM test ORDER BY i;
i
----
1
2
5
6
7
8
9
10
20
30
(10 ROWS)```

Based on data above, we'd like to get ranges:

• 1 to 2
• 5 to 10
• 20
• 30

First, we need a way to tell if given row starts new range. This can be trivially done by checking whether previous value of i smaller by 1:

```=\$ SELECT
i,
i - lag(i) OVER (ORDER BY i) AS diff
FROM test
ORDER BY i;
i  |  diff
----+--------
1 | [NULL]
2 |      1
5 |      3
6 |      1
7 |      1
8 |      1
9 |      1
10 |      1
20 |     10
30 |     10
(10 ROWS)```

Nice. If diff is null or > 1 then current i starts new range.

Now, let's make it so that all rows in given range will have the same group id. To do so, I'll assign group_id (being just copy of value from i) for first rows in range, and in other cases, I'll make it null:

```=\$ WITH with_group_id AS (
SELECT
i,
CASE
WHEN i - lag(i) OVER (ORDER BY 1) < 2 THEN NULL
ELSE i
END AS group_id
FROM test
)
SELECT * FROM with_group_id
ORDER BY i;
i  | group_id
----+----------
1 |        1
2 |   [NULL]
5 |        5
6 |   [NULL]
7 |   [NULL]
8 |   [NULL]
9 |   [NULL]
10 |   [NULL]
20 |       20
30 |       30
(10 ROWS)```

Great. Now, let's fill group_id for non-first rows:

```=\$ WITH with_group_id AS (
SELECT
i,
CASE
WHEN i - lag(i) OVER (ORDER BY 1) < 2 THEN NULL
ELSE i
END AS group_id
FROM test
)
SELECT
i,
MAX(group_id) OVER (ORDER BY i) AS group_id
FROM with_group_id
ORDER BY i;
i  | group_id
----+----------
1 |        1
2 |        1
5 |        5
6 |        5
7 |        5
8 |        5
9 |        5
10 |        5
20 |       20
30 |       30
(10 ROWS)```

Great. Now, all I need is to get ranges using normal grouping:

```=\$ WITH with_group_id AS (
SELECT
i,
CASE
WHEN i - lag(i) OVER (ORDER BY 1) < 2 THEN NULL
ELSE i
END AS group_id
FROM test
), full_group_id AS (
SELECT
i,
MAX(group_id) OVER (ORDER BY i) AS group_id
FROM with_group_id
)
SELECT
MIN(i) AS range_from,
MAX(i) AS range_to
FROM
full_group_id
GROUP BY group_id
ORDER BY group_id;
range_from | range_to
------------+----------
1 |        2
5 |       10
20 |       20
30 |       30
(4 ROWS)```

This can be also used to generate proper PostgreSQL ranges:

```=\$ WITH with_group_id AS (
SELECT
i,
CASE
WHEN i - lag(i) OVER (ORDER BY 1) < 2 THEN NULL
ELSE i
END AS group_id
FROM test
), full_group_id AS (
SELECT
i,
MAX(group_id) OVER (ORDER BY i) AS group_id
FROM with_group_id
)
SELECT
int4range(
MIN(i),
MAX(i),
'[]'
) AS range
FROM
full_group_id
GROUP BY group_id
ORDER BY group_id;
range
---------
[1,3)
[5,11)
[20,21)
[30,31)
(4 ROWS)```

Pretty cool, isn't it?

## 5 thoughts on “Converting list of integers into list of ranges”

1. Mark Guyatt says:

Long time no write.

I had to solve exactly this problem in a work related task. At the time, the approach I took was direct SQL, no CTEs at all. I must admit the CTEs definitely make it easier to explain the solution here.

The approach I took was (using your table):

```SELECT
int4range(
i,
CASE WHEN LAST THEN i ELSE j END,
'[]'
) AS range
FROM (
SELECT
i,
FIRST,
LAST,
LEAD(i) OVER (ORDER BY i) AS j
FROM (
SELECT
i,
COALESCE(i - LAG(i) OVER(ORDER BY i), 0)  1 AS FIRST,
COALESCE(LEAD(i) OVER (ORDER BY i) - i, 0)  1 AS LAST
FROM test
) a
WHERE FIRST OR LAST
) b
WHERE FIRST
ORDER BY i;```

This has the nice side effect of reducing the amount of work to be done when there are large contiguous ranges.

2. Corey says:

I did a similar thing a while back, but for dates. The var `dates` was a `date[]` for this code:

```SELECT  daterange(MIN(d), MAX(d)+1) AS date_ranges
FROM    (   SELECT d, SUM(adj_num) OVER(ORDER BY d) AS range_group
FROM (  SELECT  e.d,
CASE
WHEN lag(d) OVER(ORDER BY d) = d-1 THEN 0
ELSE 1
FROM (  SELECT  DISTINCT d.d
FROM    unnest(dates) AS d(d)
) e
) f
) g
GROUP BY g.range_group;```
3. Dan Scott says:

I believe that this solution is simplest (sorry, no idea where you are getting your nice formatting):

```SELECT MIN(i) AS range_from, MAX(i) AS range_to
FROM
(SELECT i, i-ROW_NUMBER() OVER (ORDER BY i) AS group_num FROM test ORDER BY i) AS t
GROUP BY group_num
ORDER BY group_num```
4. depesz says:

@Dan:

This is beautiful. Abuses certain facts about data, but works, and is much simpler. Thanks.

5. Dan Scott says:

Thanks. 🙂
As I look at it again, it appears that I really could have dropped the second “ORDER BY i” inside the subselect. After all, the aggregate functions don’t care what order the rows of t appear in. I probably put it there to aid with verifying the approach, as I was working on it. So, if I am correct, it could be made slightly simpler, still.