Waiting for PostgreSQL 14 – Multirange datatypes

On 20th of December 2020, Alexander Korotkov committed patch:

Multirange datatypes
 
Multiranges are basically sorted arrays of non-overlapping ranges with
set-theoretic operations defined over them.
 
Since v14, each range type automatically gets a corresponding multirange
datatype.  There are both manual and automatic mechanisms for naming multirange
types.  Once can specify multirange type name using multirange_type_name
attribute in CREATE TYPE.  Otherwise, a multirange type name is generated
automatically.  If the range type name contains "range" then we change that to
"multirange".  Otherwise, we add "_multirange" to the end.
 
Implementation of multiranges comes with a space-efficient internal
representation format, which evades extra paddings and duplicated storage of
oids.  Altogether this format allows fetching a particular range by its index
in O(n).
 
Statistic gathering and selectivity estimation are implemented for multiranges.
For this purpose, stored multirange is approximated as union range without gaps.
This field will likely need improvements in the future.
 
Catversion is bumped.
 
Discussion: https://postgr.es/m/CALNJ-vSUpQ_Y%3DjXvTxt1VYFztaBSsWVXeF1y6gTYQ4bOiWDLgQ%40mail.gmail.com
Discussion: https://postgr.es/m/a0b8026459d1e6167933be2104a6174e7d40d0ab.camel%40j-davis.com#fe7218c83b08068bfffb0c5293eceda0
Author: Paul Jungwirth, revised by me
Reviewed-by: David Fetter, Corey Huinker, Jeff Davis, Pavel Stehule
Reviewed-by: Alvaro Herrera, Tom Lane, Isaac Morland, David G. Johnston
Reviewed-by: Zhihong Yu, Alexander Korotkov

Ever since Pg 9.2 we had range datatypes.

Just in case you're not familiar with them – ranges are, like name suggests, for storing range of values, with optional end included, or not.

Example:

  • ‘[1,10)'::int4range is range that contains all int4 values, starting (and including) from 1, and ending (but without) at 10.
  • ‘(1,2)'::numrange contains all numbers from 1 to 2, but without either end.

There are more range datatypes, based on base datatypes like: date, timestamp, timestamp with time zone.

Anyway – this new commits adds a way to have multiple ranges (as long as they are not overlapping) in the same value, while still retaining usability.

Let's see how that looks:

SELECT '{[1,10), [15,20)}'::int4multirange
  int4multirange
------------------
 {[1,10),[15,20)}
(1 ROW)

So basically, the syntax, when giving as string literal, is: ranges, separated by comma, within curly braces. Simple. Of course you can also build the value using functional constructors:

SELECT nummultirange(
    numrange( 1, 2, '()' ),
    numrange( 5, 20, '[)' ),
    numrange( 30, 40, '(]' ),
    numrange( 50, 60, '[]' )
);
         nummultirange
--------------------------------
 {(1,2),[5,20),(30,40],[50,60]}
(1 ROW)

Let's see if indexes still work on it. First the table:

CREATE TABLE test (
    id int8 generated always AS IDENTITY,
    ranges nummultirange,
    PRIMARY KEY (id)
);

To generate random ranges I will use a helper function:

CREATE OR REPLACE FUNCTION random_range( IN range_base NUMERIC ) RETURNS numrange AS $$
DECLARE
    rnd NUMERIC := random() * 100 + range_base;
BEGIN
    RETURN numrange( rnd, rnd + 100, (array['()', '(]', '[)', '[]'])[1 + FLOOR(random() * 4)] );
END;
$$ LANGUAGE plpgsql;

This generates ranges like:

SELECT random_range( 0 ), random_range(1000), random_range(1000000);
             random_range              |            random_range             |            random_range
---------------------------------------+-------------------------------------+-------------------------------------
 (6.73893602286739,106.73893602286739) | [1018.09286781309,1118.09286781309] | [1000012.53933706,1000112.53933706]
(1 ROW)

So now, I can add some rows to the test table:

INSERT INTO test (ranges)
    SELECT nummultirange( random_range( 0 ), random_range(1000), random_range(1000000) )
    FROM generate_series(1,10000);

Now, sanity check:

$ SELECT * FROM test WHERE ranges @> 199.95::NUMERIC;
  id  │                                                     ranges
──────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 5836{(99.9590845097934,199.9590845097934],(1016.62402020579,1116.62402020579),(1000011.36903192,1000111.36903192]}
 6357{(99.9867100324749,199.9867100324749],[1097.88289071447,1197.88289071447],[1000040.3460164,1000140.3460164]}
 6779{[99.9503790769609,199.9503790769609],(1089.78186859559,1189.78186859559],(1000098.76864875,1000198.76864875]}
(3 ROWS)
 
$ SELECT * FROM test WHERE ranges @> 1199.95::NUMERIC;
  id  │                                                     ranges
──────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  114{(10.4067391772681,110.4067391772681],(1099.95856255563,1199.95856255563),[1000046.5590515,1000146.5590515)}
  417{(33.1140858072345,133.1140858072345],[1099.9728818079,1199.9728818079],(1000016.08665261,1000116.08665261)}
 5195{[51.2302742944712,151.2302742944712],(1099.96250046552,1199.96250046552],(1000057.72533314,1000157.72533314)}
 5319{[40.4233953914996,140.4233953914996],[1099.96352725942,1199.96352725942),(1000011.21480186,1000111.21480186]}
 8628{(14.6304406764152,114.6304406764152),[1099.95173889406,1199.95173889406),(1000050.55915511,1000150.55915511]}
(5 ROWS)
 
$ SELECT * FROM test WHERE ranges @> 1000199.95::NUMERIC;
  id  │                                                     ranges
──────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  240{[77.7909859996235,177.7909859996235],(1035.84122266822,1135.84122266822],(1000099.99954803,1000199.99954803]}
 1320{[54.305165667008,154.305165667008],(1020.19294394367,1120.19294394367],(1000099.98575635,1000199.98575635)}
 1345{(88.9084322871,188.9084322871],[1084.58570354545,1184.58570354545),(1000099.99855133,1000199.99855133]}
 1888{(59.8240736155354,159.8240736155354],[1024.8460349532,1124.8460349532),[1000099.95610725,1000199.95610725)}
 4075{(39.288448554089,139.288448554089],[1047.13115870979,1147.13115870979),[1000099.9927613,1000199.9927613)}
 8052{[97.2211727169046,197.2211727169046),(1029.59738006384,1129.59738006384),(1000099.96032482,1000199.96032482]}
(6 ROWS)

All is well.

Unfortunately, at the time of writing this blogpost, I couldn't make gin/gist indexes on multiranges, but I would assume these will happen eventually. Or maybe I was just doing something wrong.

I mean – I can, sure, create btree index, and use it for queries like:

SELECT * FROM test
WHERE ranges = '{[77.7909859996235,177.7909859996235],(1035.84122266822,1135.84122266822],(1000099.99954803,1000199.99954803]}';

but the cooler operators (like @>) are not indexable, yet.

It's a really nice addition, thanks to all involved.

7 thoughts on “Waiting for PostgreSQL 14 – Multirange datatypes”

  1. Hubert, thank you for your post.
    Committed work implements basic infrastructure for multiranges. Index support will be added later. We still have time before v14 feature freeze. So, I think it’s feasible to commit something in this fields.

    The first step is to make range indexes support matching with multiranges. That is support queries like “range_col opr multirange_const”.

    The second step is to index multiranges themselves. In the first implementation, we can approximate multiranges as union ranges with no gaps. That would allow to re-use existing indexing schemes for ranges. Ideally, we need to teach gist/spgist to have multiple index tuples per heap tuple (and become gin-like). But that requires much more work and definitely not a subject for v14.

  2. @Alexander:

    Thanks for info. And it’s good to know that index support in some way is on its way 🙂

  3. Great read, my application actually could make use of this feature once available. Totally missed this thread in -hackers, so glad I caught it here. Also nice to see the comments on things to come!

  4. Hello, thanks for the post. SQL noob here. What are some common use cases of multiranges?

  5. @Nick:

    Given that this is new feature that is *NOT* yet available in any released version of PostgreSQL – there are no common usecases.

    I would assume that is will have similar-ish usecases as arrays have now – when you want to define multiple “somethings” but don’t necessarily want to have sub-table for it.

  6. Just as a show of support, this has so much potential in my particular use case. I deal intensively with FCC spectrum licensing. All licenses have a spectral component (the range of frequencies granted to the licensee) as well as a geospatial component (the geographic area where those frequencies are granted). There are so many ways in which we analytically use ranges that I have resorted to basically writing my own PL/PgSQL suite of functions to be able to treat postgres arrays of numranges as multiranges. Even with this support, it is extremely tedious and error prone to support even simple use cases, and in particular, things like aggregating ranges into multiranges (via array_agg) is difficult, requiring the constant vigilance to keep the ranges ordered within the aggration.

    I’m not much of a C programmer, but if there is any way that I can help contribute to rollout of this feature (via function implementations, etc.), I’d be glad to do so.

Comments are closed.