How to get list of elements from multiranges?

So, some time ago, Pg devs added multi ranges – that is datatype that can be used to store multiple ranges in single column.

The thing is that it wasn't really simple how to get list of ranges from within such multirange. There was no operator, no way to split it.

A month ago Alexander Korotkov committed patch that added unnest() over multiranges, but it got some problems, and was reverted

It will eventually made it's way into sources, I assume, but in the mean time – a friend of mine asked how to get list of elements from multiranges. So I looked into it.

With newest Pg 15, we have following datatypes defined:

$ \dTS *multirange
                                              List OF DATA types
   Schema   │          Name           │                              Description                              
────────────┼─────────────────────────┼───────────────────────────────────────────────────────────────────────
 pg_catalog │ anycompatiblemultirange │ pseudo-TYPE representing a multirange OVER a polymorphic common TYPE
 pg_catalog │ anymultirange           │ pseudo-TYPE representing a polymorphic base TYPE that IS a multirange
 pg_catalog │ datemultirange          │ multirange OF dates
 pg_catalog │ int4multirange          │ multirange OF integers
 pg_catalog │ int8multirange          │ multirange OF bigints
 pg_catalog │ nummultirange           │ multirange OF numerics
 pg_catalog │ tsmultirange            │ multirange OF timestamps WITHOUT TIME zone
 pg_catalog │ tstzmultirange          │ multirange OF timestamps WITH TIME zone
(8 ROWS)

First two are irrelevant. So we're realistically have to figure out multiranges for:

  • date
  • integers
  • numerics
  • timestamps with or without timezone

The thing is that none of these can contain characters like [, ], (, or ) – which are used to denote end of ranges. So, while it's ugly, and will be obsolete the second that we'll get unnest(), we can split it using regexps.

Each range within the multirange will contain:

  1. [ or ( character
  2. Any number of characters, with the exception of ] and )
  3. ] or ) character

That's simple to write, but regexp for this will be well, unpleasant. Since character ranges in regexps are written within brackets, we will have to do some (a lot of) escaping. So the parts that I listed above will become:

  1. [\[(]
  2. [^\])]+
  3. [\])]

I can test it now with simple test case:

$ WITH mr AS (
    SELECT '[0,100]'::int4range::int4multirange - '[30,40)'::int4range::int4multirange AS v
)
SELECT x
FROM mr, regexp_matches(mr.v::text, '[\[(][^\])]+[\])]', 'g') x;
      x       
──────────────
 {"[0,30)"}
 {"[40,101)"}
(2 ROWS)

Nice-ish. Now, I just need to extract first element from each array that regexp_matches returned, and cast back to int4range:

$ WITH mr AS (
    SELECT '[0,100]'::int4range::int4multirange - '[30,40)'::int4range::int4multirange AS v
)
SELECT x[1]::int4range
FROM mr, regexp_matches(mr.v::text, '[\[(][^\])]+[\])]', 'g') x;
    x     
──────────
 [0,30)
 [40,101)
(2 ROWS)

Well, this isn't really beautiful. We could wrap it in a function, but it would have to return texts, and not ranges, and I simply don't know if it's possible in sensible way.

And – this is just temporary, until unpack() will be re-submitted. So, I guess it's OK for now. We can write the ugly regexp in the function, though:

$ CREATE FUNCTION unnest_multirange(anymultirange) RETURNS setof text AS $$
    SELECT x[1] FROM regexp_matches($1::text, '[\[(][^\])]+[\])]', 'g') x;
$$ LANGUAGE SQL;

and with this I can simply:

$ SELECT u::int4range FROM unnest_multirange('{[0,30),[40,101)}'::int4multirange) AS u;
    u     
──────────
 [0,30)
 [40,101)
(2 ROWS)

I guess it has to do, for now 🙂

2 thoughts on “How to get list of elements from multiranges?”

Comments are closed.