regexp – select * from depesz;

Human/version sort in PostgreSQL

Ever been in situation where you had to sort data that is partially text, and partially numerical?

Like invoice numbers: prefix-9, prefix-10, prefix-11, other-5, other-20 ? Normally you can't do order by as you will get them in wrong order: other-20 ⇒ other-5 ⇒ prefix-10 ⇒ prefix-11 ⇒ prefix-9. Can something be done with it?

Continue reading Human/version sort in PostgreSQL

Waiting for PostgreSQL 16 – Add support for regexps on database and user entries in pg_hba.conf

On 24th of October 2022, Michael Paquier committed patch:

Add support for regexps on database and user entries in pg_hba.conf
 
As of this commit, any database or user entry beginning with a slash (/)
is considered as a regular expression.  This is particularly useful for
users, as now there is no clean way to match pattern on multiple HBA
lines.  For example, a user name mapping with a regular expression needs
first to match with a HBA line, and we would skip the follow-up HBA
entries if the ident regexp does *not* match with what has matched in
the HBA line.
 
pg_hba.conf is able to handle multiple databases and roles with a
comma-separated list of these, hence individual regular expressions that
include commas need to be double-quoted.
 
At authentication time, user and database names are now checked in the
following order:
- Arbitrary keywords (like "all", the ones beginning by '+' for
membership check), that we know will never have a regexp.  A fancy case
is for physical WAL senders, we *have* to only match "replication" for
the database.
- Regular expression matching.
- Exact match.
The previous logic did the same, but without the regexp step.
 
We have discussed as well the possibility to support regexp pattern
matching for host names, but these happen to lead to tricky issues based
on what I understand, particularly with host entries that have CIDRs.
 
This commit relies heavily on the refactoring done in a903971 and
fc579e1, so as the amount of code required to compile and execute
regular expressions is now minimal.  When parsing pg_hba.conf, all the
computed regexps needs to explicitely free()'d, same as pg_ident.conf.
 
Documentation and TAP tests are added to cover this feature, including
cases where the regexps use commas (for clarity in the docs, coverage
for the parsing logic in the tests).
 
Note that this introduces a breakage with older versions, where a
database or user name beginning with a slash are treated as something to
check for an equal match.  Per discussion, we have discarded this as
being much of an issue in practice as it would require a cluster to
have database and/or role names that begin with a slash, as well as HBA
entries using these.  Hence, the consistency gained with regexps in
pg_ident.conf is more appealing in the long term.
 
**This compatibility change should be mentioned in the release notes.**
 
Author: Bertrand Drouvot
Reviewed-by: Jacob Champion, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/fff0d7c1-8ad4-76a1-9db3-0ab6ec338bf7@amazon.com

Continue reading Waiting for PostgreSQL 16 – Add support for regexps on database and user entries in pg_hba.conf

Waiting for PostgreSQL 15 – Add assorted new regexp_xxx SQL functions.

On 3rd of August 2021, Tom Lane committed patch:

Add assorted new regexp_xxx SQL functions.
 
This patch adds new functions regexp_count(), regexp_instr(),
regexp_like(), and regexp_substr(), and extends regexp_replace()
with some new optional arguments.  All these functions follow
the definitions used in Oracle, although there are small differences
in the regexp language due to using our own regexp engine -- most
notably, that the default newline-matching behavior is different.
Similar functions appear in DB2 and elsewhere, too.  Aside from
easing portability, these functions are easier to use for certain
tasks than our existing regexp_match[es] functions.
 
Gilles Darold, heavily revised by me
 
Discussion: https://postgr.es/m/fc160ee0-c843-b024-29bb-97b5da61971f@darold.net

Continue reading Waiting for PostgreSQL 15 – Add assorted new regexp_xxx SQL functions.

How to get list of elements from multiranges?

So, some time ago, Pg devs added multi ranges – that is datatype that can be used to store multiple ranges in single column.

The thing is that it wasn't really simple how to get list of ranges from within such multirange. There was no operator, no way to split it.

A month ago Alexander Korotkov committed patch that added unnest() over multiranges, but it got some problems, and was reverted

It will eventually made it's way into sources, I assume, but in the mean time – a friend of mine asked how to get list of elements from multiranges. So I looked into it.

Continue reading How to get list of elements from multiranges?

Waiting for 9.6 – Implement lookbehind constraints in our regular-expression engine.

On 30th of October, Tom Lane committed patch:

Implement lookbehind constraints in our regular-expression engine.
 
A lookbehind constraint is like a lookahead constraint in that it consumes
no text; but it checks for existence (or nonexistence) of a match *ending*
at the current point in the string, rather than one *starting* at the
current point.  This is a long-requested feature since it exists in many
other regex libraries, but Henry Spencer had never got around to
implementing it in the code we use.
 
Just making it work is actually pretty trivial; but naive copying of the
logic for lookahead constraints leads to code that often spends O(N^2) time
to scan an N-character string, because we have to run the match engine
from string start to the current probe point each time the constraint is
checked.  In typical use-cases a lookbehind constraint will be written at
the start of the regex and hence will need to be checked at every character
--- so O(N^2) work overall.  To fix that, I introduced a third copy of the
core DFA matching loop, paralleling the existing longest() and shortest()
loops.  This version, matchuntil(), can suspend and resume matching given
a couple of pointers' worth of storage space.  So we need only run it
across the string once, stopping at each interesting probe point and then
resuming to advance to the next one.
 
I also put in an optimization that simplifies one-character lookahead and
lookbehind constraints, such as "(?=x)" or "(?<!\w)", into AHEAD and BEHIND
constraints, which already existed in the engine.  This avoids the overhead
of the LACON machinery entirely for these rather common cases.
 
The net result is that lookbehind constraints run a factor of three or so
slower than Perl's for multi-character constraints, but faster than Perl's
for one-character constraints ... and they work fine for variable-length
constraints, which Perl gives up on entirely.  So that's not bad from a
competitive perspective, and there's room for further optimization if
anyone cares.  (In reality, raw scan rate across a large input string is
probably not that big a deal for Postgres usage anyway; so I'm happy if
it's linear.)

Continue reading Waiting for 9.6 – Implement lookbehind constraints in our regular-expression engine.

Waiting for 9.3 – Support indexing of regular-expression searches in contrib/pg_trgm.

On 9th of April, Tom Lane committed patch:

Support indexing of regular-expression searches in contrib/pg_trgm.
 
This works by extracting trigrams from the given regular expression,
in generally the same spirit as the previously-existing support for
LIKE searches, though of course the details are far more complicated.
 
Currently, only GIN indexes are supported.  We might be able to make
it work with GiST indexes later.
 
The implementation includes adding API functions to backend/regex/
to provide a view of the search NFA created from a regular expression.
These functions are meant to be generic enough to be supportable in
a standalone version of the regex library, should that ever happen.
 
Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane

One day later Tom Lane added support for the same operations using GiST indexes (original patch was working only with GIN).

Continue reading Waiting for 9.3 – Support indexing of regular-expression searches in contrib/pg_trgm.

=$
|

Tag: regexp