Waiting for PostgreSQL 16 – Add support for regexps on database and user entries in pg_hba.conf

On 24th of October 2022, Michael Paquier committed patch:

Add support for regexps on database and user entries in pg_hba.conf
 
As of this commit, any database or user entry beginning with a slash (/)
is considered as a regular expression.  This is particularly useful for
users, as now there is no clean way to match pattern on multiple HBA
lines.  For example, a user name mapping with a regular expression needs
first to match with a HBA line, and we would skip the follow-up HBA
entries if the ident regexp does *not* match with what has matched in
the HBA line.
 
pg_hba.conf is able to handle multiple databases and roles with a
comma-separated list of these, hence individual regular expressions that
include commas need to be double-quoted.
 
At authentication time, user and database names are now checked in the
following order:
- Arbitrary keywords (like "all", the ones beginning by '+' for
membership check), that we know will never have a regexp.  A fancy case
is for physical WAL senders, we *have* to only match "replication" for
the database.
- Regular expression matching.
- Exact match.
The previous logic did the same, but without the regexp step.
 
We have discussed as well the possibility to support regexp pattern
matching for host names, but these happen to lead to tricky issues based
on what I understand, particularly with host entries that have CIDRs.
 
This commit relies heavily on the refactoring done in a903971 and
fc579e1, so as the amount of code required to compile and execute
regular expressions is now minimal.  When parsing pg_hba.conf, all the
computed regexps needs to explicitely free()'d, same as pg_ident.conf.
 
Documentation and TAP tests are added to cover this feature, including
cases where the regexps use commas (for clarity in the docs, coverage
for the parsing logic in the tests).
 
Note that this introduces a breakage with older versions, where a
database or user name beginning with a slash are treated as something to
check for an equal match.  Per discussion, we have discarded this as
being much of an issue in practice as it would require a cluster to
have database and/or role names that begin with a slash, as well as HBA
entries using these.  Hence, the consistency gained with regexps in
pg_ident.conf is more appealing in the long term.
 
**This compatibility change should be mentioned in the release notes.**
 
Author: Bertrand Drouvot
Reviewed-by: Jacob Champion, Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/fff0d7c1-8ad4-76a1-9db3-0ab6ec338bf7@amazon.com

The commit message basically explains everything, but let me just quickly talk it through and show some examples.

pg_hba.conf file is used to decide who can connect to which database, from where, and using what authentication method. I wrote about it in more details earlier while I was explaining how to fix some issue.

Now, the format of the file is simple. There are two types of config lines:

  • local DATABASE USER METHOD [OPTIONS]
  • host* DATABASE USER ADDRESS METHOD [OPTIONS]

This new change modifies what can be but in DATABASE and USER fields.

To recap, in DATABASE you could have any number of database names, separated by , (comma).

So, for example you could have line line:

local db1,db2,postgres all trust

which would allow anyone to connect to one of the 3 listed databases without any kind of verification of who is who.

In case of USER field, you had more options:

  • username
  • group name (role) prefixed with + sign
  • name of file that contains list of username, prefixed with @

and, of course, you can have any number of these elements, separated by comma.

So, for example:

local db1 depesz,postgres,+dbas,@/etc/postgres/cron.users.lst trust

would allow connection to db1 by users depesz, postgres, any user belonging to group dbas, and any user listed in /etc/postgres/cron.users.lst file.

Now, in both of these fields (DATABASE and USER), we can add regexps, by prefixing them with / (slash).

How would that look? For example like this:

local /^db[0-9]+$,/^depesz depesz trust

would let me connect to two groups of databases:

  • ^db[0-9]+$ – any db that has name starting with “db" and then there are some (at least 1) digits. And then end of string. So, for example: db1, db13123, but not db23b
  • ^depesz – any db that has name starting with depesz

Similarly I could use it in USER field, for example like this:

host all /-rw$ 10.1.2.0/24 scram-sha-256
host all /-ro$ 10.0.0.0/8  scram-sha-256

This would allow any user with name ending with -rw to login, as long as they connect from ip being one of 10.1.2.*, and any user with name ending with -ro from ip that is one of 10.*.*.*.

I think it looks great, and will allow nicer configuration in some cases. Well, as long as you don't mind regexps. But if you'll not go overboard with complexity, it should be really, really helpful thing.

As always, thanks to all involved.