Waiting for PostgreSQL 15 – Add assorted new regexp_xxx SQL functions.

On 3rd of August 2021, Tom Lane committed patch:

Add assorted new regexp_xxx SQL functions.
 
This patch adds new functions regexp_count(), regexp_instr(),
regexp_like(), and regexp_substr(), and extends regexp_replace()
with some new optional arguments.  All these functions follow
the definitions used in Oracle, although there are small differences
in the regexp language due to using our own regexp engine -- most
notably, that the default newline-matching behavior is different.
Similar functions appear in DB2 and elsewhere, too.  Aside from
easing portability, these functions are easier to use for certain
tasks than our existing regexp_match[es] functions.
 
Gilles Darold, heavily revised by me
 
Discussion: https://postgr.es/m/fc160ee0-c843-b024-29bb-97b5da61971f@darold.net

Continue reading Waiting for PostgreSQL 15 – Add assorted new regexp_xxx SQL functions.

Does varchar(n) use less disk space than varchar() or text?

Some time ago on Slack some person said:

varchar is better (storage efficiency), i recommend using it for less than 2048 chars, for the best : TEXT

There was discussion that followed, the person that claimed this efficiency never backed their claim, saying only that:

VARChar takes much less ‘place' than TEXT … but have to face it to believe it

and when I asked for

Show me db fiddle with queries that show this difference and I'll believe

I got:

HA ha ! i let you dig it  i've some work ….

I reacted (over reacted), and the thread died.

I know (knew?) that varchar/text are the same internally (wrote about it), but perhaps there have been recent change?

Continue reading Does varchar(n) use less disk space than varchar() or text?

Waiting for PostgreSQL 15 – Allow publishing the tables of schema.

On 27th of October 2021, Amit Kapila committed patch:

Allow publishing the tables of schema.
 
A new option "FOR ALL TABLES IN SCHEMA" in Create/Alter Publication allows
one or more schemas to be specified, whose tables are selected by the
publisher for sending the data to the subscriber.
 
The new syntax allows specifying both the tables and schemas. For example:
CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
 
A new system table "pg_publication_namespace" has been added, to maintain
the schemas that the user wants to publish through the publication.
Modified the output plugin (pgoutput) to publish the changes if the
relation is part of schema publication.
 
Updates pg_dump to identify and dump schema publications. Updates the \d
family of commands to display schema publications and \dRp+ variant will
now display associated schemas if any.
 
Author: Vignesh C, Hou Zhijie, Amit Kapila
Syntax-Suggested-by: Tom Lane, Alvaro Herrera
Reviewed-by: Greg Nancarrow, Masahiko Sawada, Hou Zhijie, Amit Kapila, Haiying Tang, Ajin Cherian, Rahila Syed, Bharath Rupireddy, Mark Dilger
Tested-by: Haiying Tang
Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ@mail.gmail.com

Continue reading Waiting for PostgreSQL 15 – Allow publishing the tables of schema.

Why is it hard to automatically suggest what index to create?

Every now and then someone asks me to add index suggestions to explain.depesz.com.

I always respond with polite decline. This is complicated thing to do, and I just don't have that time.

Lately I was asked, on Slack to add (to explain.depesz.com) link to pganalyze Index Advisor for Postgres.

So I checked it out. And results prompted me to write this blogpost.

Continue reading Why is it hard to automatically suggest what index to create?

Using recursive queries to get distinct elements from table

I wrote about similar things couple of times, but recently found thread on pgsql-general mailing list that made me thing about it again.

Summary of the problem from mail is: we have a table, ~ 800 million rows, with, at least 2 columns:

  • station – 170 distinct values
  • channel – generally 1-3 channels per station

And then we want to run:

SELECT
    station,
    array_agg(DISTINCT (channel)) AS channels
FROM
    data
GROUP BY
    station;

Which, on Israel's (original poster) machine took ~ 5 minutes.

And this is with index on on data (station, channel).

Can we do better?

Continue reading Using recursive queries to get distinct elements from table

Waiting for PostgreSQL 15 – Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner.

On 10th of September 2021, Noah Misch committed patch:

Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner.
 
This switches the default ACL to what the documentation has recommended
since CVE-2018-1058.  Upgrades will carry forward any old ownership and
ACL.  Sites that declined the 2018 recommendation should take a fresh
look.  Recipes for commissioning a new database cluster from scratch may
need to create a schema, grant more privileges, etc.  Out-of-tree test
suites may require such updates.
 
Reviewed by Peter Eisentraut.
 
Discussion: https://postgr.es/m/20201031163518.GB4039133@rfd.leadboat.com

Continue reading Waiting for PostgreSQL 15 – Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner.

How to get advisory lock in shell?

Recently we had interesting problem. There are some maintenance tools, that do stuff that touch database. One can be calling repack, another can be specific dump, and yet another can be applying migrations from application.

The problem is that they can step over each other toes, and cause issues.

So we needed to add some way to prevent them from running at the same time…

Continue reading How to get advisory lock in shell?

New changes on explain.depesz.com – fixed calculations of exclusive times

When you see plan on explain.depesz.com, one of the columns visible is exclusive.

In principle it's simple – take time of current node, subtract times of all sub nodes, and you get how much time was spent in this node alone.

Reality is not that simple. Two things especially make the calculations complicated. CTEs, and InitPlans.

Now, with the just pushed change to Pg::Explain (plan parsing library), and site itself, some of the bad calculations are gone.

Let's see some examples:

Continue reading New changes on explain.depesz.com – fixed calculations of exclusive times

Display “settings” from plans on explain.depesz.com

Some time ago I wrote about new options for explains – one that prints settings that were modified from default.

This looks like this:

Aggregate  (cost=35.36..35.37 rows=1 width=8)
  ->  Index Only Scan using pg_class_oid_index on pg_class  (cost=0.27..34.29 rows=429 width=0)
Settings: enable_seqscan = 'off'

Finally, today, I pushed a change that displays them on explain.depesz.com. To make it more usable, it will also provide links to documentation about each such options.

For example, check this plan.

Hope you'll find it useful.

How to get list of elements from multiranges?

So, some time ago, Pg devs added multi ranges – that is datatype that can be used to store multiple ranges in single column.

The thing is that it wasn't really simple how to get list of ranges from within such multirange. There was no operator, no way to split it.

A month ago Alexander Korotkov committed patch that added unnest() over multiranges, but it got some problems, and was reverted

It will eventually made it's way into sources, I assume, but in the mean time – a friend of mine asked how to get list of elements from multiranges. So I looked into it.

Continue reading How to get list of elements from multiranges?