Waiting for PostgreSQL 15 – Add UNIQUE null treatment option

On 3rd of February 2022, Peter Eisentraut committed patch:

Add UNIQUE null treatment option 
 
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.
 
This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.
 
The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.
 
I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.
 
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com

Continue reading Waiting for PostgreSQL 15 – Add UNIQUE null treatment option

Waiting for PostgreSQL 15 – Add HEADER support to COPY text format

On 28th of January 2022, Peter Eisentraut committed patch:

Add HEADER support to COPY text format 
 
The COPY CSV format supports the HEADER option to output a header
line.  This patch adds the same option to the default text format.  On
input, the HEADER option causes the first line to be skipped, same as
with CSV.
 
Author: Rémi Lapeyre <remi.lapeyre@lenstra.fr>
Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com

Continue reading Waiting for PostgreSQL 15 – Add HEADER support to COPY text format

Waiting for PostgreSQL 15 – Introduce log_destination=jsonlog

On 17th of January 2022, Michael Paquier committed patch:

Introduce log_destination=jsonlog
 
"jsonlog" is a new value that can be added to log_destination to provide
logs in the JSON format, with its output written to a file, making it
the third type of destination of this kind, after "stderr" and
"csvlog".  The format is convenient to feed logs to other applications.
There is also a plugin external to core that provided this feature using
the hook in elog.c, but this had to overwrite the output of "stderr" to
work, so being able to do both at the same time was not possible.  The
files generated by this log format are suffixed with ".json", and use
the same rotation policies as the other two formats depending on the
backend configuration.
 
This takes advantage of the refactoring work done previously in ac7c807,
bed6ed3, 8b76f89 and 2d77d83 for the backend parts, and 72b76f7 for the
TAP tests, making the addition of any new file-based format rather
straight-forward.
 
The documentation is updated to list all the keys and the values that
can exist in this new format.  pg_current_logfile() also required a
refresh for the new option.
 
Author: Sehrope Sarkuni, Michael Paquier
Reviewed-by: Nathan Bossart, Justin Pryzby
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com

Continue reading Waiting for PostgreSQL 15 – Introduce log_destination=jsonlog

Configuration changes across Pg versions

Back in 2018 I wrote why-upgrade.depesz.com – aggregator of changelogs between various versions of Pg.

Want to know what you will get when upgrading from 12.1 to 12.9? Here you go. Longer time changes? Like, from 9.5.20 to 14.1? I've got your back.

It even has a way to list every change that relates to anything related to indexes.

Today I updated the code, again to show which GUCs (configuration parameters) have changed between given versions. For example, getting diff from 13.5 to 14.1 shows you that:

  • two parameters were removed, and are no longer there
  • there are 17 new parameters
  • three parameters had their default values changed

And each GUC that is listed is (well, is supposed to but in some cases it can't) linked to relevant part of docs that describe what it is.

To make it work I compiled every version of Pg, since 7.2 (there have been 410 of them!), and extracted list of config params, and their default values.

Then, I fetched docs for all major versions of Pg, and extracted list of documentation fragments that relate to each config parameter.

This will require more work on each subsequent release, but I think I can manage it.

Any way – hope you'll find it helpful.

Waiting for PostgreSQL 15 – Add assorted new regexp_xxx SQL functions.

On 3rd of August 2021, Tom Lane committed patch:

Add assorted new regexp_xxx SQL functions.
 
This patch adds new functions regexp_count(), regexp_instr(),
regexp_like(), and regexp_substr(), and extends regexp_replace()
with some new optional arguments.  All these functions follow
the definitions used in Oracle, although there are small differences
in the regexp language due to using our own regexp engine -- most
notably, that the default newline-matching behavior is different.
Similar functions appear in DB2 and elsewhere, too.  Aside from
easing portability, these functions are easier to use for certain
tasks than our existing regexp_match[es] functions.
 
Gilles Darold, heavily revised by me
 
Discussion: https://postgr.es/m/fc160ee0-c843-b024-29bb-97b5da61971f@darold.net

Continue reading Waiting for PostgreSQL 15 – Add assorted new regexp_xxx SQL functions.

Does varchar(n) use less disk space than varchar() or text?

Some time ago on Slack some person said:

varchar is better (storage efficiency), i recommend using it for less than 2048 chars, for the best : TEXT

There was discussion that followed, the person that claimed this efficiency never backed their claim, saying only that:

VARChar takes much less ‘place' than TEXT … but have to face it to believe it

and when I asked for

Show me db fiddle with queries that show this difference and I'll believe

I got:

HA ha ! i let you dig it  i've some work ….

I reacted (over reacted), and the thread died.

I know (knew?) that varchar/text are the same internally (wrote about it), but perhaps there have been recent change?

Continue reading Does varchar(n) use less disk space than varchar() or text?

Waiting for PostgreSQL 15 – Allow publishing the tables of schema.

On 27th of October 2021, Amit Kapila committed patch:

Allow publishing the tables of schema.
 
A new option "FOR ALL TABLES IN SCHEMA" in Create/Alter Publication allows
one or more schemas to be specified, whose tables are selected by the
publisher for sending the data to the subscriber.
 
The new syntax allows specifying both the tables and schemas. For example:
CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
 
A new system table "pg_publication_namespace" has been added, to maintain
the schemas that the user wants to publish through the publication.
Modified the output plugin (pgoutput) to publish the changes if the
relation is part of schema publication.
 
Updates pg_dump to identify and dump schema publications. Updates the \d
family of commands to display schema publications and \dRp+ variant will
now display associated schemas if any.
 
Author: Vignesh C, Hou Zhijie, Amit Kapila
Syntax-Suggested-by: Tom Lane, Alvaro Herrera
Reviewed-by: Greg Nancarrow, Masahiko Sawada, Hou Zhijie, Amit Kapila, Haiying Tang, Ajin Cherian, Rahila Syed, Bharath Rupireddy, Mark Dilger
Tested-by: Haiying Tang
Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ@mail.gmail.com

Continue reading Waiting for PostgreSQL 15 – Allow publishing the tables of schema.

Why is it hard to automatically suggest what index to create?

Every now and then someone asks me to add index suggestions to explain.depesz.com.

I always respond with polite decline. This is complicated thing to do, and I just don't have that time.

Lately I was asked, on Slack to add (to explain.depesz.com) link to pganalyze Index Advisor for Postgres.

So I checked it out. And results prompted me to write this blogpost.

Continue reading Why is it hard to automatically suggest what index to create?

Using recursive queries to get distinct elements from table

I wrote about similar things couple of times, but recently found thread on pgsql-general mailing list that made me thing about it again.

Summary of the problem from mail is: we have a table, ~ 800 million rows, with, at least 2 columns:

  • station – 170 distinct values
  • channel – generally 1-3 channels per station

And then we want to run:

SELECT
    station,
    array_agg(DISTINCT (channel)) AS channels
FROM
    DATA
GROUP BY
    station;

Which, on Israel's (original poster) machine took ~ 5 minutes.

And this is with index on on data (station, channel).

Can we do better?

Continue reading Using recursive queries to get distinct elements from table

Waiting for PostgreSQL 15 – Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner.

On 10th of September 2021, Noah Misch committed patch:

Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner.
 
This switches the default ACL to what the documentation has recommended
since CVE-2018-1058.  Upgrades will carry forward any old ownership and
ACL.  Sites that declined the 2018 recommendation should take a fresh
look.  Recipes for commissioning a new database cluster from scratch may
need to create a schema, grant more privileges, etc.  Out-of-tree test
suites may require such updates.
 
Reviewed by Peter Eisentraut.
 
Discussion: https://postgr.es/m/20201031163518.GB4039133@rfd.leadboat.com

Continue reading Waiting for PostgreSQL 15 – Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner.