performance – select * from depesz;

pg_dump speed across versions

Got interested recently in speed of pg_dump. Specifically, if, over the years, it has became faster, and if yes, how much.

Couple of years ago I was in position where we needed to run pg_dump, and found some inefficiencies, which got later patched. This was around the version 12. So, how does the situation look now? Interestingly, not so great…

Continue reading pg_dump speed across versions

Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes

On 3rd of March 2025, Tomas Vondra committed patch:

Allow parallel CREATE INDEX for GIN indexes
 
Allow using parallel workers to build a GIN index, similarly to BTREE
and BRIN. For large tables this may result in significant speedup when
the build is CPU-bound.
 
The work is divided so that each worker builds index entries on a subset
of the table, determined by the regular parallel scan used to read the
data. Each worker uses a local tuplesort to sort and merge the entries
for the same key. The TID lists do not overlap (for a given key), which
means the merge sort simply concatenates the two lists. The merged
entries are written into a shared tuplesort for the leader.
 
The leader needs to merge the sorted entries again, before writing them
into the index. But this way a significant part of the work happens in
the workers, and the leader is left with merging fewer large entries,
which is more efficient.
 
Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for GIN indexes
(e.g. uniqueness checks).
 
Original patch by me, with reviews and substantial improvements by
Matthias van de Meent, certainly enough to make him a co-author.
 
Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com

Continue reading Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes

How much speed you’re leaving at the table if you use default locale?

I've been to PGConf.dev recently, and one of the talks was about collations.

The whole talk was interesting (to put it mildly), but the thing that stuck with me is that we really shouldn't be using default collation provider (libc with locale based collation), unless it's really needed, because we're wasting performance. But how much of a hit is it?

Continue reading How much speed you're leaving at the table if you use default locale?

Waiting for PostgreSQL 14 – Allow configurable LZ4 TOAST compression.

On 19th of March 2021, Robert Haas committed patch:

Allow configurable LZ4 TOAST compression.
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.
In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.
Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.
It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT. It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data. These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project. However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.
Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those were patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allow a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored. More recently, Justin Przyby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.
Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com

Continue reading Waiting for PostgreSQL 14 – Allow configurable LZ4 TOAST compression.

Waiting for PostgreSQL 14 – Improvements for handling large number of connections

Title:

There is a thread on pgsql-hackers mailing list, dating back to 1st of March 2020 about changes to Pg to optimize handling of larger number of connections.

The thread is pretty long, and the discussion took 5 months, but lately we got couple of commits:

that (supposedly) should improve handling of large numbers of concurrent connections.

Unfortunately, I don't have ready test servers that would allow me to sensibly tests thousands of connections, but I think we can all safely assume that PostgreSQL 14 should handle large connection counts better than any PostgreSQL before. Thanks a lot, to all involved, but specifically to mastermind of this project: Andres Freund.

Waiting for PostgreSQL 13 – Allow pg_stat_statements to track planning statistics.

On 2nd of April 2020, Fujii Masao committed patch:

Allow pg_stat_statements to track planning statistics.
This commit makes pg_stat_statements support new GUC
pg_stat_statements.track_planning. If this option is enabled,
pg_stat_statements tracks the planning statistics of the statements,
e.g., the number of times the statement was planned, the total time
spent planning the statement, etc. This feature is useful to check
the statements that it takes a long time to plan. Previously since
pg_stat_statements tracked only the execution statistics, we could
not use that for the purpose.
The planning and execution statistics are stored at the end of
each phase separately. So there are not always one-to-one relationship
between them. For example, if the statement is successfully planned
but fails in the execution phase, only its planning statistics are stored.
This may cause the users to be able to see different pg_stat_statements
results from the previous version. To avoid this,
pg_stat_statements.track_planning needs to be disabled.
This commit bumps the version of pg_stat_statements to 1.8
since it changes the definition of pg_stat_statements function.
Author: Julien Rouhaud, Pascal Legrand, Thomas Munro, Fujii Masao
Reviewed-by: Sergei Kornilov, Tomas Vondra, Yoshikazu Imai, Haribabu Kommi, Tom Lane
Discussion: https://postgr.es/m/CAHGQGwFx_=DO-Gu-MfPW3VQ4qC7TfVdH2zHmvZfrGv6fQ3D-Tw@mail.gmail.com
Discussion: https://postgr.es/m/CAEepm=0e59Y_6Q_YXYCTHZkqOc6H2pJ54C_Xe=VFu50Aqqp_sA@mail.gmail.com
Discussion: https://postgr.es/m/DB6PR0301MB21352F6210E3B11934B0DCC790B00@DB6PR0301MB2135.eurprd03.prod.outlook.com

Continue reading Waiting for PostgreSQL 13 – Allow pg_stat_statements to track planning statistics.

Why I’m not fan of uuid datatype

Recently, on irc, there were couple of cases where someone wanted to use uuid as datatype for their primary key.

I opposed, and tried to explain, but IRC doesn't really allow for longer texts, so figured I'll write a blogpost.

Continue reading Why I'm not fan of uuid datatype

Waiting for PostgreSQL 13 – Add support for –jobs in reindexdb

On 27th of July 2019, Michael Paquier committed patch:

Add support for --jobs in reindexdb
 
When doing a schema-level or a database-level operation, a list of
relations to build is created which gets processed in parallel using
multiple connections, based on the recent refactoring for parallel slots
in src/bin/scripts/.  System catalogs are processed first in a
serialized fashion to prevent deadlocks, followed by the rest done in
parallel.
 
This new option is not compatible with --system as reindexing system
catalogs in parallel can lead to deadlocks, and with --index as there is
no conflict handling for indexes rebuilt in parallel depending in the
same relation.
 
Author: Julien Rouhaud
 
Discussion: https://postgr.es/m/CAOBaU_YrnH_Jqo46NhaJ7uRBiWWEcS40VNRQxgFbqYo9kApUsg@mail.gmail.com

Continue reading Waiting for PostgreSQL 13 – Add support for –jobs in reindexdb

Waiting for PostgreSQL 12 – Allow VACUUM to be run with index cleanup disabled.

On 4th of April 2019, Robert Haas committed patch:

Allow VACUUM to be run with index cleanup disabled.
 
 
This commit adds a new reloption, vacuum_index_cleanup, which
controls whether index cleanup is performed for a particular
relation by default.  It also adds a new option to the VACUUM
command, INDEX_CLEANUP, which can be used to override the
reloption.  If neither the reloption nor the VACUUM option is
used, the default is true, as before.
 
Masahiko Sawada, reviewed and tested by Nathan Bossart, Alvaro
Herrera, Kyotaro Horiguchi, Darafei Praliaskouski, and me.
The wording of the documentation is mostly due to me.
 
Discussion: http://postgr.es/m/CAD21AoAt5R3DNUZSjOoXDUY=naYPUOuffVsRzuTYMz29yLzQCA@mail.gmail.com

Continue reading Waiting for PostgreSQL 12 – Allow VACUUM to be run with index cleanup disabled.

Waiting for PostgreSQL 12 – Generated columns

On 30th of March 2019, Peter Eisentraut committed patch:

Generated columns
 
This is an SQL-standard feature that allows creating columns that are
computed from expressions rather than assigned, similar to a view or
materialized view but on a column basis.
 
This implements one kind of generated column: stored (computed on
write).  Another kind, virtual (computed on read), is planned for the
future, and some room is left for it.
 
Discussion: https://www.postgresql.org/message-id/flat/-4019-bdb1-699e-@2ndquadrant.com

Continue reading Waiting for PostgreSQL 12 – Generated columns

=$
|

Tag: performance