Using JSON: json vs. jsonb, pglz vs. lz4, key optimization, parsing speed?

Recently(ish) I had a conversation on one of PostgreSQL support chats (IRC, Slack, or Discord) about efficient storage of JSON data, which compression to use, which datatype.

Unrelated to this, some people (at least two over the last year or so) said that they aren't sure if PostgreSQL doesn't optimize storage between columns, for example, storing attribute names once per column, and not once per value.

Decided to investigate…

Continue reading Using JSON: json vs. jsonb, pglz vs. lz4, key optimization, parsing speed?

Waiting for PostgreSQL 19 – Sequence synchronization in logical replication.

First, on 9th of October 2025, Amit Kapila committed patch:

Add "ALL SEQUENCES" support to publications.
 
This patch adds support for the ALL SEQUENCES clause in publications,
enabling synchronization/replication of all sequences that is useful for
upgrades.
 
Publications can now include all sequences via FOR ALL SEQUENCES.
psql enhancements:
\d shows publications for a given sequence.
\dRp indicates if a publication includes all sequences.
 
ALL SEQUENCES can be combined with ALL TABLES, but not with other options
like TABLE or TABLES IN SCHEMA. We can extend support for more granular
clauses in future.
 
The view pg_publication_sequences provides information about the mapping
between publications and sequences.
 
This patch enables publishing of sequences; subscriber-side support will
be added in upcoming patches.
 
Author: vignesh C <vignesh21@gmail.com>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com

And then, on 5th of November 2025, he also committed patch:

Add sequence synchronization for logical replication.
 
This patch introduces sequence synchronization. Sequences that are synced
will have 2 states:
   - INIT (needs [re]synchronizing)
   - READY (is already synchronized)
 
A new sequencesync worker is launched as needed to synchronize sequences.
A single sequencesync worker is responsible for synchronizing all
sequences. It begins by retrieving the list of sequences that are flagged
for synchronization, i.e., those in the INIT state. These sequences are
then processed in batches, allowing multiple entries to be synchronized
within a single transaction. The worker fetches the current sequence
values and page LSNs from the remote publisher, updates the corresponding
sequences on the local subscriber, and finally marks each sequence as
READY upon successful synchronization.
 
Sequence synchronization occurs in 3 places:
1) CREATE SUBSCRIPTION
    - The command syntax remains unchanged.
    - The subscriber retrieves sequences associated with publications.
    - Published sequences are added to pg_subscription_rel with INIT
      state.
    - Initiate the sequencesync worker to synchronize all sequences.
 
2) ALTER SUBSCRIPTION ... REFRESH PUBLICATION
    - The command syntax remains unchanged.
    - Dropped published sequences are removed from pg_subscription_rel.
    - Newly published sequences are added to pg_subscription_rel with INIT
      state.
    - Initiate the sequencesync worker to synchronize only newly added
      sequences.
 
3) ALTER SUBSCRIPTION ... REFRESH SEQUENCES
    - A new command introduced for PG19 by f0b3573c3a.
    - All sequences in pg_subscription_rel are reset to INIT state.
    - Initiate the sequencesync worker to synchronize all sequences.
    - Unlike "ALTER SUBSCRIPTION ... REFRESH PUBLICATION" command,
      addition and removal of missing sequences will not be done in this
      case.
 
Author: Vignesh C <vignesh21@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/CAA4eK1LC+KJiAkSrpE_NwvNdidw9F2os7GERUeSxSKv71gXysQ@mail.gmail.com

Continue reading Waiting for PostgreSQL 19 – Sequence synchronization in logical replication.

Do you really need tsvector column?

When using tsearch one usually, often, creates a tsvector column to put data in, and then create index on it.

But, do you really need the index? I wrote once already that you don't have to, but then a person talked with me on IRC, and pointed this section of docs:

One advantage of the separate-column approach over an expression index … Another advantage is that searches will be faster, since it will not be necessary to redo the to_tsvector calls to verify index matches.

So, let's see how big of a problem it really is.

Continue reading Do you really need tsvector column?

How much speed you’re leaving at the table if you use default locale?

I've been to PGConf.dev recently, and one of the talks was about collations.

The whole talk was interesting (to put it mildly), but the thing that stuck with me is that we really shouldn't be using default collation provider (libc with locale based collation), unless it's really needed, because we're wasting performance. But how much of a hit is it?

Continue reading How much speed you're leaving at the table if you use default locale?

Getting value from dynamic column in pl/PgSQL triggers?

Every so often, on irc, someone asks how to get value from column that is passed as argument.

This is generally seen as not possible, as pl/PgSQL doesn't have support for dynamic column names.

We can work around it, though. Are the workarounds usable, in terms of performance?

Continue reading Getting value from dynamic column in pl/PgSQL triggers?

Waiting for PostgreSQL 13 – pgbench: add –partitions and –partition-method options.

On 3rd of October 2019, Amit Kapila committed patch:

pgbench: add --partitions and --partition-method options.
 
These new options allow users to partition the pgbench_accounts table by
specifying the number of partitions and partitioning method.  The values
allowed for partitioning method are range and hash.
 
This feature allows users to measure the overhead of partitioning if any.
 
Author: Fabien COELHO
 
Alvaro Herrera
Discussion: https://postgr.es/m/alpine.DEB.2.21..7008@lancre

Continue reading Waiting for PostgreSQL 13 – pgbench: add –partitions and –partition-method options.

Waiting for PostgreSQL 10 – Implement table partitioning.

I had two month delay related to some work, but now I can finally write about:

On 7th of December, Robert Haas committed patch:

Implement table partitioning.
 
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own.  The children are called
partitions and contain all of the actual data.  Each partition has an
implicit partitioning constraint.  Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed.  Partitions
can't have extra columns and may not allow nulls unless the parent
does.  Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.
 
Currently, tables can be range-partitioned or list-partitioned.  List
partitioning is limited to a single column, but range partitioning can
involve multiple columns.  A partitioning "column" can be an
expression.
 
Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations.  The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.
 
Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others.  Minor revisions by me.

Continue reading Waiting for PostgreSQL 10 – Implement table partitioning.

Is C faster for (instagram-style) ID generation?

Some time ago, guys from Instagram shared their approach to generating unique ids on multiple hosts in a way that guarantees (to reasonable extend) uniqueness, and doesn't require any centralized service.

Earlier this month, the build benchmarked their solution vs. UUIDs, and vs. bigserial.

I thought – whether C based code for nextid would be faster.

Continue reading Is C faster for (instagram-style) ID generation?

What logging has least overhead?

When working with PostgreSQL you generally want to get information about slow queries. The usual approach is to set log_min_duration_statement to some low(ish) value, run your app, and then analyze logs.

But you can log to many places – flat file, flat file on another disk, local syslog, remote syslog. And – perhaps, instead of log_min_duration_statement – just use pg_stat_statements?

Well, I wondered about it, and decided to test.

Continue reading What logging has least overhead?