Waiting for PostgreSQL 15 – Allow publishing the tables of schema.

On 27th of October 2021, Amit Kapila committed patch:

Allow publishing the tables of schema.
 
A new option "FOR ALL TABLES IN SCHEMA" in Create/Alter Publication allows
one or more schemas to be specified, whose tables are selected by the
publisher for sending the data to the subscriber.
 
The new syntax allows specifying both the tables and schemas. For example:
CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
 
A new system table "pg_publication_namespace" has been added, to maintain
the schemas that the user wants to publish through the publication.
Modified the output plugin (pgoutput) to publish the changes if the
relation is part of schema publication.
 
Updates pg_dump to identify and dump schema publications. Updates the \d
family of commands to display schema publications and \dRp+ variant will
now display associated schemas if any.
 
Author: Vignesh C, Hou Zhijie, Amit Kapila
Syntax-Suggested-by: Tom Lane, Alvaro Herrera
Reviewed-by: Greg Nancarrow, Masahiko Sawada, Hou Zhijie, Amit Kapila, Haiying Tang, Ajin Cherian, Rahila Syed, Bharath Rupireddy, Mark Dilger
Tested-by: Haiying Tang
Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ@mail.gmail.com

We got logical replication since Pg 10. And since then, we could make it work for specific table(s), or all tables in database.

Now, we have one more step in granularity – all tables in a schema(s).

Let's see how that works. To test it I made two databases (on the same server), named db_primary and db_replica:

=$ psql -X -p 5430 -c 'create database db_primary'
CREATE DATABASE
 
=$ psql -X -p 5431 -c 'create database db_replica'
CREATE DATABASE

Then, in each of the dbs I ran this script:

CREATE  TABLE  public.alpha  ( id INT generated always AS IDENTITY PRIMARY KEY, ts timestamptz, payload TEXT);
 
CREATE SCHEMA bob;
CREATE SCHEMA carol;
CREATE SCHEMA dan;
 
CREATE  TABLE  bob.echo      ( id INT generated always AS IDENTITY PRIMARY KEY, ts timestamptz, payload TEXT);
CREATE  TABLE  bob.foxtrot   ( id INT generated always AS IDENTITY PRIMARY KEY, ts timestamptz, payload TEXT);
CREATE  TABLE  carol.golf    ( id INT generated always AS IDENTITY PRIMARY KEY, ts timestamptz, payload TEXT);
CREATE  TABLE  carol.hotel   ( id INT generated always AS IDENTITY PRIMARY KEY, ts timestamptz, payload TEXT);
CREATE  TABLE  carol.india   ( id INT generated always AS IDENTITY PRIMARY KEY, ts timestamptz, payload TEXT);
CREATE  TABLE  dan.juliett   ( id INT generated always AS IDENTITY PRIMARY KEY, ts timestamptz, payload TEXT);
CREATE  TABLE  dan.kilo      ( id INT generated always AS IDENTITY PRIMARY KEY, ts timestamptz, payload TEXT);
CREATE  TABLE  dan.lima      ( id INT generated always AS IDENTITY PRIMARY KEY, ts timestamptz, payload TEXT);
CREATE  TABLE  dan.mike      ( id INT generated always AS IDENTITY PRIMARY KEY, ts timestamptz, payload TEXT);

It makes 3 schemas, and set of 10 tables with identical structure, but different names, and in 4 schemas (3 new created, and one table in public).

Now, let's make publication in db_primary that will contain all tables in ‘bob' and some in ‘dan'. And the lonely table in public:

db_primary=# CREATE publication test_pub FOR TABLE dan.juliett, dan.kilo, public.alpha, ALL TABLES IN schema bob;
CREATE PUBLICATION
 
db_primary=# \dRp+
                           Publication test_pub
 Owner  │ ALL TABLES │ Inserts │ Updates │ Deletes │ Truncates │ Via root 
────────┼────────────┼─────────┼─────────┼─────────┼───────────┼──────────
 depesz │ f          │ t       │ t       │ t       │ t         │ f
TABLES:
    "dan.juliett"
    "dan.kilo"
    "public.alpha"
TABLES FROM schemas:
    "bob"

OK. All set. So let's subscribe from replica:

db_replica=# CREATE subscription test_sub connection 'host=127.0.0.1 port=5430 dbname=db_primary user=postgres' publication test_pub;
NOTICE:  created replication slot "test_sub" ON publisher
CREATE SUBSCRIPTION

So, let's test it:

db_primary=# INSERT INTO public.alpha (ts, payload) VALUES (now(), 'initial value') returning *;
 id |              ts               |    payload    
----+-------------------------------+---------------
  1 | 2021-11-16 08:13:04.629613+01 | initial VALUE
(1 ROW)
 
INSERT 0 1
 
db_primary=# INSERT INTO public.alpha (ts, payload) VALUES (now(), 'initial value') returning *;
 id |              ts               |    payload    
----+-------------------------------+---------------
  2 | 2021-11-16 08:13:15.073362+01 | initial VALUE
(1 ROW)
 
INSERT 0 1
 
db_primary=# UPDATE public.alpha SET payload = 'second value' WHERE id = 2;
UPDATE 1
 
db_primary=# SELECT * FROM public.alpha;
 id |              ts               |    payload    
----+-------------------------------+---------------
  1 | 2021-11-16 08:13:04.629613+01 | initial VALUE
  2 | 2021-11-16 08:13:15.073362+01 | SECOND VALUE
(2 ROWS)

and how it looks on the subscriber?

db_replica=# SELECT * FROM public.alpha;
 id |              ts               |    payload    
----+-------------------------------+---------------
  1 | 2021-11-16 08:13:04.629613+01 | initial VALUE
  2 | 2021-11-16 08:13:15.073362+01 | SECOND VALUE
(2 ROWS)

That was simple. What about table that is in some schema? test_pub contains all tables from bob, so let's see:

db_primary=# INSERT INTO bob.echo (ts, payload) VALUES (now(), 'xxx') returning *;
 id |              ts               | payload 
----+-------------------------------+---------
  1 | 2021-11-16 08:15:50.809102+01 | xxx
(1 ROW)
 
INSERT 0 1
 
db_primary=# INSERT INTO bob.foxtrot (ts, payload) VALUES (now(), 'yyy') returning *;
 id |              ts               | payload 
----+-------------------------------+---------
  1 | 2021-11-16 08:15:50.824428+01 | yyy
(1 ROW)
 
INSERT 0 1

And replica:

db_replica=# SELECT * FROM bob.echo;
 id |              ts               | payload 
----+-------------------------------+---------
  1 | 2021-11-16 08:15:50.809102+01 | xxx
(1 ROW)
 
db_replica=# SELECT * FROM bob.foxtrot;
 id |              ts               | payload 
----+-------------------------------+---------
  1 | 2021-11-16 08:15:50.824428+01 | yyy
(1 ROW)

Sweet.

So, let's see what will happen if I'll add new table.

=$ psql -X -p 5430 -d db_primary -c "CREATE TABLE bob.november ( id int generated always as identity PRIMARY KEY, ts timestamptz, payload TEXT);"
CREATE TABLE
 
=$ psql -X -p 5431 -d db_replica -c "CREATE TABLE bob.november ( id int generated always as identity PRIMARY KEY, ts timestamptz, payload TEXT);"
CREATE TABLE

With this in place, let's see if new data in november will be replicated:

=$ psql -X -p 5430 -d db_primary -c "INSERT INTO bob.november (ts, payload) values (now(), 'cat') returning *"
 id |              ts               | payload 
----+-------------------------------+---------
  1 | 2021-11-16 10:27:43.817067+01 | cat
(1 ROW)
 
INSERT 0 1

And verification:

=$ psql -X -p 5431 -d db_replica -c "select * from bob.november"
 id | ts | payload 
----+----+---------
(0 ROWS)

Why? Well, the reason is simple. While publication does contain all the data, subscription doesn't know about new tables.

To make it work, I have to do one more thing on replica:

db_replica=# ALTER subscription test_sub refresh publication;
ALTER SUBSCRIPTION

and immediately older rows will appear:

db_replica=# SELECT * FROM bob.november;
 id |              ts               | payload 
----+-------------------------------+---------
  1 | 2021-11-16 10:27:43.817067+01 | cat
(1 ROW)

Sweet. Definitely useful. Thanks to all involved 🙂

3 thoughts on “Waiting for PostgreSQL 15 – Allow publishing the tables of schema.”

  1. Thanks for the heads up on this. One note, you create the db_replica database twice in the CREATE DATABASE section.

  2. Any plans to refresh table DDL change propagation to subscriber?

Comments are closed.