Waiting for PostgreSQL 10 – Logical replication support for initial data copy

On 23rd of March 2017, Peter Eisentraut committed patch:

Logical replication support for initial data copy
 
Add functionality for a new subscription to copy the initial data in the
tables and then sync with the ongoing apply process.
 
For the copying, add a new internal COPY option to have the COPY source
data provided by a callback function.  The initial data copy works on
the subscriber by receiving COPY data from the publisher and then
providing it locally into a COPY that writes to the destination table.
 
A WAL receiver can now execute full SQL commands.  This is used here to
obtain information about tables and publications.
 
Several new options were added to CREATE and ALTER SUBSCRIPTION to
control whether and when initial table syncing happens.
 
Change pg_dump option --no-create-subscription-slots to
--no-subscription-connect and use the new CREATE SUBSCRIPTION
... NOCONNECT option for that.
 
Author: Petr Jelinek

Some time ago I wrote about support for logical replication in PostgreSQL 10, and noted that it lacks support for sending initial data state.

This was solvable using some semi-clever tricks, but was a pain. Now, supposedly, the problem is gone. Let's see if it really is the case.

Just like previously, I ran master on port 5100, and slave on 5101, and made my psql prompt obvious about where I run the queries:

master $

vs.

slave $

So, let's create the example table, and some data:

master $ CREATE TABLE test (
      id serial PRIMARY KEY,
      whatever text,
      when_ts timestamptz
  );
CREATE TABLE
 
master $ INSERT INTO test (whatever, when_ts)
    SELECT 'whatever #' || i, now() - '1 year'::INTERVAL * random()
    FROM generate_series(1,5) i ORDER BY random();
INSERT 0 5

Then, I need to create the table on slave:

slave $ CREATE TABLE test (
      id serial PRIMARY KEY,
      whatever text,
      when_ts timestamptz
  );
CREATE TABLE

With this in place, let's create publication on master:

master $ CREATE PUBLICATION test_pub FOR TABLE test;
CREATE PUBLICATION

and now on slave, let's subscribe to it:

slave $ CREATE SUBSCRIPTION test_sub
      CONNECTION 'host=127.0.0.1 port=5100 dbname=depesz user=postgres'
      PUBLICATION test_pub;
NOTICE:  created replication slot "test_sub" ON publisher
NOTICE:  synchronized TABLE states
CREATE SUBSCRIPTION
 
slave $ SELECT * FROM test;
 id |  whatever   |            when_ts
----+-------------+-------------------------------
  1 | whatever #2 | 2017-03-24 10:12:52.351159+01
  2 | whatever #3 | 2017-01-08 04:49:43.931959+01
  3 | whatever #4 | 2016-10-14 23:43:48.079159+02
  4 | whatever #5 | 2016-08-04 21:37:21.467959+02
  5 | whatever #1 | 2016-07-23 05:21:23.090359+02
(5 ROWS)

Nice. Data is copied automatically, by default. What about changes?

master $ DELETE FROM test WHERE id = 2;
DELETE 1
 
master $ UPDATE test SET whatever = 'updated' WHERE id = 3;
UPDATE 1
 
master $ INSERT INTO test (whatever, when_ts) VALUES ('new', now());
INSERT 0 1

and a moment later on slave:

slave $ SELECT * FROM test;
 id |  whatever   |            when_ts
----+-------------+-------------------------------
  1 | whatever #2 | 2017-03-24 10:12:52.351159+01
  4 | whatever #5 | 2016-08-04 21:37:21.467959+02
  5 | whatever #1 | 2016-07-23 05:21:23.090359+02
  3 | updated     | 2016-10-14 23:43:48.079159+02
  6 | NEW         | 2017-04-03 12:16:07.775591+02
(5 ROWS)

Nice. There could be a case where you don't want to copy initial data. In such cases CREATE SUBSCRIPTION has option NOCOPY DATA that will bring back previous behavior.

Great. This is really, really amazing. Thanks a lot Petr (and all other involved).

3 thoughts on “Waiting for PostgreSQL 10 – Logical replication support for initial data copy”

  1. Does this also mean, that upgrading from 10.0 to 10.x or 11.0 Will not require external components like slony or something else to do zero downtime upgrade?

Comments are closed.