February 7th, 2017 by depesz

On 20th of January, Peter Eisentraut committed patch:

Logical replication
 
- Add PUBLICATION catalogs and DDL
- Add SUBSCRIPTION catalog and DDL
- Define logical replication protocol and output plugin
- Add logical replication workers
 
From: Petr Jelinek
Reviewed-by: Steve Singer
Reviewed-by: Andres Freund
Reviewed-by: Erik Rijkers
Reviewed-by: Peter Eisentraut

Despite the short commit message, this is a huge patch. It modified (or created) 119 files, and the whole patch is almost 500 kB!

For testing purposes, I figured I'd set up the simplest configuration: master + slave. The master is on port 5100, and the slave on 5101. I will keep the prompts showing the port, so it will be clear what is run where. For example:

(depesz@[local]:5100) 10:57:03 [depesz]
$ select version();

This shows that I ran the select on the master database: [local]:5100.

First, obviously, I will need some table. It doesn't have to be complicated, so:

(depesz@[local]:5100) 10:59:47 [depesz]
$ create table test (
    id serial primary key,
    whatever text,
    when_ts timestamptz
);
CREATE TABLE

To make things a bit more interesting, let's create some random data before doing anything with replication itself:

(depesz@[local]:5100) 11:08:41 [depesz]
$ insert into test (whatever, when_ts) select 'whatever #' || i, now() - '1 year'::interval * random() from generate_series(1,5) i order by random();
INSERT 0 5
 
(depesz@[local]:5100) 11:09:15 [depesz]
$ select * from test;
 id |  whatever   |            when_ts
----+-------------+-------------------------------
  1 | whatever #1 | 2016-12-18 09:05:36.290453+01
  2 | whatever #3 | 2016-10-09 02:51:31.125653+02
  3 | whatever #2 | 2016-03-30 10:54:04.456853+02
  4 | whatever #5 | 2016-03-21 08:06:02.268053+01
  5 | whatever #4 | 2016-02-24 23:12:46.620053+01
(5 rows)

Now, I need to create a so-called publication. This is basically a list of tables (other objects are currently not supported) that will be replicated when a subscriber fetches data for the given publication.

(depesz@[local]:5100) 11:00:02 [depesz]
$ CREATE PUBLICATION test_pub
    FOR TABLE test;
CREATE PUBLICATION
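For the record, the publication's settings can be inspected in the pg_publication catalog; with the defaults, all three per-operation columns should be true:

$ select pubname, puballtables, pubinsert, pubupdate, pubdelete from pg_publication;

There is also a psql backslash command, \dRp+, that shows the same information in a friendlier form.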

Publications can be limited to specific operations (insert/update/delete), but by default a publication includes all of them. An important note: you can't have a different set of operations for different tables in a single publication.

If you need that, just create multiple publications – each table can be in as many publications as needed.
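For example, assuming the WITH (publish = …) syntax that PostgreSQL 10 ultimately shipped with (the exact spelling of these options changed a bit during development), an insert-only publication could be created like this:

$ CREATE PUBLICATION test_pub_ins FOR TABLE test WITH (publish = 'insert');

Updates and deletes on test would then simply not be sent to subscribers of test_pub_ins.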

I could also do:

$ CREATE PUBLICATION test_pub FOR ALL TABLES;

This would add all tables in the current database to the publication, and would also automatically include tables created in the future. Since I prefer to be explicit, let's stick with explicitly listed tables for now.

Now, it's time to actually subscribe to replication. Before I can do that, though, I need to create, on the slave, all the tables that the publication contains:

(depesz@[local]:5101) 11:15:18 [depesz]
$ create table test (
    id serial primary key,
    whatever text,
    when_ts timestamptz
);
CREATE TABLE
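Side note: retyping DDL by hand doesn't scale. For a larger schema one could presumably pipe a schema-only dump from master to slave instead, along these lines:

=$ pg_dump -p 5100 -s -t test depesz | psql -p 5101 -qX -d depesz

(-s makes pg_dump output only the schema, no data.)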

With this in place, I can create subscription:

(depesz@[local]:5101) 11:22:39 [depesz]
$ CREATE SUBSCRIPTION test_sub
    CONNECTION 'host=127.0.0.1 port=5100 dbname=depesz user=postgres'
    PUBLICATION test_pub;
NOTICE:  created replication slot "test_sub" on publisher
CREATE SUBSCRIPTION

A moment later:

(depesz@[local]:5101) 11:23:01 [depesz]
$ select * from test;
 id | whatever | when_ts
----+----------+---------
(0 rows)

Hmm, the original rows have not been replicated. But what about new ones?

(depesz@[local]:5100) 11:22:32 [depesz]
$ insert into test (whatever, when_ts) values ('test after sub', now());
INSERT 0 1
 
(depesz@[local]:5101) 11:23:25 [depesz]
$ select * from test;
 id |    whatever    |           when_ts
----+----------------+------------------------------
  6 | test after sub | 2017-02-06 11:23:30.26246+01
(1 row)

OK. New rows have been replicated. But what about initial sync?

Let's try something:

(depesz@[local]:5101) 11:40:34 [depesz]
$ drop subscription test_sub;
NOTICE:  dropped replication slot "test_sub" on publisher
DROP SUBSCRIPTION
 
(depesz@[local]:5101) 11:45:06 [depesz]
$ truncate test;
TRUNCATE TABLE

Now, let's restore the test table on the master to its initial state:

(depesz@[local]:5100) 11:40:32 [depesz]
$ delete from test where id > 5;
DELETE 1
 
(depesz@[local]:5100) 11:45:44 [depesz]
$ select setval('test_id_seq', 5);
 setval
--------
      5
(1 row)

And let's try playing with enabling/disabling the subscription:

(depesz@[local]:5101) 11:46:26 [depesz]
$ CREATE SUBSCRIPTION test_sub
    CONNECTION 'host=127.0.0.1 port=5100 dbname=depesz user=postgres'
    PUBLICATION test_pub
    WITH (DISABLED);
NOTICE:  created replication slot "test_sub" on publisher
CREATE SUBSCRIPTION
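Whether a subscription is currently enabled can be checked in the pg_subscription catalog; at this point subenabled should show false:

$ select subname, subenabled from pg_subscription;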

Now, with replication set up, let's try to load a snapshot of the data using pg_dump:

=$ pg_dump -p 5100 -a -t test depesz | psql -p 5101 -qX -d depesz
 setval
--------
      5
(1 row)

Now, let's assume that in the meantime some things have changed:

(depesz@[local]:5100) 11:50:57 [depesz]
$ insert into test (whatever, when_ts) values ('test after disabled sub', now());
INSERT 0 1
 
(depesz@[local]:5100) 11:51:05 [depesz]
$ delete from test where id = 3;
DELETE 1
 
(depesz@[local]:5100) 11:51:09 [depesz]
$ update test set whatever = 'updated' where id = 2;
UPDATE 1

And now, let's try to enable subscription:

(depesz@[local]:5101) 11:49:28 [depesz]
$ alter subscription test_sub enable;
ALTER SUBSCRIPTION

After a while:

(depesz@[local]:5101) 11:52:06 [depesz]
$ select * from test;
 id |        whatever         |            when_ts
----+-------------------------+-------------------------------
  1 | whatever #1             | 2016-12-18 09:05:36.290453+01
  4 | whatever #5             | 2016-03-21 08:06:02.268053+01
  5 | whatever #4             | 2016-02-24 23:12:46.620053+01
  6 | test after disabled sub | 2017-02-06 11:51:05.110897+01
  2 | updated                 | 2016-10-09 02:51:31.125653+02
(5 rows)

Looks like it replicated the changes made since the dump. Will it replicate new changes?

(depesz@[local]:5100) 11:53:40 [depesz]
$ insert into test (whatever, when_ts) values ('test after enabling sub', now());
INSERT 0 1

and on slave:

(depesz@[local]:5101) 11:53:42 [depesz]
$ select * from test;
 id |        whatever         |            when_ts
----+-------------------------+-------------------------------
  1 | whatever #1             | 2016-12-18 09:05:36.290453+01
  4 | whatever #5             | 2016-03-21 08:06:02.268053+01
  5 | whatever #4             | 2016-02-24 23:12:46.620053+01
  6 | test after disabled sub | 2017-02-06 11:51:05.110897+01
  2 | updated                 | 2016-10-09 02:51:31.125653+02
  7 | test after enabling sub | 2017-02-06 11:53:40.079853+01
(6 rows)

That looks OK. But taking a dump will take a moment. What will happen if new rows, or other changes, appear after the creation of the subscription, but before the data dump is taken?

Let's clean up, and retry:

(depesz@[local]:5101) 11:53:42 [depesz]
$ drop subscription test_sub;
NOTICE:  dropped replication slot "test_sub" on publisher
DROP SUBSCRIPTION
 
(depesz@[local]:5101) 11:55:25 [depesz]
$ truncate test;
TRUNCATE TABLE
(depesz@[local]:5100) 11:53:40 [depesz]
$ drop publication test_pub;
DROP PUBLICATION
 
(depesz@[local]:5100) 11:55:51 [depesz]
$ truncate test;
TRUNCATE TABLE
 
(depesz@[local]:5100) 11:56:20 [depesz]
$ alter sequence test_id_seq restart;
ALTER SEQUENCE

Now, re-insert base data:

(depesz@[local]:5100) 11:56:23 [depesz]
$ insert into test (whatever, when_ts) select 'whatever #' || i, now() - '1 year'::interval * random() from generate_series(1,5) i order by random();
INSERT 0 5
 
(depesz@[local]:5100) 11:57:19 [depesz]
$ select * from test;
 id |  whatever   |            when_ts
----+-------------+-------------------------------
  1 | whatever #3 | 2017-01-23 14:33:14.398735+01
  2 | whatever #2 | 2016-12-26 09:43:36.853135+01
  3 | whatever #1 | 2016-11-30 13:54:18.401935+01
  4 | whatever #4 | 2016-08-29 16:04:23.950735+02
  5 | whatever #5 | 2016-07-01 12:08:05.710735+02
(5 rows)

With this in place, I can create the publication and the subscription (for now, disabled):

(depesz@[local]:5100) 11:58:08 [depesz]
$ CREATE PUBLICATION test_pub
    FOR TABLE test;
CREATE PUBLICATION
 
(depesz@[local]:5101) 11:55:28 [depesz]
$ CREATE SUBSCRIPTION test_sub
    CONNECTION 'host=127.0.0.1 port=5100 dbname=depesz user=postgres'
    PUBLICATION test_pub
    WITH (DISABLED);
NOTICE:  created replication slot "test_sub" on publisher
CREATE SUBSCRIPTION

Now, let's do some changes before making dump:

(depesz@[local]:5100) 11:58:14 [depesz]
$ delete from test where id = 1;
DELETE 1
 
(depesz@[local]:5100) 11:59:09 [depesz]
$ insert into test (whatever, when_ts) values ('test after disabled sub', now());
INSERT 0 1
 
(depesz@[local]:5100) 11:59:23 [depesz]
$ update test set whatever = 'updated' where id = 2;
UPDATE 1
 
(depesz@[local]:5100) 11:59:36 [depesz]
$ select * from test;
 id |        whatever         |            when_ts
----+-------------------------+-------------------------------
  3 | whatever #1             | 2016-11-30 13:54:18.401935+01
  4 | whatever #4             | 2016-08-29 16:04:23.950735+02
  5 | whatever #5             | 2016-07-01 12:08:05.710735+02
  6 | test after disabled sub | 2017-02-06 11:59:23.935594+01
  2 | updated                 | 2016-12-26 09:43:36.853135+01
(5 rows)

Now, let's copy data:

=$ pg_dump -p 5100 -a -t test depesz | psql -p 5101 -qX -d depesz
 setval
--------
      6
(1 row)

Now, let's change data on master some more:

(depesz@[local]:5100) 11:59:38 [depesz]
$ delete from test where id = 3;
DELETE 1
 
(depesz@[local]:5100) 12:00:35 [depesz]
$ update test set whatever = 'updated 2' where id = 4;
UPDATE 1
 
(depesz@[local]:5100) 12:00:43 [depesz]
$ insert into test (whatever, when_ts) values ('test after pg_dump', now());
INSERT 0 1
 
(depesz@[local]:5100) 12:00:51 [depesz]
$ select * from test;
 id |        whatever         |            when_ts
----+-------------------------+-------------------------------
  5 | whatever #5             | 2016-07-01 12:08:05.710735+02
  6 | test after disabled sub | 2017-02-06 11:59:23.935594+01
  2 | updated                 | 2016-12-26 09:43:36.853135+01
  4 | updated 2               | 2016-08-29 16:04:23.950735+02
  7 | test after pg_dump      | 2017-02-06 12:00:51.386485+01
(5 rows)

OK, and now, let's retry enabling the subscription:

(depesz@[local]:5101) 11:58:38 [depesz]
$ alter subscription test_sub enable;
ALTER SUBSCRIPTION

A moment later:

(depesz@[local]:5101) 12:02:02 [depesz]
$ select * from test;
 id |        whatever         |            when_ts
----+-------------------------+-------------------------------
  3 | whatever #1             | 2016-11-30 13:54:18.401935+01
  4 | whatever #4             | 2016-08-29 16:04:23.950735+02
  5 | whatever #5             | 2016-07-01 12:08:05.710735+02
  6 | test after disabled sub | 2017-02-06 11:59:23.935594+01
  2 | updated                 | 2016-12-26 09:43:36.853135+01
(5 rows)

And what about new rows?

(depesz@[local]:5100) 12:00:53 [depesz]
$ insert into test (whatever, when_ts) values ('test after enabling subs', now()) returning *;
 id |         whatever         |           when_ts
----+--------------------------+------------------------------
  8 | test after enabling subs | 2017-02-06 12:03:22.76953+01
(1 row)
(depesz@[local]:5101) 12:03:45 [depesz]
$ select * from test;
 id |        whatever         |            when_ts
----+-------------------------+-------------------------------
  3 | whatever #1             | 2016-11-30 13:54:18.401935+01
  4 | whatever #4             | 2016-08-29 16:04:23.950735+02
  5 | whatever #5             | 2016-07-01 12:08:05.710735+02
  6 | test after disabled sub | 2017-02-06 11:59:23.935594+01
  2 | updated                 | 2016-12-26 09:43:36.853135+01
(5 rows)

Looks like replication is stuck.

Logs on slave show:

2017-02-06 12:04:46.969 CET @ 13680  LOG:  logical replication apply for subscription "test_sub" has started
2017-02-06 12:04:46.971 CET @ 13680  ERROR:  duplicate key value violates unique constraint "test_pkey"
2017-02-06 12:04:46.971 CET @ 13680  DETAIL:  Key (id)=(6) already exists.

So, the problem is that the row with id = 6 was transferred in the dump, but it was also in the replication stream (it was added after the subscription was created). So let's remove it on the slave:

(depesz@[local]:5101) 12:06:21 [depesz]
$ delete from test  where id = 6;
DELETE 1

After a couple of seconds:

(depesz@[local]:5101) 12:06:26 [depesz]
$ select * from test;
 id |         whatever         |            when_ts
----+--------------------------+-------------------------------
  5 | whatever #5              | 2016-07-01 12:08:05.710735+02
  6 | test after disabled sub  | 2017-02-06 11:59:23.935594+01
  2 | updated                  | 2016-12-26 09:43:36.853135+01
  4 | updated 2                | 2016-08-29 16:04:23.950735+02
  7 | test after pg_dump       | 2017-02-06 12:00:51.386485+01
  8 | test after enabling subs | 2017-02-06 12:03:22.76953+01
(6 rows)

All data replicated, and up to date.

We can see the progress of replication on the master:

(depesz@[local]:5100) 12:09:22 [depesz]
$ select pg_current_xlog_location(), * from pg_stat_replication ;
-[ RECORD 1 ]------------+------------------------------
pg_current_xlog_location | 0/41DD9600
pid                      | 14329
usesysid                 | 16389
usename                  | postgres
application_name         | test_sub
client_addr              | 127.0.0.1
client_hostname          | [null]
client_port              | 55048
backend_start            | 2017-02-06 12:06:26.282701+01
backend_xmin             | [null]
state                    | streaming
sent_location            | 0/41DD9600
write_location           | 0/41DD9600
flush_location           | 0/41DD9600
replay_location          | 0/41DD9600
sync_priority            | 0
sync_state               | async

and on the slave we can see whether it was all applied:

(depesz@[local]:5101) 12:10:17 [depesz]
$ select * from pg_stat_subscription ;
-[ RECORD 1 ]---------+------------------------------
subid                 | 16437
subname               | test_sub
pid                   | 14327
received_lsn          | 0/41DD94E8
last_msg_send_time    | 2017-02-06 12:06:31.453325+01
last_msg_receipt_time | 2017-02-06 12:06:31.453388+01
latest_end_lsn        | 0/41DD94E8
latest_end_time       | 2017-02-06 12:06:31.453325+01

My takeaway from this is: it's not perfect. We could definitely use a proper way to sync the initial data in tables. But even as it is now, we can live with it, I think.

Of course, if you plan to keep a slave using logical replication, and then fail over to it, you will need to update sequences manually, and keep the slave's schema in sync with the master's. But I don't actually think that's all that important. For production slaves, normal WAL replication is better, and it handles all these cases. Logical replication seems to be for different purposes (data warehousing, generating reports, upgrading across Pg major versions). And for these, it seems to work really, really well.
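For example, after such a failover, each sequence would have to be bumped by hand to match the actual data, with something like:

$ select setval('test_id_seq', ( select max(id) from test ));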

All things said – it's great that we have this now in core PostgreSQL. Thanks a lot to everybody involved.

6 comments:

  2. # Misha
    Feb 7, 2017

    it is definitely NOT logical replication!
    it is WAL based,
    and whole wal traffic from all cluster affected!

  3. # Andreas
    Feb 7, 2017

    @Misha The new logical replication only sends the logical changes (added, removed and updated rows) of the tables you subscribe to, so it is indeed logical replication. It is only WAL-based in that it reads the WAL to reconstruct a stream of relevant logical changes, it does not send the WAL to the slave.

  4. # hans
    Feb 8, 2017

    Great, that’s a big step forward. Postgres 10 is going to be a really big release.

    I wonder why the initial sync isn’t done automatically, e.g. through a foreign server – the information needed for that is already part of the subscription definition as far as I can tell.

  5. # PJ
    Feb 8, 2017

    FYI, there is patch for initial data copy in the commitfest and it’s expected to be happening for PG10.

  6. # Artur
    Feb 8, 2017

    >> It would also automatically add all tables created in future

    I would suggest to split it into two features:

    CREATE PUBLICATION test_pub FOR ALL TABLES AND AUTOADD
    …and of course
    ALTER PUBLICATION test_pub SET AUTOADD OFF

  7. # Misha
    Feb 9, 2017

    @Andreas

    > It is only WAL-based in that it reads the WAL to reconstruct a stream

    Does it mean that now we’ve got a separate buffer instead of wal to accumulate events?

    And what would be happen if consumer of only one small table was killed? could we get wal-bomb (like zip-bomb) on whole cluster in that case?
