July 26th, 2011 by depesz | Tags: , , , , , , , | 1 comment »
Did it help? If yes - maybe you can help me? Donate BTC to 19zPa5diT2LZqGtTi8f8bfApLn8rw9zBHx

On 19th of July, Simon Riggs committed patch:

Cascading replication feature for streaming log-based replication.
Standby servers can now have WALSender processes, which can work with
either WALReceiver or archive_commands to pass data. Fully updated
docs, including new conceptual terms of sending server, upstream and
downstream servers. WALSenders terminated when promote to master.
 
Fujii Masao, review, rework and doc rewrite by Simon Riggs

Streaming replication is relatively new, added in 9.0. Since beginning it shared the same limitation that normal WAL-files based replication has, that is – there is only one source of data. That is master.

While it makes sense, it is also pretty cool to be able to make slave source of replication for some other systems. For example – not to keep master occupied with such tasks.

Now, with the patch, we can setup replication schema like this:

So, let's test it.

To make it work I will need some master database and 3 slaves, made off the master. Seems simple enough.

=$ mkdir master
 
=$ initdb -D master
...
 
=$ vim master/postgresql.conf

In the postgresql.conf, I change:

  • port = 4001
  • wal_level = hot_standby
  • checkpoint_segments = 20
  • archive_mode = on
  • archive_command = ‘/bin/true'
  • max_wal_senders = 3
  • wal_keep_segments = 100
  • logging_collector = on
  • log_checkpoints = on
  • log_connections = on
  • log_line_prefix = ‘%m %r %u %d %p: ‘

I also set pg_hba.conf to something that matches my test environment:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   replication     all                                     trust
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust

With master prepared that way, I can start it:

=$ pg_ctl -D master start
server starting
 
=$ head -n 1 master/postmaster.pid | xargs -IPG ps uwf -p PG --ppid PG
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
depesz    9332  0.0  0.0  66456  7496 pts/3    S    12:11   0:00 /home/pgdba/work/bin/postgres -D master
depesz    9333  0.0  0.0  26140   724 ?        Ss   12:11   0:00  \_ postgres: logger process
depesz    9335  0.0  0.0  66456  1276 ?        Ss   12:11   0:00  \_ postgres: writer process
depesz    9336  0.0  0.0  66456  1012 ?        Ss   12:11   0:00  \_ postgres: wal writer process
depesz    9337  0.0  0.0  67164  2040 ?        Ss   12:11   0:00  \_ postgres: autovacuum launcher process
depesz    9338  0.0  0.0  26136   732 ?        Ss   12:11   0:00  \_ postgres: archiver process
depesz    9339  0.0  0.0  26136   932 ?        Ss   12:11   0:00  \_ postgres: stats collector process
 
=$ psql -p 4001 -d postgres -c "select version()"
                                                        version
------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.2devel on x86_64-unknown-linux-gnu, compiled by gcc-4.5.real (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 64-bit
(1 row)

All looks ok.

One note – you might not understand why I used /bin/true as archive_command. Reason is very simple – archive has to be set to something, otherwise archive_mode cannot be enabled, and this will cause problems with backups, but on the other hand – I will not need to use the wal archive, since I have pretty large wal_keep_segments.

Now, we'll setup the slaves. Starting with the first one of course:

=$ psql -p 4001 -d postgres -c "select pg_start_backup('whatever')"
 pg_start_backup
-----------------
 0/2000020
(1 row)
 
=$ rsync -a master/ slave/
 
=$ psql -p 4001 -d postgres -c "select pg_stop_backup()"
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
 pg_stop_backup
----------------
 0/20000D8
(1 row)

Of course we need some tidying of the slave:

=$ rm -f slave/pg_xlog/???????????????????????? slave/pg_xlog/archive_status/* slave/pg_log/* slave/postmaster.pid
 
=$ vim slave/postgresql.conf

In the config, I change:

  • port = 4002
  • hot_standby = on

And I also create recovery.conf in slave/, with this content:

restore_command = '/bin/false'
standby_mode = 'on'
primary_conninfo = 'port=4001 user=depesz'
trigger_file = '/tmp/slave.finish.recovery'

And with this in place I can start slave:

=$ pg_ctl -D slave start
server starting
 
=$ head -n 1 slave/postmaster.pid | xargs -IPG ps uwf -p PG --ppid PG
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
depesz   13082  1.5  0.0  66484  7492 pts/3    S    12:51   0:00 /home/pgdba/work/bin/postgres -D slave
depesz   13083  0.0  0.0  26136   716 ?        Ss   12:51   0:00  \_ postgres: logger process
depesz   13084  0.0  0.0  66556  1428 ?        Ss   12:51   0:00  \_ postgres: startup process   recovering 000000010000000000000006
depesz   13087  2.7  0.0  81504  3064 ?        Ss   12:51   0:00  \_ postgres: wal receiver process   streaming 0/6000078
depesz   13091  0.0  0.0  66484  1012 ?        Ss   12:51   0:00  \_ postgres: writer process
depesz   13092  0.0  0.0  26132   896 ?        Ss   12:51   0:00  \_ postgres: stats collector process
 
=$ head -n 1 master/postmaster.pid | xargs -IPG ps uwf -p PG --ppid PG
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
depesz   12981  0.2  0.0  66456  7520 pts/3    S    12:50   0:00 /home/pgdba/work/bin/postgres -D master
depesz   12982  0.0  0.0  26140   724 ?        Ss   12:50   0:00  \_ postgres: logger process
depesz   12984  0.0  0.0  66456  1016 ?        Ss   12:50   0:00  \_ postgres: writer process
depesz   12985  0.0  0.0  66456  1012 ?        Ss   12:50   0:00  \_ postgres: wal writer process
depesz   12986  0.0  0.0  67296  2096 ?        Ss   12:50   0:00  \_ postgres: autovacuum launcher process
depesz   12987  0.0  0.0  26136   732 ?        Ss   12:50   0:00  \_ postgres: archiver process
depesz   12988  0.0  0.0  26136  1040 ?        Ss   12:50   0:00  \_ postgres: stats collector process
depesz   13088  0.3  0.0  67428  2480 ?        Ss   12:51   0:00  \_ postgres: wal sender process depesz [local] streaming 0/6000078

One note – I used “user=depesz" in primary_conninfo, because I run the tests on depesz system account, and initdb made superuser named depesz, and not postgres.

So, we now have replication master -> slave setup, so we can test it:

=$ psql -p 4001 -d postgres -c "create table i (x int4)"; psql -p 4002 -d postgres -c '\d i'
CREATE TABLE
       Table "public.i"
 Column |  Type   | Modifiers
--------+---------+-----------
 x      | integer |

All looks OK. Now, we can add slave2 and slave3. Since I'm lazy, I will just stop slave, copy slave to slave2/slave3 and then modify them:

=$ pg_ctl -D slave stop
waiting for server to shut down.... done
server stopped
 
=$ rsync -a slave/ slave2/
 
=$ rsync -a slave/ slave3/
 
=$ pg_ctl -D slave start
server starting

Slave2 and 3 will be basically the same as slave, but with different port, and connecting to 4002 (slave) instead of 4001 (master) for their WAL. So, let's do the changes:

=$ perl -pi -e 's/port = 4002/port = 4003/' slave2/postgresql.conf
 
=$ perl -pi -e 's/port = 4002/port = 4004/' slave3/postgresql.conf
 
=$ perl -pi -e 's/port=4001/port=4002/' slave{2,3}/recovery.conf
 
=$ perl -pi -e 's/slave.finish.recovery/slave2.finish.recovery/' slave2/recovery.conf
 
=$ perl -pi -e 's/slave.finish.recovery/slave3.finish.recovery/' slave3/recovery.conf

Now we have:

=$ ack "^port" */postgresql.conf
master/postgresql.conf
63:port = 4001                          # (change requires restart)
 
slave2/postgresql.conf
63:port = 4003                          # (change requires restart)
 
slave3/postgresql.conf
63:port = 4004                          # (change requires restart)
 
slave/postgresql.conf
63:port = 4002                          # (change requires restart)
 
=$ ack port */recovery.conf
slave2/recovery.conf
3:primary_conninfo = 'port=4002 user=depesz'
 
slave3/recovery.conf
3:primary_conninfo = 'port=4002 user=depesz'
 
slave/recovery.conf
3:primary_conninfo = 'port=4001 user=depesz'

So, let's start them:

=$ for a in slave2 slave3; do pg_ctl -D $a/ start; done
server starting
server starting

And now we see the processes:

=$ head -n 1 -q */*.pid | xargs -IPG echo "-p PG --ppid PG" | xargs ps uwf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
depesz   14031  0.0  0.0  66488  7496 pts/3    S    13:03   0:00 /home/pgdba/work/bin/postgres -D slave3
depesz   14032  0.0  0.0  26140   720 ?        Ss   13:03   0:00  \_ postgres: logger process
depesz   14033  0.0  0.0  66556  1400 ?        Ss   13:03   0:00  \_ postgres: startup process   recovering 000000010000000000000006
depesz   14063  0.0  0.0  79456  2148 ?        Ss   13:03   0:00  \_ postgres: wal receiver process   streaming 0/6012ED0
depesz   14069  0.0  0.0  66488  1532 ?        Ss   13:03   0:00  \_ postgres: writer process
depesz   14070  0.0  0.0  26136   900 ?        Ss   13:03   0:00  \_ postgres: stats collector process
depesz   14026  0.0  0.0  66492  7496 pts/3    S    13:03   0:00 /home/pgdba/work/bin/postgres -D slave2
depesz   14042  0.0  0.0  26144   720 ?        Ss   13:03   0:00  \_ postgres: logger process
depesz   14043  0.0  0.0  66560  1400 ?        Ss   13:03   0:00  \_ postgres: startup process   recovering 000000010000000000000006
depesz   14067  0.0  0.0  79460  2148 ?        Ss   13:03   0:00  \_ postgres: wal receiver process   streaming 0/6012ED0
depesz   14071  0.0  0.0  66492  1532 ?        Ss   13:03   0:00  \_ postgres: writer process
depesz   14072  0.0  0.0  26140   900 ?        Ss   13:03   0:00  \_ postgres: stats collector process
depesz   14021  0.0  0.0  66488  7528 pts/3    S    13:03   0:00 /home/pgdba/work/bin/postgres -D slave
depesz   14037  0.0  0.0  26140   724 ?        Ss   13:03   0:00  \_ postgres: logger process
depesz   14038  0.0  0.0  66560  1572 ?        Ss   13:03   0:00  \_ postgres: startup process   recovering 000000010000000000000006
depesz   14048  0.0  0.0  66488  1536 ?        Ss   13:03   0:00  \_ postgres: writer process
depesz   14050  0.0  0.0  26136   904 ?        Ss   13:03   0:00  \_ postgres: stats collector process
depesz   14052  0.0  0.0  79460  2136 ?        Ss   13:03   0:00  \_ postgres: wal receiver process   streaming 0/6012ED0
depesz   14064  0.0  0.0  67332  2476 ?        Ss   13:03   0:00  \_ postgres: wal sender process depesz [local] streaming 0/6012ED0
depesz   14068  0.0  0.0  67452  2476 ?        Ss   13:03   0:00  \_ postgres: wal sender process depesz [local] streaming 0/6012ED0
depesz   12981  0.0  0.0  66456  7524 pts/3    S    12:50   0:00 /home/pgdba/work/bin/postgres -D master
depesz   12982  0.0  0.0  26140   724 ?        Ss   12:50   0:00  \_ postgres: logger process
depesz   12984  0.0  0.0  66456  1780 ?        Ss   12:50   0:00  \_ postgres: writer process
depesz   12985  0.0  0.0  66456  1012 ?        Ss   12:50   0:00  \_ postgres: wal writer process
depesz   12986  0.0  0.0  67296  2156 ?        Ss   12:50   0:00  \_ postgres: autovacuum launcher process
depesz   12987  0.0  0.0  26136   732 ?        Ss   12:50   0:00  \_ postgres: archiver process
depesz   12988  0.0  0.0  26136  1040 ?        Ss   12:50   0:00  \_ postgres: stats collector process
depesz   14053  0.0  0.0  67444  2520 ?        Ss   13:03   0:00  \_ postgres: wal sender process depesz [local] streaming 0/6012ED0

Please note that master Pg has only one sender process (pid 14053), slave Pg has receiver (14052) and two senders (14064 and 14068), and slave2 and slave3 have only single receiver (14067 and 14063).

So, now we should test if it all works well, so:

=$ psql -d postgres -p 4001 -c 'insert into i(x) values (123)'
for port in 4002 4003 4004
do
    echo "port=$port"
    psql -p $port -d postgres -c "select * from i"
done
INSERT 0 1
port=4002
 x
---
(0 rows)
 
port=4003
 x
---
(0 rows)
 
port=4004
 x
---
(0 rows)

The tables are empty. They should have some data, but it might be simply because of replication lag. So let's retry the check, without insert now:

=$ for port in 4002 4003 4004
do
    echo "port=$port"
    psql -p $port -d postgres -c "select * from i"
done
port=4002
  x
-----
 123
(1 row)
 
port=4003
  x
-----
 123
(1 row)
 
port=4004
  x
-----
 123
(1 row)

And all works fine now. Great.

The only missing feature is ability to make slaves-off-slave still work when slave gets promoted to standalone, but unfortunately, it's not here:

=$ touch /tmp/slave.finish.recovery; sleep 5; head -n 1 -q */*.pid | xargs -IPG echo "-p PG --ppid PG" | xargs ps uwf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
depesz   14896  0.1  0.0  66488  7524 pts/3    S    13:18   0:00 /home/pgdba/work/bin/postgres -D slave3
depesz   14897  0.0  0.0  26140   720 ?        Ss   13:18   0:00  \_ postgres: logger process
depesz   14898  0.0  0.0  66556  1696 ?        Ss   13:18   0:00  \_ postgres: startup process   waiting for 000000010000000000000006
depesz   14901  0.0  0.0  66488  1276 ?        Ss   13:18   0:00  \_ postgres: writer process
depesz   14902  0.0  0.0  26136   900 ?        Ss   13:18   0:00  \_ postgres: stats collector process
depesz   14883  0.1  0.0  66492  7528 pts/3    S    13:18   0:00 /home/pgdba/work/bin/postgres -D slave2
depesz   14885  0.0  0.0  26144   724 ?        Ss   13:18   0:00  \_ postgres: logger process
depesz   14886  0.0  0.0  66560  1700 ?        Ss   13:18   0:00  \_ postgres: startup process   waiting for 000000010000000000000006
depesz   14890  0.0  0.0  66492  1280 ?        Ss   13:18   0:00  \_ postgres: writer process
depesz   14891  0.0  0.0  26140   904 ?        Ss   13:18   0:00  \_ postgres: stats collector process
depesz   14021  0.0  0.0  66488  7528 pts/3    S    13:03   0:00 /home/pgdba/work/bin/postgres -D slave
depesz   14037  0.0  0.0  26140   724 ?        Ss   13:03   0:00  \_ postgres: logger process
depesz   14048  0.0  0.0  66488  1780 ?        Ss   13:03   0:00  \_ postgres: writer process
depesz   14050  0.0  0.0  26136  1032 ?        Ss   13:03   0:00  \_ postgres: stats collector process
depesz   15018  0.0  0.0  66488  1016 ?        Ss   13:20   0:00  \_ postgres: wal writer process
depesz   15019  0.0  0.0  67320  2100 ?        Ss   13:20   0:00  \_ postgres: autovacuum launcher process
depesz   15020  0.0  0.0  26136   912 ?        Ss   13:20   0:00  \_ postgres: archiver process   last was 00000002.history
depesz   12981  0.0  0.0  66456  7524 pts/3    S    12:50   0:00 /home/pgdba/work/bin/postgres -D master
depesz   12982  0.0  0.0  26140   724 ?        Ss   12:50   0:00  \_ postgres: logger process
depesz   12984  0.0  0.0  66456  1780 ?        Ss   12:50   0:00  \_ postgres: writer process
depesz   12985  0.0  0.0  66456  1012 ?        Ss   12:50   0:00  \_ postgres: wal writer process
depesz   12986  0.0  0.0  67296  2164 ?        Ss   12:50   0:00  \_ postgres: autovacuum launcher process
depesz   12987  0.0  0.0  26136   732 ?        Ss   12:50   0:00  \_ postgres: archiver process
depesz   12988  0.0  0.0  26136  1040 ?        Ss   12:50   0:00  \_ postgres: stats collector process

As you can see the sender in slave got killed, and thus slave2 and slave3 are still slaves, but without source of WAL.

Logs of slave2 and slave3 PostgreSQL show clear reason why it doesn't work:

2011-07-26 13:26:41.483 CEST    16318: FATAL:  timeline 2 of the primary does not match recovery target timeline 1

Clearly – slave is using timeline == 2, while slave2 and slave3 are still on timeline == 1.

Theoretically it should be simple to fix, since slave has pg_xlog/00000002.history file, but the functionality to switch timelines in recovery is simply not there yet.

Anyway – ability to have slaves that are receiving WAL from other slaves, is pretty cool, and definitely a welcome addition.

  1. One comment

  2. # ioguix
    Jul 28, 2011

    Hey,

    I did discuss the “swithing timeline” issue with Masao-san during last pgcon. I’m not sure, but I think you just cannot switch timeline using streaming replication, but you can do it from archives.

    Maybe you can setup log shipping from slave1 to slave2 and 3, and add in the recovery.conf:
    recovery_target_timeline = ‘latest’

    Then restart slave2 and 3 after the slave1 promotion.

Leave a comment