Waiting for 9.3 – Allow a streaming replication standby to follow a timeline switch.

On 13th of December, Heikki Linnakangas committed patch:

Allow a streaming replication standby to follow a timeline switch.
 
Before this patch, streaming replication would refuse to start replicating
if the timeline in the primary doesn't exactly match the standby. The
situation where it doesn't match is when you have a master, and two
standbys, and you promote one of the standbys to become new master.
Promoting bumps up the timeline ID, and after that bump, the other standby
would refuse to continue.
 
There's significantly more timeline related logic in streaming replication
now. First of all, when a standby connects to primary, it will ask the
primary for any timeline history files that are missing from the standby.
The missing files are sent using a new replication command TIMELINE_HISTORY,
and stored in standby's pg_xlog directory. Using the timeline history files,
the standby can follow the latest timeline present in the primary
(recovery_target_timeline='latest'), just as it can follow new timelines
appearing in an archive directory.
 
START_REPLICATION now takes a TIMELINE parameter, to specify exactly which
timeline to stream WAL from. This allows the standby to request the primary
to send over WAL that precedes the promotion. The replication protocol is
changed slightly (in a backwards-compatible way although there's little hope
of streaming replication working across major versions anyway), to allow
replication to stop when the end of timeline reached, putting the walsender
back into accepting a replication command.
 
Many thanks to Amit Kapila for testing and reviewing various versions of
this patch.

I actually had a problem understanding what this patch is about. After all – following timeline change is possible for quite some time now. I did it in 9.1 or 9.2. Phil did it of course too. So what is the idea about this newest patch?

Well. Apparently – the idea is that now the timeline switch can happen when working with only streaming replication – without WAL archive at all.

This is arguably not so good idea (that is: to run without WAL archive), but it's definitely possible, so let's test is.

Here goes a little digression – just when this patch was committed, I had actually quite a lot of problems to get it working. Luckily Heikki was very helpful, and pushed some fixes. Finally, I tested the state as of this commit.

To test it, I did setup three virtual machines:

  • ubuntu1 – ip: 172.28.173.140 – master
  • ubuntu2 – ip: 172.28.173.141 – slave
  • ubuntu3 – ip: 172.28.173.142 – slave

Master has also floating IP used for DB communication – 172.28.173.253. This IP will be taken over by slave (ubuntu2) in my test.

On all machines I have PostgreSQL 9.3 setup, and replication is done from ubuntu1 to ubuntu2, and from ubuntu1 to ubuntu3, but ubuntu2/3, to connect to master, use not .140 IP, but rather the floating IP .253.

On ubuntu2 (slave):

=$ pid=$(head -n1 data/postmaster.pid); ps uwf -p $pid --ppid $pid
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
pgdba     1300  0.0  0.6 170112 12812 pts/0    S    12:28   0:00 /home/pgdba/work/bin/postgres
pgdba     1304  0.0  0.0  23604   788 ?        Ss   12:28   0:00  \_ postgres: logger process     
pgdba     1305  0.0  0.0 170220  1536 ?        Ss   12:28   0:00  \_ postgres: startup process   recovering 000000010000000000000010
pgdba     1306  0.4  0.1 184160  2656 ?        Ss   12:28   0:00  \_ postgres: wal receiver process   streaming 0/11000000
pgdba     1307  0.0  0.0 170112   952 ?        Ss   12:28   0:00  \_ postgres: checkpointer process   
pgdba     1308  0.0  0.0 170112  1484 ?        Ss   12:28   0:00  \_ postgres: writer process     
pgdba     1309  0.0  0.0  25700   864 ?        Ss   12:28   0:00  \_ postgres: stats collector process   
 
=$ cat data/recovery.conf
standby_mode = 'on'
primary_conninfo = 'port=5920 user=replication host=172.28.173.253'
trigger_file = '/tmp/finish.replication'
recovery_target_timeline='latest'

On ubuntu3 (slave):

=$ pid=$(head -n1 data/postmaster.pid); ps uwf -p $pid --ppid $pid
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
pgdba     1322  0.0  0.6 170108 12824 pts/0    S    12:28   0:00 /home/pgdba/work/bin/postgres
pgdba     1326  0.0  0.0  23600   800 ?        Ss   12:28   0:00  \_ postgres: logger process     
pgdba     1327  0.0  0.0 170216  1544 ?        Ss   12:28   0:00  \_ postgres: startup process   recovering 000000010000000000000010
pgdba     1328  0.3  0.1 184156  2716 ?        Ss   12:28   0:00  \_ postgres: wal receiver process   streaming 0/11000000
pgdba     1329  0.0  0.0 170108   960 ?        Ss   12:28   0:00  \_ postgres: checkpointer process   
pgdba     1330  0.0  0.0 170108  1492 ?        Ss   12:28   0:00  \_ postgres: writer process     
pgdba     1331  0.0  0.0  25696   872 ?        Ss   12:28   0:00  \_ postgres: stats collector process   
 
=$ cat data/recovery.conf
standby_mode = 'on'
primary_conninfo = 'port=5920 user=replication host=172.28.173.253'
trigger_file = '/tmp/finish.replication'
recovery_target_timeline='latest'

As you can see – both are configured the same way, are connecting to special, floating IP.

On ubuntu1 (master) I can see:

=$ pid=$(head -n1 DATA/postmaster.pid); ps uwf -p $pid --ppid $pid
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND                                                                                                                                                                                                          
pgdba     1115  0.0  0.6 170112 12884 pts/0    S    12:25   0:00 /home/pgdba/WORK/bin/postgres                                                                                                                                                                                    
pgdba     1119  0.0  0.0  23604   804 ?        Ss   12:25   0:00  \_ postgres: logger process                                                                                                                                                                                     
pgdba     1121  0.0  0.1 170220  2380 ?        Ss   12:25   0:00  \_ postgres: checkpointer process                                                                                                                                                                               
pgdba     1122  0.0  0.0 170112  1760 ?        Ss   12:25   0:00  \_ postgres: writer process     
pgdba     1123  0.0  0.0 170112   968 ?        Ss   12:25   0:00  \_ postgres: wal writer process   
pgdba     1124  0.0  0.1 170976  2268 ?        Ss   12:25   0:00  \_ postgres: autovacuum launcher process   
pgdba     1125  0.0  0.0  25700   852 ?        Ss   12:25   0:00  \_ postgres: archiver process   LAST was 000000010000000000000010
pgdba     1126  0.0  0.0  25832  1044 ?        Ss   12:25   0:00  \_ postgres: stats collector process   
pgdba     1662  0.0  0.1 171328  2824 ?        Ss   12:28   0:00  \_ postgres: wal sender process replication 172.28.173.141(45099) streaming 0/11000000
pgdba     1663  0.0  0.1 171196  2824 ?        Ss   12:28   0:00  \_ postgres: wal sender process replication 172.28.173.142(45750) streaming 0/11000000
 
15:45:59 pgdba@ubuntu1 ~
=$ ifconfig | grep -A1 ^eth
eth0      Link encap:Ethernet  HWaddr 08:00:27:07:98:05
          inet addr:172.28.173.140  Bcast:172.28.173.255  Mask:255.255.255.0
--
eth0:0    Link encap:Ethernet  HWaddr 08:00:27:07:98:05
          inet addr:172.28.173.253  Bcast:172.28.173.255  Mask:255.255.255.0

Now. I want to failover to ubuntu2. Ubuntu1 will be basically discarded. So I do:

  • on Ubuntu1:
    • pg_ctl -D data/ -m fast stop
    • sudo ifdown eth0:0
  • on Ubuntu2:
    • touch /tmp/finish.replication
    • sudo ifup eth0:0
  • And – after a while – ubuntu3 does reconnect to new master (ubuntu2) and continues with replication. All works.

    I also did test it in case where Ubuntu3 was lagging behind replication – after switch it was able to catchup and replicate just fine. Of course – to make this work you have to have sensibly high wal_keep_segments GUC.

    Great work, thanks Heikki, Amit and Thom.

One thought on “Waiting for 9.3 – Allow a streaming replication standby to follow a timeline switch.”

Comments are closed.