The BIG feature. The feature that made PostgreSQL leap from 8.4 to 9.0. The patch was written by Fujii Masao, and committed by Heikki Linnakangas on the 15th of January 2010:
Log Message:
-----------
Introduce Streaming Replication.

This includes two new kinds of postmaster processes, walsenders and walreceiver. Walreceiver is responsible for connecting to the primary server and streaming WAL to disk, while walsender runs in the primary server and streams WAL from disk to the client.

Documentation still needs work, but the basics are there. We will probably pull the replication section to a new chapter later on, as well as the sections describing file-based replication. But let's do that as a separate patch, so that it's easier to see what has been added/changed. This patch also adds a new section to the chapter about FE/BE protocol, documenting the protocol used by walsender/walreceiver.

Bump catalog version because of two new functions, pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for monitoring the progress of replication.

Fujii Masao, with additional hacking by me
What exactly does it mean?
Well. As you are perhaps aware, PostgreSQL offers replication using WAL segments – which, up to 8.4, could be used to run a Warm Standby, and which, from 9.0 on, you will be able to turn into an actual Hot Standby.
The problem with this replication is pretty simple – it is based on files, which are shipped to the standby only when they are rotated, i.e. after 16MB of WAL data has been written.
In some scenarios that's not a problem – if you're constantly generating at least 16MB of writes per minute, your replication lag is basically always below one minute.
But what happens if you don't have that many writes? Of course you can force WAL rotation, but this is not really a good idea, and it's not suggested to force it more often than once a minute anyway.
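For reference, the knob for forcing that rotation is archive_timeout; a minimal sketch of the pre-9.0 workaround in postgresql.conf (the 60-second value is just an illustration, not a recommendation):

```
# postgresql.conf on the master – pre-9.0 workaround for low-traffic
# systems: force a WAL segment switch at least every 60 seconds, so the
# standby never lags by more than about a minute (at the cost of
# shipping mostly-empty 16MB files).
archive_timeout = 60
```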
But now, thanks to Streaming Replication, lag should be nearly zero. Of course it's still asynchronous replication, but for most use cases – async is really good enough.
So, let's see how to set it up.
On my test machine, I have PostgreSQL binaries in /home/pgdba/work, and the main datadir in /home/pgdba/data (listening on port 5850).
Now, let's set up streaming replication and a standby server, on the same machine, in /home/pgdba/data2, listening on port 5851.
First, I will need to stop main PostgreSQL:
/home/pgdba/work/bin/pg_ctl -D /home/pgdba/data stop
Now let's make some changes in the config. Before applying the changes, I copied postgresql.conf to postgresql.conf.before, so I can now show you a simple diff:
=> diff -u postgresql.conf.before postgresql.conf
--- postgresql.conf.before      2010-01-31 23:27:24.000000000 +0100
+++ postgresql.conf     2010-01-31 23:29:29.000000000 +0100
@@ -175,9 +175,9 @@
 # - Archiving -

-#archive_mode = off            # allows archiving to be done
+archive_mode = on              # allows archiving to be done
                                # (change requires restart)
-#archive_command = ''          # command to use to archive a logfile segment
+archive_command = 'cp "%p" /home/pgdba/wal_archive/"%f"'               # command to use to archive a logfile segment
 #archive_timeout = 0           # force a logfile segment switch after this
                                # number of seconds; 0 disables
@@ -190,7 +190,7 @@
 # - Replication -

-#max_wal_senders = 0           # max number of walsender processes
+max_wal_senders = 1            # max number of walsender processes
 #wal_sender_delay = 200ms      # 1-10000 milliseconds
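Since a failing archive_command silently stalls archiving (PostgreSQL retries the same segment forever while pg_xlog fills up), it's worth checking by hand what the server will do with it. A small sketch, using temporary files instead of the real /home/pgdba paths (an assumption, purely for illustration): PostgreSQL substitutes %p with the path of the finished segment and %f with its file name, runs the command through the shell, and treats a non-zero exit status as failure.

```shell
#!/bin/sh
# Emulate one invocation of archive_command by hand.
ARCHIVE_DIR=$(mktemp -d)    # stands in for /home/pgdba/wal_archive
SEGMENT=$(mktemp)           # stands in for a finished 16MB WAL segment
printf 'fake wal data' > "$SEGMENT"

# The command from postgresql.conf, with %p and %f substituted the way
# PostgreSQL would substitute them:
cp "$SEGMENT" "$ARCHIVE_DIR/$(basename "$SEGMENT")"

# A zero exit status tells PostgreSQL the segment is safely archived;
# anything else makes it retry the same segment indefinitely.
echo "archive_command exit status: $?"
```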
As you can see it's not really complicated.
Now, I can start master PostgreSQL:
/home/pgdba/work/bin/pg_ctl -D /home/pgdba/data start
OK. Now, I generated some traffic in the database, to make it pre-generate some WAL segments. Not many – let's say 100.
My test script generates around 7 wal segments per minute.
Now, having this, let's set up the slave server:
=> /home/pgdba/work/bin/psql -U postgres -p 5850 -c "SELECT pg_start_backup('depesz')"
 pg_start_backup
-----------------
 0/74C645E8
(1 row)

=> rsync -a /home/pgdba/data/ /home/pgdba/data2/

=> /home/pgdba/work/bin/psql -U postgres -p 5850 -c "SELECT pg_stop_backup()"
 pg_stop_backup
----------------
 0/7A8FB9A8
(1 row)
Now, let's clear data2:
rm -f /home/pgdba/data2/postmaster.pid /home/pgdba/data2/pg_xlog/archive_status/* /home/pgdba/data2/pg_xlog/0*
And change the config. Diff between data and data2:
=> diff -u data/postgresql.conf data2/postgresql.conf
--- data/postgresql.conf        2010-01-31 23:34:11.000000000 +0100
+++ data2/postgresql.conf       2010-02-01 00:02:18.000000000 +0100
@@ -60,7 +60,7 @@
                                # comma-separated list of addresses;
                                # defaults to 'localhost', '*' = all
                                # (change requires restart)
-#port = 5850                   # (change requires restart)
+port = 5851                    # (change requires restart)
 max_connections = 100          # (change requires restart)
 # Note: Increasing max_connections costs ~400 bytes of shared memory per
 # connection slot, plus lock space (see max_locks_per_transaction).
@@ -175,7 +175,7 @@
 # - Archiving -

-archive_mode = on              # allows archiving to be done
+archive_mode = off             # allows archiving to be done
                                # (change requires restart)
 archive_command = 'cp "%p" /home/pgdba/wal_archive/"%f"'               # command to use to archive a logfile segment
 #archive_timeout = 0           # force a logfile segment switch after this
@@ -190,7 +190,7 @@
 # - Replication -

-max_wal_senders = 1            # max number of walsender processes
+max_wal_senders = 0            # max number of walsender processes
 #wal_sender_delay = 200ms      # 1-10000 milliseconds
On the slave I created recovery.conf:
=> cat /home/pgdba/data2/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=127.0.0.1 port=5850 user=postgres'
trigger_file = '/home/pgdba/data2/finish.replication'
restore_command = 'cp -i /home/pgdba/wal_archive/%f "%p" </dev/null'
It is critical that you do not use pg_standby or anything similar that waits for the next WAL segment.
Also – as far as I understand – WAL segments should be put in the archive directly by archive_command on the master – i.e. copying them to some temp directory and then rsyncing them from cron will probably not work.
Anyway. After some time, the slave fetched all WAL segments from the archive directory and switched to streaming.
You can see that it happened because we now have 2 new, interesting processes:
=> ps uw --ppid $( head -n 1 /home/pgdba/data2/postmaster.pid )
USER       PID %CPU %MEM    VSZ   RSS TTY  STAT START  TIME COMMAND
pgdba    13384  0.0  0.0  28712  1056 ?    Ss   00:22  0:00 postgres: logger process
pgdba    13385  1.1  0.2  59916 27200 ?    Ss   00:22  0:03 postgres: startup process   recovering 0000000100000000000000EE
pgdba    13390  0.0  0.2  59812 26208 ?    Ss   00:22  0:00 postgres: writer process
pgdba    13397  0.0  0.0  30800  1052 ?    Ss   00:22  0:00 postgres: stats collector process
pgdba    13402  0.4  0.0  77016  7368 ?    Ss   00:22  0:01 postgres: wal receiver process   streaming 0/EE2B7FD8

=> ps uw --ppid $( head -n 1 /home/pgdba/data/postmaster.pid )
USER       PID %CPU %MEM    VSZ   RSS TTY  STAT START  TIME COMMAND
pgdba     6904  0.9  0.0  28716  1056 ?    Ss   Jan31  0:31 postgres: logger process
pgdba     6906  0.1  0.2  60092 27060 ?    Ss   Jan31  0:04 postgres: writer process
pgdba     6907  0.0  0.0  59652  1304 ?    Ss   Jan31  0:00 postgres: wal writer process
pgdba     6908  0.0  0.0  62624  3992 ?    Ss   Jan31  0:00 postgres: autovacuum launcher process
pgdba     6909  0.0  0.0  30912  1024 ?    Ss   Jan31  0:00 postgres: archiver process   last was 0000000100000000000000ED
pgdba     6910  0.0  0.1  47356 12896 ?    Ss   Jan31  0:00 postgres: stats collector process
pgdba     8713 85.6  0.6 118308 80628 ?    Rs   Jan31 31:35 postgres: depesz depesz [local] DROP TABLE
pgdba     8818  0.0  0.5 110428 66604 ?    Ss   Jan31  0:01 postgres: autovacuum worker process   depesz
pgdba     8887  0.0  0.6 134780 84884 ?    Ss   Jan31  0:01 postgres: autovacuum worker process   depesz
pgdba     8952  0.0  0.6 126812 83580 ?    Ss   Jan31  0:01 postgres: autovacuum worker process   depesz
pgdba    13403  1.4  0.0  69224  8580 ?    Ss   00:22  0:04 postgres: wal sender process postgres 127.0.0.1(52186) streaming 0/EE3D8090
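The two "streaming 0/…" values in those listings can be used to estimate replication lag in bytes. The commit message above mentions pg_last_xlog_receive_location() and pg_last_xlog_replay_location() for exactly this kind of monitoring; here is a small sketch of the arithmetic only (the hardcoded locations are the ones from the ps output; fetching them live with psql from both servers is left out):

```shell
#!/bin/sh
# xlog locations are printed as two hex halves, 'hi/lo'; together they
# form a 64-bit byte position in the WAL stream.
xlog_to_bytes() {
    loc=$1
    echo $(( ( 0x${loc%%/*} << 32 ) + 0x${loc##*/} ))
}

# Values from the process listings: walsender on the master vs
# wal receiver on the slave.
MASTER_LOC=0/EE3D8090   # e.g. SELECT pg_current_xlog_location() on port 5850
SLAVE_LOC=0/EE2B7FD8    # e.g. SELECT pg_last_xlog_replay_location() on port 5851

echo "lag: $(( $(xlog_to_bytes $MASTER_LOC) - $(xlog_to_bytes $SLAVE_LOC) )) bytes"
```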
As you can see, the master is still archiving WAL segments – which should go on, but you should use some cronjob to remove the no-longer-needed WAL segments. By no-longer-needed I mean older than the "Latest checkpoint's REDO location" reported by pg_controldata SLAVE_DATA_DIR/.
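To script that cleanup, you need the name of the WAL segment file that contains a given location. A hedged sketch of the mapping (the function name is mine; what it relies on is standard: segments are 16MB, and the file name is timeline + log + segment, each 8 hex digits – the example value matches the "recovering 0000000100000000000000EE" visible in the slave's process list):

```shell
#!/bin/sh
# Map a timeline id plus an xlog location like '0/EE2B7FD8' to the
# 24-character WAL segment file name that contains it. Segments are
# 16MB (2^24 bytes), so the segment number within a log file is the
# top two hex digits of the 8-digit offset.
wal_segment_name() {
    tli=$1; loc=$2
    log=${loc%%/*}
    off=$(printf '%08X' "0x${loc##*/}")   # pad offset to 8 hex digits
    seg=${off%??????}                     # keep the top two digits
    printf '%08X%08X000000%s\n' "0x$tli" "0x$log" "$seg"
}

# "Latest checkpoint's REDO location" from pg_controldata on the slave:
wal_segment_name 1 0/EE2B7FD8    # prints 0000000100000000000000EE
```

Archived files whose names sort below this one (plain string comparison works, since the names are fixed-width hex) can then be removed from /home/pgdba/wal_archive by a cron job.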
Now, let's see how it really works.
I stopped my load generator – thanks to this there will be basically no traffic on the database – as soon as the autovacuums finish 🙂
Afterwards, I did a simple test. I have this table:
# \d test
          Table "public.test"
 Column |           Type           | Modifiers
--------+--------------------------+-----------
 x      | timestamp with time zone | not null
Indexes:
    "test_pkey" PRIMARY KEY, btree (x)
=$ /home/pgdba/work/bin/psql -U depesz -p 5850 -c "insert into test (x) values (now()) returning x"
               x
-------------------------------
 2010-02-01 00:33:29.543639+01
(1 row)

INSERT 0 1

=$ /home/pgdba/work/bin/psql -U depesz -p 5851 -c "select * from test"
 x
---
(0 rows)
Of course I don't see the row – streaming replication is fast, but not that fast 🙂
But adding even 1 second of delay worked perfectly:
=$ /home/pgdba/work/bin/psql -U depesz -p 5850 -c "insert into test (x) values (now()) returning x"
               x
-------------------------------
 2010-02-01 00:35:13.520617+01
(1 row)

INSERT 0 1

=$ sleep 1

=$ /home/pgdba/work/bin/psql -U depesz -p 5851 -c "select * from test order by x desc limit 1"
               x
-------------------------------
 2010-02-01 00:35:13.520617+01
(1 row)
(These commands were copy-pasted into the terminal, so there was no typing-induced delay.)
So, as we can see, it works.
Is it ready? Definitely not. For starters – the docs are far from usable. If you'd like to play with it yourself, you can check the docs, but you should really check the Wiki page.
Do I like it? YEAH! Even with unfinished docs, the setup did not take long, and the effect is well worth some sweat 🙂
18 thoughts on “Waiting for 9.0 – Streaming replication”
Another fantastic post! Thanks for making this demo 🙂
Nice article – hot standby is a much needed feature in PostgreSQL. One thing you might want to consider is using rsync in place of cp. rsync does an atomic copy, so there's no chance of the other process picking up a partially written file. I use something like this:
archive_command = 'rsync %p /var/lib/pgsql/archives/%f'
Awesome news! Thanks for the write-up.
This feature alone is why I plan on testing 9.0 as early as possible so that I can start using this wonderful feature. My thanks to everyone involved!
Does anyone know how exactly this works? Does it still apply blocks (like a warm standby) or does it work on the query level (like an Oracle logical standby database)?
It is still WAL replication. But instead of waiting for 16MB of data to be sent, the data is sent more or less in real time.
I'm so disappointed about the lack of TRANSPARENT table partitioning in PG 9.0. 🙁
Table partitioning is great for speed-tuning large databases.
Nice demo on hot standby. Any strong reason why we are not using the pg_standby command here?
Currently I use PostgreSQL 8.2 together with Slony-I for partial replication of a set of tables from the master to the database of the slave.
Is it possible to use this type of replication, built into PostgreSQL 9, for partial replication too?
No. The whole WAL is sent to the slave, which means all databases and all tables from your Pg instance.
Hi, nice article. I tried hot standby and streaming replication. It's a nice feature, but it lacks one thing: dumping of data (pg_dump) from the standby server. Is there any option to perform pg_dump against the streaming standby?
Can you let me know the contents of the trigger file?
It's just a *trigger* file – its content is not important.
How fast must the backup server be?
Can we replicate from a host with a powerful disk subsystem to a slave with regular SATA disks, for example?
@Edo: usually it will work. Check for yourself, though, if you want a definitive answer.
Does anyone know how to promote the slave to master when the master server goes down or fails?
Please share the possibilities, it would be very helpful.
I have translated this post into Spanish.
You can change the slave into a master using the trigger file /home/pgdba/data2/finish.replication.
If you create the file /home/pgdba/data2/finish.replication, the slave is converted into a master.
You can see more information here: https://www.postgresql.org/docs/current/static/warm-standby-failover.html