January 24th, 2011 by depesz

On 23rd of January, Magnus Hagander committed a patch which adds:

Add pg_basebackup tool for streaming base backups
This tool makes it possible to do the pg_start_backup/
copy files/pg_stop_backup step in a single command.
There are still some steps to be done before this is a
complete backup solution, such as the ability to stream
the required WAL logs, but it's still usable, and
could do with some buildfarm coverage.
In passing, make the checkpoint request optionally
fast instead of hardcoding it.
Magnus Hagander, reviewed by Fujii Masao and Dimitri Fontaine

It looks like it will provide a way to set up a WAL slave without using any protocol other than PostgreSQL connections themselves.

In this regard it's a bit like Dimitri's earlier project – pg_basebackup.

But the previous project used PL/PythonU to read the files. This time, the code uses additional logic in PostgreSQL itself (in the WAL sender, as far as I recall). So let's see how it works.

I set up a normal 9.1 instance with default parameters, and then changed the following GUCs to make hot standby work:

wal_level = hot_standby
max_wal_senders = 5
archive_mode = on
archive_command = '/bin/true'
wal_keep_segments = 300

Note – technically I don't need archive_mode, as I'm not going to use WAL archiving, but without it pg_basebackup warns. In case you wonder why it warns – you should have either archive_mode on or a non-zero wal_keep_segments. Using archiving is safer, though, and it makes it possible to set up a compressed archive. On the other hand – this is just a test setup, so I don't care about the archive itself; I'm just trying to suppress the warning.
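As an aside – if I did want a compressed archive, the archive_command itself could compress each segment as it is archived. A sketch (the destination directory is an assumption, pick your own):

```
# hypothetical example – /mnt/wal-archive is an assumed path, not part of this setup
archive_command = 'test ! -f /mnt/wal-archive/%f.gz && gzip -c %p > /mnt/wal-archive/%f.gz'
```

The `test ! -f` guard is there so the command fails instead of silently overwriting an already-archived segment.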

Then, I created a new role on the master, so that I can run the base backup and replication as it, and not as a superuser:

# create user replica with replication password 'secret';

(I also added "host replication replica trust" to pg_hba.conf).

With this setup, I did:

=$ rm -rf new-data-dir/; pg_basebackup -D new-data-dir -Fp -c fast -l basebackuptest -v -P -U replica -h -p 5910
47008/47008 kB (100%) 1/1 tablespaces
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
pg_basebackup: base backup completed.

Which looks pretty cool.

Afterwards, new-data-dir contained a full copy of my normal data dir, including (unfortunately) pg_log/ contents, but excluding (which is good) pg_xlog contents.

As for the second part – pg_xlog – the pg_xlog/archive_status directory was also missing, which could be problematic in the future if we were to fail over to the new server.
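If that bites you, the fix is trivial – just recreate the directory by hand before starting the standby:

```shell
# recreate the directory that pg_basebackup left out of the copy
mkdir -p new-data-dir/pg_xlog/archive_status
```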

But anyway – let's change the port on the slave, make it a hot standby, and start it.

First, in new-data-dir/postgresql.conf:

port = 5911
hot_standby = on

and in new-data-dir/recovery.conf:

standby_mode = 'on'
primary_conninfo = 'host= port=5910 user=replica password=secret'
restore_command = '/bin/false'
trigger_file = '/tmp/finish.recovery'
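The trigger_file line means the standby will keep replaying WAL until that file appears; when the time comes to promote it, it's just:

```shell
# creating the trigger file ends recovery and promotes the standby
touch /tmp/finish.recovery
```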

and now, I can start it:

=$ pg_ctl -D new-data-dir/ start
server starting

And with this I can check if the replication is working on the base backup:

=$ psql -U pgdba -p 5910 -c 'select * from test order by id desc limit 2';
 id  |         inserted_when         | inserted_by 
-----+-------------------------------+-------------
 170 | 2011-01-24 14:51:42.313384+01 |        5674
 169 | 2011-01-24 14:51:12.29726+01  |        5654
(2 rows)

=$ psql -U pgdba -p 5911 -c 'select * from test order by id desc limit 2';
 id  |         inserted_when         | inserted_by 
-----+-------------------------------+-------------
 170 | 2011-01-24 14:51:42.313384+01 |        5674
 169 | 2011-01-24 14:51:12.29726+01  |        5654
(2 rows)

OK. Everything works; looks good.

And I did it without any kind of ssh/rsync stuff.

The obvious question now is: what is the overhead of sending the data over the PostgreSQL communication protocol?

So, I closed all applications that could connect to this database, ran vacuum and analyze on all databases to avoid any additional writes, and moved the logs outside of $PGDATA.

This should make the datadir as static as possible without actually turning off PostgreSQL.

Afterwards, I made a first pg_basebackup to get a baseline:

pg_basebackup -D baseline-copy -Fp -c fast -l basebackuptest -v -P -U replica -h -p 5910

With this in place, I checked the size of "baseline-copy", and some tar/gz versions of it:

=$ GZIP="-9" tar czf baseline-copy-9.tar.gz baseline-copy/
=$ GZIP="-1" tar czf baseline-copy-1.tar.gz baseline-copy/
=$ tar czf baseline-copy.tar.gz baseline-copy/
=$ tar cf baseline-copy.tar baseline-copy/
=$ du -sk baseline-copy*
47396   baseline-copy
6392    baseline-copy-1.tar.gz
5488    baseline-copy-9.tar.gz
48092   baseline-copy.tar
5580    baseline-copy.tar.gz
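For reference, from the du numbers above, gzip -9 shrinks the data directory to roughly 11.6% of its plain size – a quick sanity check of the arithmetic:

```shell
# sizes in kB, taken from the du output above
plain=47396
gz9=5488
awk -v p="$plain" -v g="$gz9" \
    'BEGIN { printf "gzip -9: %.1f%% of plain size\n", 100 * g / p }'
```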

OK. Now we need to measure how much data is actually sent through the Pg connection while doing the backup.

Well, for me the easiest way is to use strace and then parse its output.

So, I used this one-liner:

rm -rf test-copy{,.log}; strace -e connect,recvfrom,sendto,close -f -o test-copy.log pg_basebackup -D test-copy -Fp -l basebackuptest -U replica -h -p 5910 &> /dev/null

and then this one to get summaries:

cat test-copy.log | perl -ne 'next unless /^\d+\s+connect\(3, .*5910/../^\d+\s+close\(3\)/; print if s/^\d+\s+(recvfrom|sendto)\(3,.* = (\d+)$/$1 $2/' | awk '{S[$1]+=$2} END{for (i in S) printf "%-15s : %d\n", i, S[i]/1024}'

(If you want to say that it's ugly – don't waste your breath/keyboard – I know. But it's just a test situation. And I did check that it's file descriptor 3.)
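For the curious, here is the same parsing logic spread out and commented, run against a tiny fabricated strace log (the log contents are made up just to demonstrate the mechanics – real input comes from the strace invocation above):

```shell
# Fake strace log, just to demonstrate the parsing logic.
cat > /tmp/demo-strace.log <<'EOF'
1234  connect(3, {sa_family=AF_INET, sin_port=htons(5910)}, 16) = 0
1234  sendto(3, "Q...", 52, 0, NULL, 0) = 52
1234  recvfrom(3, "D...", 8192, 0, NULL, NULL) = 4096
1234  recvfrom(3, "D...", 8192, 0, NULL, NULL) = 4096
1234  close(3) = 0
EOF

# Same logic as the one-liner:
# - only look at lines between connect() on fd 3 (to port 5910)
#   and the matching close(3)
# - pull syscall name and returned byte count from recvfrom/sendto on fd 3
# - sum the byte counts per syscall
perl -ne '
  next unless /^\d+\s+connect\(3, .*5910/ .. /^\d+\s+close\(3\)/;
  print "$1 $2\n" if /^\d+\s+(recvfrom|sendto)\(3,.* = (\d+)$/;
' /tmp/demo-strace.log |
awk '{ sum[$1] += $2 } END { for (k in sum) printf "%-10s : %d bytes\n", k, sum[k] }'
```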

So. Results for just sending the backup without any compression, in plain format:

sendto          : 0
recvfrom        : 48201

Which means that virtually no data was transferred back to the master (which is just a sanity check), and the total amount of data transferred from master to slave is slightly more than the size of the datadir. But not much more – just 805kB, which is 1.7% of the PGDATA size.

Let's see the same thing with compression:

First try, and it failed:

=$ pg_basebackup -D test-copy -Fp -l basebackuptest -U replica -h -Z 6 -p 5910
pg_basebackup: only tar mode backups can be compressed
Try "pg_basebackup --help" for more information.

Hmm .. OK. Let's play with tar backups then:

=$ pg_basebackup -D test-copy -Ft -l basebackuptest -U replica -h -Z 6 -p 5910

Afterwards, in test-copy, I found a base.tar.gz file, 5560 kB in size. Strace showed that this time the amount of data transferred over the network was … 48201 kB.

Which makes sense, in a way – compression is clearly done on the receiving end, so the master is not burdened with this task. On the other hand – if your $PGDATA is large, you might still prefer to make your base backups using normal means (tar/omnipitr-backup-master/rsync) to conserve network bandwidth.

All in all – it looks like a definitely cool tool, and I thank the authors for it.


  1. Jan 24, 2011

    Nice writeup!

    Just a FYI – compression is something we discussed, and something we eventually want to add (as an option, of course), but given how little time there is left before 9.1 freezes, it was pushed back. But it’s definitely on the radar for the future!

  2. Jan 24, 2011

    That's great. One question though – when done, will it be done via calls to the zlib library, or an external program? The reason for the question: I usually have lots of CPU power left, while I/O and network are much more problematic. With the ability to define my own compressor program, I would be able (for example) to use pigz (http://www.zlib.net/pigz/).

  3. Mario
    May 4, 2011

    Or pbzip2, which I use quite often.
