Today, I released a new version of OmniPITR – 0.5.0.
This new version has one important new feature – the so-called “direct destination" for backups.
What does it mean? What does it do? How does it help? Let's see…
Let's assume you have a remote destination for backups, something like:
$ omnipitr-backup-master ... -dr gzip=storage.host:/path/to/store/backups ...
Up to 0.4.0, OmniPITR did (this is a simplification, but good enough for this example):
- $ tar czf /tmp/data.tar.gz $PGDATA
- $ scp /tmp/data.tar.gz storage.host:/path/to/store/backups/
- $ rm /tmp/data.tar.gz
- $ tar czf /tmp/xlogs.tar.gz $XLOGS-DIR
- $ scp /tmp/xlogs.tar.gz storage.host:/path/to/store/backups/
- $ rm /tmp/xlogs.tar.gz
This is all fine, and it's pretty standard, but it's not optimal. Why? For starters – it causes more disk I/O: we read the data and store it locally as a tarball, and then we re-read the tarball to send it to the remote machine.
It also causes a spike in network usage – scp (or whatever is actually used) will use all the available bandwidth.
And at the end we have to remove two files – which can be pretty big (think hundreds of gigabytes), and rm-ing them, on ext3, can be pretty painful.
So, what is the solution?
It's simple: instead of running it the way I showed, why not run tar, pipe its output directly to ssh, which connects to storage.host and stores the file in its final place? Something like:
$ tar czf - $PGDATA | ssh storage.host 'cat - > /path/to/store/backups/data.tar.gz'
This way we avoid writing and re-reading the tarball on the database server. Also, since the transfer of the tarball happens in parallel with its creation, it is (usually) limited by tar/gzip speed, which makes the transfer itself slower. But despite the slower transfer, we get the backup on the backup server earlier, because the transfer started earlier.
So – it's all win.
Of course writing it like I showed above seems trivial, until you consider that OmniPITR is very configurable: you can have multiple destinations, multiple compression schemata, and multiple checksums generated.
Long story short – omnipitr-backup-master (and -slave too, of course) can do direct destinations, regardless of complexity, but to do so, it needs (new requirement) bash. And the actual command that gets executed can sometimes be very unreadable. But that's OK – it's generated programmatically, so it doesn't have to be read by humans.
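To get a feel for the kind of command this involves, here is a minimal bash sketch (my illustration – paths, file names, and the tee/process-substitution layout are made up for the demo, not OmniPITR's actual generated code) that feeds a single read of the data directory to a checksum and a compressed destination at the same time:

```shell
#!/usr/bin/env bash
# Sketch only: one tar stream fanned out to several consumers at once.
# Process substitution >(...) is a bash feature, which is why bash
# became a requirement for this.
set -e

# Stand-ins for $PGDATA and the destination directory (demo only):
PGDATA=$(mktemp -d)
echo 'some data' > "$PGDATA/file.txt"
OUT=$(mktemp -d)

# One read of $PGDATA; tee duplicates the tar stream: one copy goes to
# md5sum for a checksum, the other through gzip to the destination file.
tar cf - -C "$PGDATA" . \
  | tee >(md5sum | awk '{print $1}' > "$OUT/data.tar.md5") \
  | gzip -c > "$OUT/data.tar.gz"

echo "wrote: $OUT/data.tar.gz and $OUT/data.tar.md5"
```

In the real tool one branch would end in something like `ssh storage.host 'cat - > …'` instead of a local file, and there can be arbitrarily many branches – which is exactly where the unreadable generated commands come from.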
As a side effect of the changes, all local destinations are now processed in parallel (as are all direct destinations) while the tarball is being created.
Finally – thanks for the idea, and the prodding, to Gary of justin.tv.