Some time ago I wrote a piece on speeding up dump/restore process using custom solution that was parallelizing process.
Later on I wrote some tools (“fast dump and restore") to do it in more general way.
But all of them had a problem – to get consistent dump you need to stop all concurrent access to your database. Why, and how to get rid of this limitation?
First why. It's actually pretty simple. Let's assume you have two tables:
CREATE TABLE users (id serial PRIMARY KEY, ...); CREATE TABLE sessions (id serial PRIMARY KEY, user_id int4 REFERENCES users (id), ...);
Now. Someone registers new account, and immediately logs in.
Registration (insert into users) and login (insert into sessions) happen in two separate transactions.
But, concurrently running parallel pg_dump, could have started dump of “users" before the user was inserted (so the data dump will not have it), but sessions – afterwards. So in sessions data, you will get session for user that doesn't exist in user data.
In normal pg_dump it doesn't happen, because it runs all data fetchers in single transaction (in repeatable read transaction isolation mode). But parallel dumps – dump each block of data in separate connection – so it's separate transaction.
In 2011, Tom Lane committed very interesting patch. I didn't write about it in “Waiting for …" series, mostly because I assumed that soon there will be patch that adds parallelization to pg_dump, based on the patch that Tom committed.
Apparently – we still don't have parallel pg_dump. But we have the ability to write it by ourselves.
How? That's simple. First, before anything else will happen, we start first transaction, and do it that way:
$ BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; BEGIN *$ SELECT pg_export_snapshot(); pg_export_snapshot -------------------- 00000AAF-1 (1 ROW)
of course the value returned by pg_export_snapshot() will be different, but that's not important.
Now, knowing this snapshot number/id, we start a number of connections that will be used for our dumping work, and in each of them, we need to run:
$ BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; BEGIN *$ SET TRANSACTION SNAPSHOT '00000AAF-1'; SET
It is important that the “SET TRANSACTION SNAPSHOT" has to be run as first command in transaction.
Thanks to this, our new, just started transaction, will have the same data visibility as older, started previously, transaction. Thus providing us with many backends, all sharing the same view of data.
The original transaction cannot be closed until all work that needs to be done using given snapshot will be finished. But – it can be used to run data dumps just as fine as any other connection. It just has to be closed last.
So the idea is that to make data dump in parallel, you'd:
- start single connection (let's call it master)
- start new transaction in master connection, and export snapshot
- start any number of connections (workers), and in each of them – start transaction, and switch to masters transaction snapshot
- run in all connections appropriate commands (COPY or SLECT basically)
- once all work is done, close worker transactions and connections
- close master transaction and connection
Generated dump will be consistent.
of course you will also need to dump schema, and find a way to load the data to some new database, but that's irrelevant for now – methods for this I described, and even wrote tools for.
All in all – while we still don't have parallel pg_dump, we now have the ability to run parallel data copy that will be “as good".
Very simplistic “pg_dump" using snapshot cloning can look like this:
#!/usr/bin/env bash # configuration export PGPORT=5920 export PGDATABASE=rt4 # Temporary directory for small helper files tmp_dir="$( mktemp -d )" trap 'rm -rf "$tmp_dir"' EXIT # Run master psql exec 50> >( exec psql -qAtX ) master_pid="$!" # Start transaction printf 'BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;\n' >&50 # Get transaction snapshot id printf '\o %s/snapshot.id\n' "$tmp_dir" >&50 printf 'SELECT pg_export_snapshot();\n' >&50 # Get list of tables to dump printf '\o %s/tables.list\n' "$tmp_dir" >&50 printf 'SELECT c.oid::regclass FROM pg_class c join pg_namespace n on c.relnamespace = n.oid WHERE c.relkind = $$r$$ and n.nspname !~ $$^(pg_|information_schema)$$ ORDER BY pg_table_size(c.oid) desc;\n' >&50 # Create file that marks that all is done in master printf '\o %s/marker.file\n' "$tmp_dir" >&50 printf 'SELECT 1;\n' >&50 printf '\o\n' >&50 # Wait for marker file to appear while true do if [[ -s "$tmp_dir/marker.file" ]] then break fi sleep 0.1 done # Get snapshot id to variable snapshot_id="$( < "$tmp_dir/snapshot.id" )" # Dump each table separately while read table_name do printf "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;\nSET TRANSACTION SNAPSHOT '%s';\nCOPY %s TO STDOUT;\n" "$snapshot_id" "$table_name" | psql -qAtX > "$table_name.dump" & done < "$tmp_dir/tables.list" # wait for all dumps to finish wait echo "All done."
It is not very smart – it does every table separetely, but doesn't split tables into many parts, so if you have one table that is 95% of your database – the speedup will be rather small. But still – it's just proof of concept.