On 2nd of February Andrew Dunstan committed his patch (with editing by Tom Lane) that:
Log Message: ----------- Provide for parallel restoration from a custom format archive. Each data and post-data step is run in a separate worker child (a thread on Windows, a child process elsewhere) up to the concurrent number specified by the new pg_restore command-line --multi-thread | -m switch.
Well, the basic idea is simple: loading using multiple threads will (in theory) make restoration faster.
Parallel restoration is available only when dealing with -Fc dumps (custom format).
So, let's test how it works.
First, I created tables, and put some rows in them:
# CREATE TABLE t1 (i INT4); CREATE TABLE TIME: 56.411 ms # INSERT INTO t1 (i) SELECT generate_series(1,10000000); INSERT 0 10000000 TIME: 34922.493 ms # ALTER TABLE t1 ADD PRIMARY KEY (i); NOTICE: ALTER TABLE / ADD PRIMARY KEY will CREATE implicit INDEX "t1_pkey" FOR TABLE "t1" ALTER TABLE TIME: 25071.766 ms # CREATE TABLE t2 (i INT4); CREATE TABLE TIME: 3.367 ms # INSERT INTO t2 (i) SELECT generate_series(1,10000000); INSERT 0 10000000 TIME: 32917.159 ms # ALTER TABLE t2 ADD PRIMARY KEY (i); NOTICE: ALTER TABLE / ADD PRIMARY KEY will CREATE implicit INDEX "t2_pkey" FOR TABLE "t2" ALTER TABLE TIME: 26295.656 ms # CREATE TABLE t3 (i INT4); CREATE TABLE TIME: 20.818 ms # INSERT INTO t3 (i) SELECT generate_series(1,10000000); INSERT 0 10000000 TIME: 33337.663 ms # ALTER TABLE t3 ADD PRIMARY KEY (i); NOTICE: ALTER TABLE / ADD PRIMARY KEY will CREATE implicit INDEX "t3_pkey" FOR TABLE "t3" ALTER TABLE TIME: 25605.919 ms
So, now I have 3 tables, each with 10 million records. Each table is 320MB, and the indexes (unique index for primary key) are 170MB each.
Well, first, I need to make dump of the database:
=> time pg_dump -Fc -f test.dump real 0m40.490s user 0m36.554s sys 0m0.628s
Dump has 63MB, so it's pretty small.
Now, let's try to load it using single task. To make the test a bit better, I will load it 3 times to get some average on timing.
=> for a in 1 2 3; do dropdb depesz; createdb; time pg_restore -d depesz test.dump; done real 2m53.637s user 0m15.741s sys 0m0.284s real 3m3.501s user 0m16.337s sys 0m0.264s real 2m59.544s user 0m16.629s sys 0m0.296s
So, on average it did load in 2m 58.894ms.
Now, let's try with various number of working threads. Since there are 3
tables, each with 1 index, there is no point in going over 3 worker threads.
Generally – pg_restore can put every data load and index creation in its own worker thread, which can give a lot of benefits if your hardware is good enough.
Let's test how good it will be on my laptop.
With 2 worker threads I got these times:
real 1m15.807s real 1m14.390s real 1m11.966s
WHOA! 1m 14.054s in average. It's more than 2 times faster!
And what about 3 worker threads?
real 1m9.079s real 1m13.560s real 1m7.277s
1m 9.972s in average. Not bad.
Generally – adding second worker gave a lot. Adding third – less. Most likely because I have 2 cores in the machine. Unfortunately I don't have any machine with more than 2 cores for tests, but I can imagine that with more cores I could benefit from adding more workers (though one have to remember about limitations of i/o channel).
All in all – another great feature, that is greatly appreciated by me (and probably not only me).