On 24th of March, Andrew Dunstan committed a patch:
Add parallel pg_dump option. New infrastructure is added which creates a set number of workers (threads on Windows, forked processes on Unix). Jobs are then handed out to these workers by the master process as needed. pg_restore is adjusted to use this new infrastructure in place of the old setup which created a new worker for each step on the fly. Parallel dumps acquire a snapshot clone in order to stay consistent, if available. The parallel option is selected by the -j / --jobs command line parameter of pg_dump. Joachim Wieland, lightly editorialized by Andrew Dunstan.
Very recently I wrote about dumping in parallel, and now we have it committed 🙂
So, of course, I had to test it.
To have a sensible dataset, I took a couple of smallish databases and loaded them together into a single DB. The resulting database is 1043 MB and has 54 tables.
I tried dumping it normally, without parallelism, using pg_dump --format=p and pg_dump --format=c. For each format, I ran pg_dump 5 times. Results:
- format=plain, average time: 29.716s ( from 23.303s to 35.324s, file size 2,053,750,560 )
- format=custom, average time: 64.874s ( from 63.129s to 66.354s, file size 475,318,581 )
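A minimal sketch of how such a measurement could be scripted, assuming each run is simply timed by wall clock and then reduced to min, max and average. The helper names `time_command` and `summarize` are hypothetical, and this is not necessarily how the numbers above were collected.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=5):
    """Run argv `runs` times; return the wall-clock duration of each run in seconds."""
    timings = []
    for _ in range(runs):
        started = time.perf_counter()
        # check=True raises CalledProcessError if the command exits with an error
        subprocess.run(argv, check=True)
        timings.append(time.perf_counter() - started)
    return timings


def summarize(timings):
    """Reduce a list of durations to the three figures reported per format."""
    return {
        "min": min(timings),
        "max": max(timings),
        "avg": statistics.mean(timings),
    }
```

It could be used as `summarize(time_command(["pg_dump", "--format=p", "--file=dump.sql", "testdb"]))`, where `testdb` and `dump.sql` are placeholder names.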
The huge time difference made me think, and I realized it's probably due to compression. So I redid --format=c with the -Z0 option. Results:
- average time: 22.192s
- time from: 21.564s to: 23.086s
- file size: 2,061,791,159
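To put the cost of compression into numbers, here is a small calculation that uses only the averages and file sizes reported above for the custom format, with and without -Z0. The variable names are just for illustration.

```python
# Figures reported above for pg_dump --format=c
CUSTOM_DEFAULT = {"avg_seconds": 64.874, "size": 475_318_581}
CUSTOM_Z0 = {"avg_seconds": 22.192, "size": 2_061_791_159}

# How much longer the compressed dump took, and how much smaller it came out
time_ratio = CUSTOM_DEFAULT["avg_seconds"] / CUSTOM_Z0["avg_seconds"]
size_ratio = CUSTOM_DEFAULT["size"] / CUSTOM_Z0["size"]

print(f"compressed dump took {time_ratio:.2f}x as long")
print(f"compressed dump is {size_ratio * 100:.1f}% of the uncompressed size")
```

So on this dataset the default compression traded roughly three times the run time for a file a bit under a quarter of the size.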
So, that's the baseline.
Now I tested the new parallel dumping, with 2, 4 and 8 jobs.
- -j2 : from: 18.592s, to: 23.564s, average: 20.918s
- -j4 : from: 18.631s, to: 31.076s, average: 24.983s
- -j8 : from: 18.528s, to: 24.062s, average: 21.648s
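The exact command line used for the parallel runs is not shown here, so the following is only a sketch. It assumes the directory output format, which is the only format pg_dump accepts together with -j, and then compares the averages above against the 22.192s baseline. The helper names and the database and directory names are hypothetical.

```python
def parallel_dump_command(dbname, jobs, outdir):
    """Build a pg_dump argv for a parallel dump; -j needs the directory format."""
    return [
        "pg_dump",
        "--format=directory",
        f"--jobs={jobs}",
        f"--file={outdir}",
        dbname,
    ]


def percent_faster(baseline, measured):
    """Positive when `measured` beats the baseline, negative when it is slower."""
    return (baseline - measured) / baseline * 100.0


BASELINE_AVG = 22.192  # --format=c with -Z0, from above
PARALLEL_AVG = {2: 20.918, 4: 24.983, 8: 21.648}  # averages per -j value

for jobs, avg in PARALLEL_AVG.items():
    print(f"-j{jobs}: {percent_faster(BASELINE_AVG, avg):+.1f}% vs baseline")
```

For example, `parallel_dump_command("testdb", 4, "dump_dir")` gives the argv for a 4-job dump of a placeholder database `testdb` into a placeholder directory `dump_dir`.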
The difference, while it exists, is not great: the best average (-j2, 20.918s) is only about 6% below the 22.192s baseline, and -j4 was actually slower than the baseline on average. It could be related to a number of factors, starting from bad IO (tested on a RAID1 of WD VelociRaptors, and on a 1st gen Intel SSD) to just not enough data. Or perhaps tuning issues. Whatever it is, the queries run by pg_dump are not waiting on locks, so it looks like the lack of significant speedup is not a flaw in the technology, but rather in my testing setup.
In any case, I am very enthusiastic about this patch, and I hope that when it gets into a released Pg version, and I get some more serious databases to upgrade, I will see better numbers.