Got interested recently in speed of pg_dump. Specifically, if, over the years, it has became faster, and if yes, how much.
Couple of years ago I was in position where we needed to run pg_dump, and found some inefficiencies, which got later patched. This was around the version 12. So, how does the situation look now? Interestingly, not so great…
So, to test it, I installed all versions from 10 to 18. Versions 10 to 17 were taken from pgdg apt repo, and version 18 was compiled from git HEAD on 30th of April.
To make sure that the dump speed depends on pg_dump itself, I made test db with no data, but non-trivial number of objects: over 2.4 million relations in pg_class, out of which 288k are tables, and 1.7 million are indexes.
This database is schema copy of one of production databases that I've been working with/on in the last couple of years.
To make sure that I tested all potential cases, I did 5 types of dump:
- pg_dump -Fp -s
- pg_dump -Fc -s
- pg_dump -Fc -s -ZX – where X is 6 for PostgreSQL 15 and older, and gzip:6 in case of newer versions
- pg_dump -Fd -j8 -s
- pg_dump -Fd -j8 -s -ZX – where X had the same value as stated above
Each of these dumps were ran three times in a row, and I picked fastest result.
First information is that pg_dump options didn't change anything real. Which makes sense – parallelization (with -Fd -j…) would help with data dump, but I was just dumping schema, which is single-process only.
Across all nine versions the largest difference between pg_dump options was 1.7%. So, to avoid overloading with data, let's focus on single value per version, for pg_dump -Fp -s:
Version | Time | Improvement | % of fastest | # of queries |
---|---|---|---|---|
10 | 63.743s | — | 302.1% | 3270 |
11 | 55.259s | -13.3% | 261.9% | 3270 |
12 | 54.861s | -0.7% | 260.0% | 3271 |
13 | 54.980s | +0.2% | 260.6% | 3271 |
14 | 45.287s | -17.6% | 214.6% | 3271 |
15 | 21.410s | -52.7% | 101.5% | 829 |
16 | 21.099s | -1.5% | 100.0% | 371 |
17 | 29.049s | +37.7% | 137.7% | 372 |
18 | 31.605s | +8.8% | 149.8% | 79 |
Well, this isn't what I expected. For starters, it seems to be significantly faster than I expected. Which is good, obviously. But the bad is that it seems that v17 and v18 introduced some slowdowns.
Please note that I ran 5 separate pg_dump options combinations, and each was ran 3 times. So I ran each version of pg_dump 15 times. That should remove any chance of “random load fluctuation".
It seems that the speedups in versions 15 and 16 were directly related to significant decrease in number of queries sent to server. But why are versions 17 and 18 slower?
Of course we have to understand that my test was worst-case-scenario. Very large number of tables, no data. Still, it is interesting to see the difference…