February 9th, 2009 by depesz | Tags: , , , , , , | 3 comments »
Did it help? If yes - maybe you can help me? Donate BTC to 19zPa5diT2LZqGtTi8f8bfApLn8rw9zBHx

On 2nd of February Andrew Dunstan committed his patch (with editing by Tom Lane) that:

Log Message:
-----------
Provide for parallel restoration from a custom format archive. Each data and
post-data step is run in a separate worker child (a thread on Windows, a child
process elsewhere) up to the concurrent number specified by the new pg_restore
command-line --multi-thread | -m switch.

Well, the basic idea is simple: loading using multiple threads will (in theory) make restoration faster.

Parallel restoration is available only when dealing with -Fc dumps (custom format).

So, let's test how it works.

First, I created tables, and put some rows in them:

# CREATE TABLE t1 (i INT4);
CREATE TABLE
Time: 56.411 ms
 
# INSERT INTO t1 (i) SELECT generate_series(1,10000000);
INSERT 0 10000000
Time: 34922.493 ms
 
# ALTER TABLE t1 add PRIMARY KEY (i);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "t1_pkey" for table "t1"
ALTER TABLE
Time: 25071.766 ms
 
# CREATE TABLE t2 (i INT4);
CREATE TABLE
Time: 3.367 ms
 
# INSERT INTO t2 (i) SELECT generate_series(1,10000000);
INSERT 0 10000000
Time: 32917.159 ms
 
# ALTER TABLE t2 add PRIMARY KEY (i);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "t2_pkey" for table "t2"
ALTER TABLE
Time: 26295.656 ms
 
# CREATE TABLE t3 (i INT4);
CREATE TABLE
Time: 20.818 ms
 
# INSERT INTO t3 (i) SELECT generate_series(1,10000000);
INSERT 0 10000000
Time: 33337.663 ms
 
# ALTER TABLE t3 add PRIMARY KEY (i);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "t3_pkey" for table "t3"
ALTER TABLE
Time: 25605.919 ms

So, now I have 3 tables, each with 10 million records. Each table is 320MB, and the indexes (unique index for primary key) are 170MB each.

Well, first, I need to make dump of the database:

=> time pg_dump -Fc -f test.dump
 
real 0m40.490s
user 0m36.554s
sys 0m0.628s

Dump has 63MB, so it's pretty small.

Now, let's try to load it using single task. To make the test a bit better, I will load it 3 times to get some average on timing.

=> for a in 1 2 3; do dropdb depesz; createdb; time pg_restore -d depesz test.dump; done
 
real 2m53.637s
user 0m15.741s
sys 0m0.284s
 
real 3m3.501s
user 0m16.337s
sys 0m0.264s
 
real 2m59.544s
user 0m16.629s
sys 0m0.296s

So, on average it did load in 2m 58.894ms.

Now, let's try with various number of working threads. Since there are 3
tables, each with 1 index, there is no point in going over 3 worker threads.

Generally – pg_restore can put every data load and index creation in its own worker thread, which can give a lot of benefits if your hardware is good enough.

Let's test how good it will be on my laptop.

With 2 worker threads I got these times:

real 1m15.807s
real 1m14.390s
real 1m11.966s

WHOA! 1m 14.054s in average. It's more than 2 times faster!

And what about 3 worker threads?

real 1m9.079s
real 1m13.560s
real 1m7.277s

1m 9.972s in average. Not bad.

Generally – adding second worker gave a lot. Adding third – less. Most likely because I have 2 cores in the machine. Unfortunately I don't have any machine with more than 2 cores for tests, but I can imagine that with more cores I could benefit from adding more workers (though one have to remember about limitations of i/o channel).

All in all – another great feature, that is greatly appreciated by me (and probably not only me).

  1. 3 comments

  2. # gj
    Feb 9, 2009

    Thanks Depesz.
    Yes, I tested it on 4 way xeon, and up to 6 threads I can see a difference, on fairly big db (35GB compressed dump, loads of tables, indices, etc).

  3. # Jarek S.
    Oct 18, 2009

    Hi,
    – do dropdb depesz; createdb;
    + do dropdb depesz; createdb depesz;
    I think this is little misspeling.

  4. Oct 19, 2009

    @Jarek S.:
    not really – createdb, if you don’t give it database name uses name of user as database name. and if username is not given – it uses shell user name.

    so, if i’m logged as depesz to my linuxbox, createdb with no parameters will create database depesz. or at least – will try to.

Leave a comment