Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.
Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.

This is really, really sweet.

Let's see how it works.

Will need, of course, some table with data:

=$ create table test as select generate_series(1,100000000) as id;

Now, I can make an index on the table, on id column. To have some sensible comparison, I'll run it three times.

First, without parallelization:

=$ set maintenance_work_mem = '1GB';
=$ \timing
=$ create index q on test (id);
Time: 25294.185 ms (00:25.294)
Time: 167.724 ms
=$ create index q on test (id);
Time: 25538.743 ms (00:25.539)
Time: 165.651 ms
=$ create index q on test (id);
Time: 25129.483 ms (00:25.129)
Time: 169.231 ms

and now, let's try with 8-way parallel work:

=$ set maintenance_work_mem = '1GB';
=$ set max_parallel_workers = 16;
=$ set max_parallel_maintenance_workers = 8;
=$ \timing
=$ create index q on test (id);
Time: 16922.233 ms (00:16.922)
Time: 163.034 ms
=$ create index q on test (id);
Time: 16958.423 ms (00:16.958)
Time: 165.781 ms
=$ create index q on test (id);
Time: 16484.234 ms (00:16.484)
Time: 166.140 ms

So, it looks like non-parallel create index took around 25 seconds, and parallel one – ~ 16.5. Performance improvement is “8 times faster", but it's clearly there.

Out of curiosity, I tested the speed with max_parallel_maintenance_workers set from 0 to 8. Results (only fastest time reported):

max_parallel_maintenance_workers index creation time (milliseconds)
0 25001.376
1 20440.343
2 17927.330
3 17493.936
4 18911.540
5 19349.886
6 18859.417
7 18794.899
8 19017.379

In any way – it's a great addition, one that will make my life, as DBA, a bit easier. Thanks to all involved.


Based on comment from Peter Geoghegan I did one more test. This time my id column was textual, and contained random data:

  • 10 million rows
  • each row contained one word
  • each word contained 5-14 characters
  • each character was one of: (“a".."z", “A".."Z", “0".."9″)
  • table was 400MB in size

As previously, I ran create index three times with varying max_parallel_maintenance_workers. Results:

max_parallel_maintenance_workers index creation time (milliseconds)
0 24632.961
1 13342.488
2 10393.877
3 9231.971
4 9760.243
5 9247.169
6 9347.192
7 9588.541
8 9132.571

This time the fastest was the run with 8 workers, using almost 2.7 times less time than without parallelization.

Thanks Peter for the hint.

  1. 4 comments

  2. # Peter Geoghegan
    Feb 12, 2018

    You’d probably have been able to show a much larger speed-up by using input into the CREATE INDEX that isn’t already sorted, and/or has more expensive comparisons. A multi-column CREATE INDEX on text columns, for example.

    I sometimes use this benchmark:

  3. # akretschmer
    Feb 12, 2018

    Great work!

    Does that also works for CIC (Create Index Concurrently)? Other question/hint: if i’m running a restore (custom-format) with -j , what would happen in the stage of creating indexes? Burning cores?

    But anyway, great feature!

  4. # Peter Geoghegan
    Feb 12, 2018


    Yes, it works for CREATE INDEX CONCURRENTLY. Only the first table scan/initial index build in parallelized. In other words, only the “CREATE INDEX” part of CIC is performed in parallel, though that’s still very likely to be where the majority of time is spent.

    If you’re running a restore, then nothing special happens or doesn’t happen. You should have set max_parallel_workers in a way that prevents thrashing when many concurrent parallel operations are underway.

  5. Feb 13, 2018

    @Peter Geoghegan:

    Updated the post with test based on your suggestion. Thanks.

Leave a comment