Waiting for PostgreSQL 11 – Support parallel btree index builds.

Support parallel btree index builds.
 
 
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.
 
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.
 
Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
 
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com

Continue reading Waiting for PostgreSQL 11 – Support parallel btree index builds.

Fix for parallel scans on explain.depesz.com

On Friday, Zr40 reported on irc that explain.depesz.com doesn't show table/index names for Parallel Seq Scan nodes.

Checked it and found couple of other omissions of the same kind with other Parallel* scans.

Fixed (I hope) all of them in:

The change is not really big, but just figured I'll let you know.

Waiting for 9.6 – Support parallel aggregation.

On 21st of March, Robert Haas committed patch:

Support parallel aggregation.
 
Parallel workers can now partially aggregate the data and pass the
transition values back to the leader, which can combine the partial
results to produce the final answer.
 
David Rowley, based on earlier work by Haribabu Kommi.  Reviewed by
Álvaro Herrera, Tomas Vondra, Amit Kapila, James Sewell, and me.

Continue reading Waiting for 9.6 – Support parallel aggregation.

Waiting for 9.6 – Support parallel joins, and make related improvements.

On 20th of January, Robert Haas committed patch:

The core innovation of this patch is the introduction of the concept
of a partial path; that is, a path which if executed in parallel will
generate a subset of the output rows in each process.  Gathering a
partial path produces an ordinary (complete) path.  This allows us to
generate paths for parallel joins by joining a partial path for one
side (which at the baserel level is currently always a Partial Seq
Scan) to an ordinary path on the other side.  This is subject to
various restrictions at present, especially that this strategy seems
unlikely to be sensible for merge joins, so only nested loops and
hash joins paths are generated.
 
This also allows an Append node to be pushed below a Gather node in
the case of a partitioned table.
 
Testing revealed that early versions of this patch made poor decisions
in some cases, which turned out to be caused by the fact that the
original cost model for Parallel Seq Scan wasn't very good.  So this
patch tries to make some modest improvements in that area.
 
There is much more to be done in the area of generating good parallel
plans in all cases, but this seems like a useful step forward.
 
Patch by me, reviewed by Dilip Kumar and Amit Kapila.

Continue reading Waiting for 9.6 – Support parallel joins, and make related improvements.

Waiting for 9.6 – Allow EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics.

On 9th of December, Robert Haas committed patch:

Allow EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics.
 
The original parallel sequential scan commit included only very limited
changes to the EXPLAIN output.  Aggregated totals from all workers were
displayed, but there was no way to see what each individual worker did
or to distinguish the effort made by the workers from the effort made by
the leader.
 
Per a gripe by Thom Brown (and maybe others).  Patch by me, reviewed
by Amit Kapila.

Continue reading Waiting for 9.6 – Allow EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics.

Waiting for 9.6 – Generate parallel sequential scan plans in simple cases.

On 11th of November, Robert Haas committed patch:

Generate parallel sequential scan plans in simple cases.
 
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker.  Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation.  That's probably
the right decision in most cases, anyway.
 
Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL.  The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now.  Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.
 
Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0.  Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.
 
Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.
 
Patch by me, reviewed in earlier versions by Amit Kapila.

Continue reading Waiting for 9.6 – Generate parallel sequential scan plans in simple cases.

Waiting for 9.5 – vacuumdb: enable parallel mode

On 23rd of January, Alvaro Herrera committed patch:

vacuumdb: enable parallel mode
 
This mode allows vacuumdb to open several server connections to vacuum
or analyze several tables simultaneously.
 
Author: Dilip Kumar.  Some reworking by Álvaro Herrera
Reviewed by: Jeff Janes, Amit Kapila, Magnus Hagander, Andres Freund

Continue reading Waiting for 9.5 – vacuumdb: enable parallel mode

Waiting for 9.3 – Add parallel pg_dump option.

On 24th of March, Andrew Dunstan committed patch:

Add parallel pg_dump option.
 
New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a snapshot clone in order to stay consistent, if
available.
 
The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.
 
Joachim Wieland, lightly editorialized by Andrew Dunstan.

Continue reading Waiting for 9.3 – Add parallel pg_dump option.

Parallel dumping of databases

Some time ago I wrote a piece on speeding up dump/restore process using custom solution that was parallelizing process.

Later on I wrote some tools (“fast dump and restore") to do it in more general way.

But all of them had a problem – to get consistent dump you need to stop all concurrent access to your database. Why, and how to get rid of this limitation?

Continue reading Parallel dumping of databases

Waiting for 8.4 – parallel restoration of dumps

On 2nd of February Andrew Dunstan committed his patch (with editing by Tom Lane) that:

Log Message:
-----------
Provide for parallel restoration from a custom format archive. Each data and
post-data step is run in a separate worker child (a thread on Windows, a child
process elsewhere) up to the concurrent number specified by the new pg_restore
command-line --multi-thread | -m switch.

Continue reading Waiting for 8.4 – parallel restoration of dumps