sharding – select * from depesz;

Waiting for 9.6 – Support parallel joins, and make related improvements.

On 20th of January, Robert Haas committed patch:

The core innovation of this patch is the introduction of the concept
of a partial path; that is, a path which if executed in parallel will
generate a subset of the output rows in each process.  Gathering a
partial path produces an ordinary (complete) path.  This allows us to
generate paths for parallel joins by joining a partial path for one
side (which at the baserel level is currently always a Partial Seq
Scan) to an ordinary path on the other side.  This is subject to
various restrictions at present, especially that this strategy seems
unlikely to be sensible for merge joins, so only nested loops and
hash joins paths are generated.
 
This also allows an Append node to be pushed below a Gather node in
the case of a partitioned table.
 
Testing revealed that early versions of this patch made poor decisions
in some cases, which turned out to be caused by the fact that the
original cost model for Parallel Seq Scan wasn't very good.  So this
patch tries to make some modest improvements in that area.
 
There is much more to be done in the area of generating good parallel
plans in all cases, but this seems like a useful step forward.
 
Patch by me, reviewed by Dilip Kumar and Amit Kapila.

Continue reading Waiting for 9.6 – Support parallel joins, and make related improvements.

Waiting for 9.6 – Generate parallel sequential scan plans in simple cases.

On 11th of November, Robert Haas committed patch:

Generate parallel sequential scan plans in simple cases.
 
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker.  Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation.  That's probably
the right decision in most cases, anyway.
 
Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL.  The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now.  Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.
 
Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0.  Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.
 
Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.
 
Patch by me, reviewed in earlier versions by Amit Kapila.

Continue reading Waiting for 9.6 – Generate parallel sequential scan plans in simple cases.

Partitioning – what? why? how?

Recently I noticed that more and more cases that I deal with could use some partitioning. And while theoretically most people know about it, it's definitely not a very well-understood feature, and sometimes people are scared of it.

So, I'll try to explain, to my best knowledge, what it is, why one would want to use it, and how to actually make it happen.

Continue reading Partitioning – what? why? how?

Waiting for 9.5 – Allow foreign tables to participate in inheritance. – A.K.A. PostgreSQL got sharding.

On 22nd of March, Tom Lane committed patch:

Allow foreign tables to participate in inheritance.
 
Foreign tables can now be inheritance children, or parents.  Much of the
system was already ready for this, but we had to fix a few things of
course, mostly in the area of planner and executor handling of row locks.
 
As side effects of this, allow foreign tables to have NOT VALID CHECK
constraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and to
accept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS.  Continuing to
disallow these things would've required bizarre and inconsistent special
cases in inheritance behavior.  Since foreign tables don't enforce CHECK
constraints anyway, a NOT VALID one is a complete no-op, but that doesn't
mean we shouldn't allow it.  And it's possible that some FDWs might have
use for SET STORAGE or SET WITH OIDS, though doubtless they will be no-ops
for most.
 
An additional change in support of this is that when a ModifyTable node
has multiple target tables, they will all now be explicitly identified
in EXPLAIN output, for example:
 
 Update on pt1  (cost=0.00..321.05 rows=3541 width=46)
   Update on pt1
   Foreign Update on ft1
   Foreign Update on ft2
   Update on child3
   ->  Seq Scan on pt1  (cost=0.00..0.00 rows=1 width=46)
   ->  Foreign Scan on ft1  (cost=100.00..148.03 rows=1170 width=46)
   ->  Foreign Scan on ft2  (cost=100.00..148.03 rows=1170 width=46)
   ->  Seq Scan on child3  (cost=0.00..25.00 rows=1200 width=46)
 
This was done mainly to provide an unambiguous place to attach "Remote SQL"
fields, but it is useful for inherited updates even when no foreign tables
are involved.
 
Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro
Horiguchi, some additional hacking by me

Continue reading Waiting for 9.5 – Allow foreign tables to participate in inheritance. – A.K.A. PostgreSQL got sharding.

The secret ingredient in the webscale sauce

( title, in case you don't know, comes from excellent “movie" about databases )

The secret ingredient is of course sharding.

Can we do sharding in PostgreSQL? First, let's define what sharding is:

Continue reading The secret ingredient in the webscale sauce

=$
|

Tag: sharding

Waiting for 9.6 – Support parallel joins, and make related improvements.

Waiting for 9.6 – Generate parallel sequential scan plans in simple cases.

Partitioning – what? why? how?

Waiting for 9.5 – Allow foreign tables to participate in inheritance. – A.K.A. PostgreSQL got sharding.

The secret ingredient in the webscale sauce