Waiting for 9.6 – Add new system view, pg_config

On 17th of February, Joe Conway committed patch:

Add new system view, pg_config
 
Move and refactor the underlying code for the pg_config client
application to src/common in support of sharing it with a new
system information SRF called pg_config() which makes the same
information available via SQL. Additionally wrap the SRF with a
new system view called pg_config.
 
Patch by me with extensive input and review by Michael Paquier
and additional review by Alvaro Herrera.
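
A quick sketch of how this looks from SQL (assuming a 9.6 build with this commit applied); the view exposes the same name/setting pairs that the pg_config binary prints:

    -- e.g. fetch just the server version string:
    SELECT name, setting FROM pg_config WHERE name = 'VERSION';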

Continue reading Waiting for 9.6 – Add new system view, pg_config

Waiting for 9.6 – Remove GROUP BY columns that are functionally dependent on other columns.

On 11th of February, Tom Lane committed patch:

Remove GROUP BY columns that are functionally dependent on other columns.
 
If a GROUP BY clause includes all columns of a non-deferred primary key,
as well as other columns of the same relation, those other columns are
redundant and can be dropped from the grouping; the pkey is enough to
ensure that each row of the table corresponds to a separate group.
Getting rid of the excess columns will reduce the cost of the sorting or
hashing needed to implement GROUP BY, and can indeed remove the need for
a sort step altogether.
 
This seems worth testing for since many query authors are not aware of
the GROUP-BY-primary-key exception to the rule about queries not being
allowed to reference non-grouped-by columns in their targetlists or
HAVING clauses.  Thus, redundant GROUP BY items are not uncommon.  Also,
we can make the test pretty cheap in most queries where it won't help
by not looking up a rel's primary key until we've found that at least
two of its columns are in GROUP BY.
 
David Rowley, reviewed by Julien Rouhaud
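
To see what this means in practice, a sketch with hypothetical tables; since users.id is a non-deferred primary key, name and city are functionally dependent on it and the planner can now drop them from the grouping:

    CREATE TABLE users (id int PRIMARY KEY, name text, city text);
    CREATE TABLE orders (user_id int REFERENCES users(id), amount numeric);

    -- name and city are functionally dependent on the pkey column id,
    -- so 9.6 effectively plans this as GROUP BY u.id:
    SELECT u.id, u.name, u.city, sum(o.amount)
    FROM users u
    JOIN orders o ON o.user_id = u.id
    GROUP BY u.id, u.name, u.city;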

Continue reading Waiting for 9.6 – Remove GROUP BY columns that are functionally dependent on other columns.

Waiting for 9.6 – Add num_nulls() and num_nonnulls() to count NULL arguments.

On 5th of February, Tom Lane committed patch:

Add num_nulls() and num_nonnulls() to count NULL arguments.
 
An example use-case is "CHECK(num_nonnulls(a,b,c) = 1)" to assert that
exactly one of a,b,c isn't NULL.  The functions are variadic, so they
can also be pressed into service to count the number of null or nonnull
elements in an array.
 
Marko Tiikkaja, reviewed by Pavel Stehule
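
Simple illustration (values picked arbitrarily):

    SELECT num_nulls(1, NULL, 'a'),      -- 1
           num_nonnulls(1, NULL, 'a');   -- 2

    -- counting non-null elements of an array, via VARIADIC:
    SELECT num_nonnulls(VARIADIC ARRAY[1, NULL, 3]);  -- 2

    -- the CHECK example from the commit, in a hypothetical table:
    CREATE TABLE contact (
        email text,
        phone text,
        fax   text,
        CHECK (num_nonnulls(email, phone, fax) = 1)
    );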

Continue reading Waiting for 9.6 – Add num_nulls() and num_nonnulls() to count NULL arguments.

Waiting for 9.6 – Support parallel joins, and make related improvements.

On 20th of January, Robert Haas committed patch:

Support parallel joins, and make related improvements.
 
The core innovation of this patch is the introduction of the concept
of a partial path; that is, a path which if executed in parallel will
generate a subset of the output rows in each process.  Gathering a
partial path produces an ordinary (complete) path.  This allows us to
generate paths for parallel joins by joining a partial path for one
side (which at the baserel level is currently always a Partial Seq
Scan) to an ordinary path on the other side.  This is subject to
various restrictions at present, especially that this strategy seems
unlikely to be sensible for merge joins, so only nested loop and
hash join paths are generated.
 
This also allows an Append node to be pushed below a Gather node in
the case of a partitioned table.
 
Testing revealed that early versions of this patch made poor decisions
in some cases, which turned out to be caused by the fact that the
original cost model for Parallel Seq Scan wasn't very good.  So this
patch tries to make some modest improvements in that area.
 
There is much more to be done in the area of generating good parallel
plans in all cases, but this seems like a useful step forward.
 
Patch by me, reviewed by Dilip Kumar and Amit Kapila.
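
A hedged sketch of how one might see this in action (the tables are hypothetical, and the exact plan shape depends on costing):

    SET max_parallel_degree = 2;  -- 9.6 development-era GUC, default 0
    EXPLAIN
    SELECT *
    FROM big_fact f
    JOIN small_dim d ON d.id = f.dim_id
    WHERE f.val > 100;
    -- with luck: Gather -> Hash Join -> Parallel Seq Scan on big_fact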

Continue reading Waiting for 9.6 – Support parallel joins, and make related improvements.

Waiting for 9.6 – Allow EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics.

On 9th of December, Robert Haas committed patch:

Allow EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics.
 
The original parallel sequential scan commit included only very limited
changes to the EXPLAIN output.  Aggregated totals from all workers were
displayed, but there was no way to see what each individual worker did
or to distinguish the effort made by the workers from the effort made by
the leader.
 
Per a gripe by Thom Brown (and maybe others).  Patch by me, reviewed
by Amit Kapila.
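
Sketch (hypothetical table; per-worker output appears only for parallel plans):

    SET max_parallel_degree = 2;
    EXPLAIN (ANALYZE, VERBOSE)
    SELECT count(*) FROM big_fact;
    -- with VERBOSE, each parallel node now lists per-worker statistics
    -- (rows, loops, timing) in addition to the aggregated totals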

Continue reading Waiting for 9.6 – Allow EXPLAIN (ANALYZE, VERBOSE) to display per-worker statistics.

Waiting for 9.6 – psql: Support multiple -c and -f options, and allow mixing them.

On 8th of December, Robert Haas committed patch:

psql: Support multiple -c and -f options, and allow mixing them.
 
To support this, we must reconcile some historical anomalies in the
behavior of -c.  In particular, as a backward-incompatibility, -c no
longer implies --no-psqlrc.
 
Pavel Stehule (code) and Catalin Iacob (documentation).  Review by
Michael Paquier and myself.  Proposed behavior per Tom Lane.
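
For example (the script name is hypothetical), this now runs all three steps in order, in a single session, so the temp table survives across the -c options:

    psql -c 'CREATE TEMP TABLE t (x int)' \
         -f populate_t.sql \
         -c 'SELECT count(*) FROM t'

Note the compatibility gotcha: since -c no longer implies --no-psqlrc, add -X if you don't want your ~/.psqlrc to run in scripts.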

Continue reading Waiting for 9.6 – psql: Support multiple -c and -f options, and allow mixing them.

Waiting for 9.6 – Generate parallel sequential scan plans in simple cases.

On 11th of November, Robert Haas committed patch:

Generate parallel sequential scan plans in simple cases.
 
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker.  Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation.  That's probably
the right decision in most cases, anyway.
 
Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL.  The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now.  Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.
 
Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0.  Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.
 
Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.
 
Patch by me, reviewed in earlier versions by Amit Kapila.
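
To try it out (table name hypothetical):

    SET max_parallel_degree = 4;  -- default is 0, which disables this
    EXPLAIN SELECT * FROM big_table WHERE some_col = 42;
    -- if the planner likes it, you get a Gather node on top of
    -- a Parallel Seq Scan
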

Continue reading Waiting for 9.6 – Generate parallel sequential scan plans in simple cases.

Waiting for 9.6 – Implement lookbehind constraints in our regular-expression engine.

On 30th of October, Tom Lane committed patch:

Implement lookbehind constraints in our regular-expression engine.
 
A lookbehind constraint is like a lookahead constraint in that it consumes
no text; but it checks for existence (or nonexistence) of a match *ending*
at the current point in the string, rather than one *starting* at the
current point.  This is a long-requested feature since it exists in many
other regex libraries, but Henry Spencer had never got around to
implementing it in the code we use.
 
Just making it work is actually pretty trivial; but naive copying of the
logic for lookahead constraints leads to code that often spends O(N^2) time
to scan an N-character string, because we have to run the match engine
from string start to the current probe point each time the constraint is
checked.  In typical use-cases a lookbehind constraint will be written at
the start of the regex and hence will need to be checked at every character
--- so O(N^2) work overall.  To fix that, I introduced a third copy of the
core DFA matching loop, paralleling the existing longest() and shortest()
loops.  This version, matchuntil(), can suspend and resume matching given
a couple of pointers' worth of storage space.  So we need only run it
across the string once, stopping at each interesting probe point and then
resuming to advance to the next one.
 
I also put in an optimization that simplifies one-character lookahead and
lookbehind constraints, such as "(?=x)" or "(?<!\w)", into AHEAD and BEHIND
constraints, which already existed in the engine.  This avoids the overhead
of the LACON machinery entirely for these rather common cases.
 
The net result is that lookbehind constraints run a factor of three or so
slower than Perl's for multi-character constraints, but faster than Perl's
for one-character constraints ... and they work fine for variable-length
constraints, which Perl gives up on entirely.  So that's not bad from a
competitive perspective, and there's room for further optimization if
anyone cares.  (In reality, raw scan rate across a large input string is
probably not that big a deal for Postgres usage anyway; so I'm happy if
it's linear.)
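
Quick examples of the new syntax, where (?<=...) is a positive lookbehind and (?<!...) a negative one:

    SELECT 'foobar' ~ '(?<=foo)bar';                      -- true
    SELECT 'bazbar' ~ '(?<=foo)bar';                      -- false
    SELECT regexp_matches('2016-02-17', '(?<=-)\d\d$');   -- {17}

    -- variable-length lookbehind, which Perl rejects outright:
    SELECT 'barbazqux' ~ '(?<=foo|barbaz)qux';            -- true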

Continue reading Waiting for 9.6 – Implement lookbehind constraints in our regular-expression engine.