On 11th of November, Robert Haas committed patch:

Generate parallel sequential scan plans in simple cases.
Add a new flag, consider_parallel, to each RelOptInfo, indicating
whether a plan for that relation could conceivably be run inside of
a parallel worker.  Right now, we're pretty conservative: for example,
it might be possible to defer applying a parallel-restricted qual
in a worker, and later do it in the leader, but right now we just
don't try to parallelize access to that relation.  That's probably
the right decision in most cases, anyway.
Using the new flag, generate parallel sequential scan plans for plain
baserels, meaning that we now have parallel sequential scan in
PostgreSQL.  The logic here is pretty unsophisticated right now: the
costing model probably isn't right in detail, and we can't push joins
beneath Gather nodes, so the number of plans that can actually benefit
from this is pretty limited right now.  Lots more work is needed.
Nevertheless, it seems time to enable this functionality so that all
this code can actually be tested easily by users and developers.
Note that, if you wish to test this functionality, it will be
necessary to set max_parallel_degree to a value greater than the
default of 0.  Once a few more loose ends have been tidied up here, we
might want to consider changing the default value of this GUC, but
I'm leaving it alone for now.
Along the way, fix a bug in cost_gather: the previous coding thought
that a Gather node's transfer overhead should be costed on the basis of
the relation size rather than the number of tuples that actually need
to be passed off to the leader.
Patch by me, reviewed in earlier versions by Amit Kapila.

Read more »

On 30th of October, Tom Lane committed patch:

Implement lookbehind constraints in our regular-expression engine.
A lookbehind constraint is like a lookahead constraint in that it consumes
no text; but it checks for existence (or nonexistence) of a match *ending*
at the current point in the string, rather than one *starting* at the
current point.  This is a long-requested feature since it exists in many
other regex libraries, but Henry Spencer had never got around to
implementing it in the code we use.
Just making it work is actually pretty trivial; but naive copying of the
logic for lookahead constraints leads to code that often spends O(N^2) time
to scan an N-character string, because we have to run the match engine
from string start to the current probe point each time the constraint is
checked.  In typical use-cases a lookbehind constraint will be written at
the start of the regex and hence will need to be checked at every character
--- so O(N^2) work overall.  To fix that, I introduced a third copy of the
core DFA matching loop, paralleling the existing longest() and shortest()
loops.  This version, matchuntil(), can suspend and resume matching given
a couple of pointers' worth of storage space.  So we need only run it
across the string once, stopping at each interesting probe point and then
resuming to advance to the next one.
I also put in an optimization that simplifies one-character lookahead and
lookbehind constraints, such as "(?=x)" or "(?< !\w)", into AHEAD and BEHIND
constraints, which already existed in the engine.  This avoids the overhead
of the LACON machinery entirely for these rather common cases.
The net result is that lookbehind constraints run a factor of three or so
slower than Perl's for multi-character constraints, but faster than Perl's
for one-character constraints ... and they work fine for variable-length
constraints, which Perl gives up on entirely.  So that's not bad from a
competitive perspective, and there's room for further optimization if
anyone cares.  (In reality, raw scan rate across a large input string is
probably not that big a deal for Postgres usage anyway; so I'm happy if
it's linear.)

Read more »

Some time ago, guys from Instagram shared their approach to generating unique ids on multiple hosts in a way that guarantees (to reasonable extend) uniqueness, and doesn't require any centralized service.

Earlier this month, the build benchmarked their solution vs. UUIDs, and vs. bigserial.

I thought – whether C based code for nextid would be faster.

Read more »

Some time ago I was contacted by Adam Smith – he pointed out that subquery names in “Subquery Scan" nodes were not properly anonymized.

Now, they are, which you can see in here:

While working on it, I also added (helpful?) links from node types to my blogposts about reading explain output – Explaining the unexplainable.

Read more »

On 3rd of October, Andres Freund committed patch:

Without CASCADE, if an extension has an unfullfilled dependency on
another extension, CREATE EXTENSION ERRORs out with "required extension
... is not installed". That is annoying, especially when that dependency
is an implementation detail of the extension, rather than something the
extension's user can make sense of.
In addition to CASCADE this also includes a small set of regression
tests around CREATE EXTENSION.
Author: Petr Jelinek, editorialized by Michael Paquier, Andres Freund
Reviewed-By: Michael Paquier, Andres Freund, Jeff Janes
Discussion: <a class="text" href="/gitweb/?p=postgresql.git;a=object;h=557E0520">557E0520</a>.3040800@2ndquadrant.com

Read more »

October 8th, 2015 by depesz | Tags: , , , , , | 2 comments »

In case you're not familiar – there is a thing called LVM – it's a layer between physical disks, and filesystems, and allow certain interesting things, like extending, migrating, snapshotting and others.

At one of systems I've been dealing with, we stumbled upon specific requirement – change LV into striped. It took me a while to figure it out, so I'm writing it down, so I'll never have to research it again.

Read more »

On 8th of September, Alvaro Herrera committed patch:

Allow per-tablespace effective_io_concurrency
Per discussion, nowadays it is possible to have tablespaces that have
wildly different I/O characteristics from others.  Setting different
effective_io_concurrency parameters for those has been measured to
improve performance.
Author: Julien Rouhaud
Reviewed by: Andres Freund

Read more »

On 7th of September, Jeff Davis committed patch:

Add log_line_prefix option 'n' for Unix epoch.
Prints time as Unix epoch with milliseconds.
Tomas Vondra, reviewed by Fabien Coelho.

Read more »

On 2nd of September, Teodor Sigaev committed patch:

Allow usage of huge maintenance_work_mem for GIN build.
Currently, in-memory posting list during GIN build process is limited 1GB
because of using repalloc. The patch replaces call of repalloc to repalloc_huge.
It increases limit of posting list from 180 millions
(1GB / sizeof(ItemPointerData)) to 4 billions limited by maxcount/count fields
in GinEntryAccumulator and subsequent calls. Check added.
Also, fix accounting of allocatedMemory during build to prevent integer
overflow with maintenance_work_mem > 4GB.
Robert Abraham <robert .abraham86@googlemail.com> with additions by me</robert>

Read more »

September 7th, 2015 by depesz | Tags: , , , , , | 5 comments »

There exists an extension to PostgreSQL, which lets you use hypothetical indexes.

What are there? That's simple – these are indexes that don't really exist. So what good are they?

Let's see.

Read more »