Waiting for 9.3 – Add parallel pg_dump option.

On 24th of March, Andrew Dunstan committed patch:

Add parallel pg_dump option.
New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a snapshot clone in order to stay consistent, if
The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.
Joachim Wieland, lightly editorialized by Andrew Dunstan.

Continue reading Waiting for 9.3 – Add parallel pg_dump option.

What is the point of bouncing?

Some of you might be familiar with pgBouncer project. Some are not. Some understand what/how/why it does, others do not.

This blog post is to have a place where I can point people who have question about how it works, why, and when it makes sense to use it (pgBouncer that is).

Continue reading What is the point of bouncing?

“= 123” vs. “= ‘depesz'”. What is faster?

There is this idea that normal form in databases require you to use integer, auto incrementing, primary keys.

The idea was discussed by many people, I will just point you to series of three blog posts on the subject by Josh Berkus ( part 1, 2 and 3, and reprise).

One of the points that proponents of surrogate keys (i.e. those based on integer and sequences) raise is that comparing integers is faster than comparing texts. So,

SELECT * FROM users WHERE id = 123

is faster than

SELECT * FROM users WHERE username = 'depesz'

Is it?

Continue reading “= 123" vs. “= ‘depesz'". What is faster?

Waiting for 9.2 – pg_stat_statements improvements

Three interesting patches:

  • On 27th of March, Robert Haas committed patch:
    New GUC, track_iotiming, to track I/O timings.
    Currently, the only way to see the numbers this gathers is via
    EXPLAIN (ANALYZE, BUFFERS), but the plan is to add visibility through
    the stats collector and pg_stat_statements in subsequent patches.
    Ants Aasma, reviewed by Greg Smith, with some further changes by me.
  • On 27th of March, Robert Haas committed patch:
    Expose track_iotiming information via pg_stat_statements.
    Ants Aasma, reviewed by Greg Smith, with very minor tweaks by me.
  • On 29th of March, Tom Lane committed patch:

    Improve contrib/pg_stat_statements to lump "similar" queries together.
    pg_stat_statements now hashes selected fields of the analyzed parse tree
    to assign a "fingerprint" to each query, and groups all queries with the
    same fingerprint into a single entry in the pg_stat_statements view.
    In practice it is expected that queries with the same fingerprint will be
    equivalent except for values of literal constants.  To make the display
    more useful, such constants are replaced by "?" in the displayed query
    This mechanism currently supports only optimizable queries (SELECT,
    INSERT, UPDATE, DELETE).  Utility commands are still matched on the
    basis of their literal query strings.
    There remain some open questions about how to deal with utility statements
    that contain optimizable queries (such as EXPLAIN and SELECT INTO) and how
    to deal with expiring speculative hashtable entries that are made to save
    the normalized form of a query string.  However, fixing these issues should
    require only localized changes, and since there are other open patches
    involving contrib/pg_stat_statements, it seems best to go ahead and commit
    what we've got.
    Peter Geoghegan, reviewed by Daniel Farina

Continue reading Waiting for 9.2 – pg_stat_statements improvements

What index to create?

Some time ago I wrote a blogpost about why index might not be used.

While this post seemed to be well received (top link from depesz.com on reddit), it doesn't answer another question – what index to create for given situation.

I'll try to cover this question now.

IMPORTANT UPDATE: As of PostgreSQL 10 hash indexes are WAL logged. As such, main point against them is gone.

Continue reading What index to create?

Waiting for 9.1 – Faster LIKE/ILIKE

On 1st of February, Tom Lane committed patch:

Support LIKE and ILIKE index searches via contrib/pg_trgm indexes.
Unlike Btree-based LIKE optimization, this works for non-left-anchored
search patterns.  The effectiveness of the search depends on how many
trigrams can be extracted from the pattern.  (The worst case, with no      
trigrams, degrades to a full-table scan, so this isn't a panacea.  But   
it can be very useful.)                                                 
Alexander Korotkov, reviewed by Jan Urbanski

Continue reading Waiting for 9.1 – Faster LIKE/ILIKE

Waiting for 9.1 – Unlogged tables

On 29th of December, Robert Haas committed interesting patch, which does:

Support unlogged tables.
The contents of an unlogged table aren't WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery.  Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.

(edited commit message, due to this mail.

Continue reading Waiting for 9.1 – Unlogged tables

Waiting for 9.1 – KNNGIST

On 4th of December, Tom Lane committed really cool patch:

KNNGIST, otherwise known as order-by-operator support for GIST.

This commit represents a rather heavily editorialized version of
Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1
patches.  I redid the opclass API to add a separate Distance method
instead of turning the Consistent method into an illogical mess,
fixed some bit-rot in the rbtree interfaces, and generally worked over
the code style and comments.
There's still no non-code documentation to speak of, but I'll work on
that separately.  Some contrib-module changes are also yet to come
(right now, point <-> point is the only KNN-ified operator).
Teodor Sigaev and Tom Lane

Continue reading Waiting for 9.1 – KNNGIST

Understanding postgresql.conf : checkpoint_completion_target

Starting new blog series – explanation of various configuration parameters.

I will of course follow no schedule or order – if I'd had to – it would be my job, and in this way – it's fun.

First configuration parameter to write about is checkpoint_completion_target.

Continue reading Understanding postgresql.conf : checkpoint_completion_target