Waiting for PostgreSQL 11 – Support parallel btree index builds.

Support parallel btree index builds.
 
 
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.
 
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.
 
Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
 
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com

Continue reading Waiting for PostgreSQL 11 – Support parallel btree index builds.

Waiting for 9.6 – Bloom index contrib module

On 1st of April, Teodor Sigaev committed patch:

Bloom index contrib module
 
Module provides new access method. It is actually a simple Bloom filter
implemented as pgsql's index. It could give some benefits on search
with large number of columns.
 
Module is a single way to test generic WAL interface committed earlier.
 
Author: Teodor Sigaev, Alexander Korotkov
Reviewers: Aleksander Alekseev, Michael Paquier, Jim Nasby

Continue reading Waiting for 9.6 – Bloom index contrib module

Waiting for 9.6 – Allow usage of huge maintenance_work_mem for GIN build.

On 2nd of September, Teodor Sigaev committed patch:

Allow usage of huge maintenance_work_mem for GIN build.
 
Currently, in-memory posting list during GIN build process is limited 1GB
because of using repalloc. The patch replaces call of repalloc to repalloc_huge.
It increases limit of posting list from 180 millions
(1GB / sizeof(ItemPointerData)) to 4 billions limited by maxcount/count fields
in GinEntryAccumulator and subsequent calls. Check added.
 
Also, fix accounting of allocatedMemory during build to prevent integer
overflow with maintenance_work_mem > 4GB.
 
Robert Abraham <robert.abraham86@googlemail.com> with additions by me

Continue reading Waiting for 9.6 – Allow usage of huge maintenance_work_mem for GIN build.

Waiting for 9.5 – Add support for index-only scans in GiST.

On 26th of March, Heikki Linnakangas committed patch:

Add support for index-only scans in GiST.
 
This adds a new GiST opclass method, 'fetch', which is used to reconstruct
the original Datum from the value stored in the index. Also, the 'canreturn'
index AM interface function gains a new 'attno' argument. That makes it
possible to use index-only scans on a multi-column index where some of the
opclasses support index-only scans but some do not.
 
This patch adds support in the box and point opclasses. Other opclasses
can added later as follow-on patches (btree_gist would be particularly
interesting).
 
Anastasia Lubennikova, with additional fixes and modifications by me.

Continue reading Waiting for 9.5 – Add support for index-only scans in GiST.

Waiting for 9.5 – array_offset() and array_offsets()

On 18th of March, Alvaro Herrera committed patch:

array_offset() and array_offsets()
 
These functions return the offset position or positions of a value in an
array.
 
Author: Pavel Stěhule
Reviewed by: Jim Nasby

Continue reading Waiting for 9.5 – array_offset() and array_offsets()

Waiting for 9.5 – Use abbreviated keys for faster sorting of text datums.

On 19th of January, Robert Haas committed patch:

Use abbreviated keys for faster sorting of text datums.
 
This commit extends the SortSupport infrastructure to allow operator
classes the option to provide abbreviated representations of Datums;
in the case of text, we abbreviate by taking the first few characters
of the strxfrm() blob.  If the abbreviated comparison is insufficent
to resolve the comparison, we fall back on the normal comparator.
This can be much faster than the old way of doing sorting if the
first few bytes of the string are usually sufficient to resolve the
comparison.
 
There is the potential for a performance regression if all of the
strings to be sorted are identical for the first 8+ characters and
differ only in later positions; therefore, the SortSupport machinery
now provides an infrastructure to abort the use of abbreviation if
it appears that abbreviation is producing comparatively few distinct
keys.  HyperLogLog, a streaming cardinality estimator, is included in
this commit and used to make that determination for text.
 
Peter Geoghegan, reviewed by me.

Continue reading Waiting for 9.5 – Use abbreviated keys for faster sorting of text datums.

Waiting for 9.5 – BRIN: Block Range Indexes.

On 7th of November, Alvaro Herrera committed patch:

BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.
 
As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.
 
For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.
 
Catalog version bumped due to new builtin catalog entries.
 
There's more that could be done here, but this is a good step forwards.
 
Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.
 
Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.
 
PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.

Continue reading Waiting for 9.5 – BRIN: Block Range Indexes.

How to deal with timestamps?

Every now and then someone asks, on irc or mailing lists, some question which shows deep misunerstanding (or lack of understanding) of timestamps – especially the ones with time zones.

Since I got bitten by this before, let me describe what timestamps are, how to work with them, and what are the most common pitfalls that you can encounter.

Continue reading How to deal with timestamps?