index – select * from depesz;

How to index data with tsearch?

Someone asked recently on Slack, whether one should build tsvector data in a field, and index it, or make index on expression.

We talked about it for a while, and I thought I'll reformat my thoughts into blogpost to avoid retyping it next time.

Continue reading How to index data with tsearch?

Waiting for PostgreSQL 15 – Add UNIQUE null treatment option

On 3rd of February 2022, Peter Eisentraut committed patch:

Add UNIQUE null treatment option 
 
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.
 
This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.
 
The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.
 
I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.
 
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com

Continue reading Waiting for PostgreSQL 15 – Add UNIQUE null treatment option

Why is it hard to automatically suggest what index to create?

Every now and then someone asks me to add index suggestions to explain.depesz.com.

I always respond with polite decline. This is complicated thing to do, and I just don't have that time.

Lately I was asked, on Slack to add (to explain.depesz.com) link to pganalyze Index Advisor for Postgres.

So I checked it out. And results prompted me to write this blogpost.

Continue reading Why is it hard to automatically suggest what index to create?

Waiting for PostgreSQL 13 – Allow vacuum command to process indexes in parallel.

On 20th of January 2020, Amit Kapila committed patch:

Allow vacuum command to process indexes in parallel.
 
This feature allows the vacuum to leverage multiple CPUs in order to
process indexes.  This enables us to perform index vacuuming and index
cleanup with background workers.  This adds a PARALLEL option to VACUUM
command where the user can specify the number of workers that can be used
to perform the command which is limited by the number of indexes on a
table.  Specifying zero as a number of workers will disable parallelism.
This option can't be used with the FULL option.
 
Each index is processed by at most one vacuum process.  Therefore parallel
vacuum can be used when the table has at least two indexes.
 
The parallel degree is either specified by the user or determined based on
the number of indexes that the table has, and further limited by
max_parallel_maintenance_workers.  The index can participate in parallel
vacuum iff it's size is greater than min_parallel_index_scan_size.
 
Author: Masahiko Sawada and Amit Kapila
 
Mahendra Singh and Sergei Kornilov
 
Discussion:
https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com

Continue reading Waiting for PostgreSQL 13 – Allow vacuum command to process indexes in parallel.

Waiting for PostgreSQL 12 – Allow VACUUM to be run with index cleanup disabled.

On 4th of April 2019, Robert Haas committed patch:

Allow VACUUM to be run with index cleanup disabled.
 
 
This commit adds a new reloption, vacuum_index_cleanup, which
controls whether index cleanup is performed for a particular
relation by default.  It also adds a new option to the VACUUM
command, INDEX_CLEANUP, which can be used to override the
reloption.  If neither the reloption nor the VACUUM option is
used, the default is true, as before.
 
Masahiko Sawada, reviewed and tested by Nathan Bossart, Alvaro
Herrera, Kyotaro Horiguchi, Darafei Praliaskouski, and me.
The wording of the documentation is mostly due to me.
 
Discussion: http://postgr.es/m/CAD21AoAt5R3DNUZSjOoXDUY=naYPUOuffVsRzuTYMz29yLzQCA@mail.gmail.com

Continue reading Waiting for PostgreSQL 12 – Allow VACUUM to be run with index cleanup disabled.

Waiting for PostgreSQL 12 – Report progress of CREATE INDEX operations

On 2nd of April 2019, Alvaro Herrera committed patch:

Report progress of CREATE INDEX operations
 
 
This uses the progress reporting infrastructure added by ,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.
 
There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific.  The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index creation uses; other index AMs, which are much simpler in their
building procedures, have simplistic reporting only, but that seems
sufficient, at least for non-concurrent builds.
 
The index-AM-agnostic part is fairly complete, providing insight into
the CONCURRENTLY wait phases as well as block-based progress during the
index validation table scan.  (The index validation index scan requires
patching each AM, which has not been included here.)
 
Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada
Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql

Continue reading Waiting for PostgreSQL 12 – Report progress of CREATE INDEX operations

Waiting for PostgreSQL 11 – Indexes with INCLUDE columns and their support in B-tree

On 7th of April 2018, Teodor Sigaev committed patch:

Indexes with INCLUDE columns and their support in B-tree
 
This patch introduces INCLUDE clause to index definition.  This clause
specifies a list of columns which will be included as a non-key part in
the index.  The INCLUDE columns exist solely to allow more queries to
benefit from index-only scans.  Also, such columns don't need to have
appropriate operator classes.  Expressions are not supported as INCLUDE
columns since they cannot be used in index-only scans.
 
Index access methods supporting INCLUDE are indicated by amcaninclude flag
in IndexAmRoutine.  For now, only B-tree indexes support INCLUDE clause.
 
In B-tree indexes INCLUDE columns are truncated from pivot index tuples
(tuples located in non-leaf pages and high keys).  Therefore, B-tree indexes
now might have variable number of attributes.  This patch also provides
generic facility to support that: pivot tuples contain number of their
attributes in t_tid.ip_posid.  Free 13th bit of t_info is used for indicating
that.  This facility will simplify further support of index suffix truncation.
The changes of above are backward-compatible, pg_upgrade doesn't need special
handling of B-tree indexes for that.
 
Bump catalog version
 
Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me
Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
             David Rowley, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/flat/.4010101@postgrespro.ru

Continue reading Waiting for PostgreSQL 11 – Indexes with INCLUDE columns and their support in B-tree

Waiting for PostgreSQL 11 – Support parallel btree index builds.

Support parallel btree index builds.
 
 
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds.  Testing
to date shows that this can often be 2-3x faster than a serial
index build.
 
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature.  We can
refine it as we get more experience.
 
Peter Geoghegan with some help from Rushabh Lathia.  While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature.  Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
 
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com

Continue reading Waiting for PostgreSQL 11 – Support parallel btree index builds.

Waiting for 9.6 – Bloom index contrib module

On 1st of April, Teodor Sigaev committed patch:

Bloom index contrib module
 
Module provides new access method. It is actually a simple Bloom filter
implemented as pgsql's index. It could give some benefits on search
with large number of columns.
 
Module is a single way to test generic WAL interface committed earlier.
 
Author: Teodor Sigaev, Alexander Korotkov
Reviewers: Aleksander Alekseev, Michael Paquier, Jim Nasby

Continue reading Waiting for 9.6 – Bloom index contrib module

Waiting for 9.6 – Allow usage of huge maintenance_work_mem for GIN build.

On 2nd of September, Teodor Sigaev committed patch:

Allow usage of huge maintenance_work_mem for GIN build.
 
Currently, in-memory posting list during GIN build process is limited 1GB
because of using repalloc. The patch replaces call of repalloc to repalloc_huge.
It increases limit of posting list from 180 millions
(1GB / sizeof(ItemPointerData)) to 4 billions limited by maxcount/count fields
in GinEntryAccumulator and subsequent calls. Check added.
 
Also, fix accounting of allocatedMemory during build to prevent integer
overflow with maintenance_work_mem > 4GB.
 
Robert Abraham <robert.abraham86@googlemail.com> with additions by me

Continue reading Waiting for 9.6 – Allow usage of huge maintenance_work_mem for GIN build.

=$
|

Tag: index