Someone asked recently on Slack, whether one should build tsvector data in a field, and index it, or make index on expression.
We talked about it for a while, and I thought I'll reformat my thoughts into blogpost to avoid retyping it next time.
On 3rd of February 2022, Peter Eisentraut committed patch:
Add UNIQUE null treatment option The SQL standard has been ambiguous about whether null values in unique constraints should be considered equal or not. Different implementations have different behaviors. In the SQL:202x draft, this has been formalized by making this implementation-defined and adding an option on unique constraint definitions UNIQUE [ NULLS [NOT] DISTINCT ] to choose a behavior explicitly. This patch adds this option to PostgreSQL. The default behavior remains UNIQUE NULLS DISTINCT. Making this happen in the btree code is pretty easy; most of the patch is just to carry the flag around to all the places that need it. The CREATE UNIQUE INDEX syntax extension is not from the standard, it's my own invention. I named all the internal flags, catalog columns, etc. in the negative ("nulls not distinct") so that the default PostgreSQL behavior is the default if the flag is false. Reviewed-by: Maxim Orlov <firstname.lastname@example.org> Reviewed-by: Pavel Borisov <email@example.com> Discussion: https://firstname.lastname@example.org
Every now and then someone asks me to add index suggestions to explain.depesz.com.
I always respond with polite decline. This is complicated thing to do, and I just don't have that time.
So I checked it out. And results prompted me to write this blogpost.
On 20th of January 2020, Amit Kapila committed patch:
Allow vacuum command to process indexes in parallel. This feature allows the vacuum to leverage multiple CPUs in order to process indexes. This enables us to perform index vacuuming and index cleanup with background workers. This adds a PARALLEL option to VACUUM command where the user can specify the number of workers that can be used to perform the command which is limited by the number of indexes on a table. Specifying zero as a number of workers will disable parallelism. This option can't be used with the FULL option. Each index is processed by at most one vacuum process. Therefore parallel vacuum can be used when the table has at least two indexes. The parallel degree is either specified by the user or determined based on the number of indexes that the table has, and further limited by max_parallel_maintenance_workers. The index can participate in parallel vacuum iff it's size is greater than min_parallel_index_scan_size. Author: Masahiko Sawada and Amit Kapila Mahendra Singh and Sergei Kornilov Discussion: https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0email@example.com
On 4th of April 2019, Robert Haas committed patch:
Allow VACUUM to be run with index cleanup disabled. This commit adds a new reloption, vacuum_index_cleanup, which controls whether index cleanup is performed for a particular relation by default. It also adds a new option to the VACUUM command, INDEX_CLEANUP, which can be used to override the reloption. If neither the reloption nor the VACUUM option is used, the default is true, as before. Masahiko Sawada, reviewed and tested by Nathan Bossart, Alvaro Herrera, Kyotaro Horiguchi, Darafei Praliaskouski, and me. The wording of the documentation is mostly due to me. Discussion: http://postgr.es/m/CAD21AoAt5R3DNUZSjOoXDUY=naYPUOuffVsRzuTYMz29yLzQCA@mail.gmail.com
On 2nd of April 2019, Alvaro Herrera committed patch:
Report progress of CREATE INDEX operations This uses the progress reporting infrastructure added by , adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY. There are two pieces to this: one is index-AM-agnostic, and the other is AM-specific. The latter is fairly elaborate for btrees, including reportage for parallel index builds and the separate phases that btree index creation uses; other index AMs, which are much simpler in their building procedures, have simplistic reporting only, but that seems sufficient, at least for non-concurrent builds. The index-AM-agnostic part is fairly complete, providing insight into the CONCURRENTLY wait phases as well as block-based progress during the index validation table scan. (The index validation index scan requires patching each AM, which has not been included here.) Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada Discussion: https://firstname.lastname@example.org
On 7th of April 2018, Teodor Sigaev committed patch:
Indexes with INCLUDE columns and their support in B-tree This patch introduces INCLUDE clause to index definition. This clause specifies a list of columns which will be included as a non-key part in the index. The INCLUDE columns exist solely to allow more queries to benefit from index-only scans. Also, such columns don't need to have appropriate operator classes. Expressions are not supported as INCLUDE columns since they cannot be used in index-only scans. Index access methods supporting INCLUDE are indicated by amcaninclude flag in IndexAmRoutine. For now, only B-tree indexes support INCLUDE clause. In B-tree indexes INCLUDE columns are truncated from pivot index tuples (tuples located in non-leaf pages and high keys). Therefore, B-tree indexes now might have variable number of attributes. This patch also provides generic facility to support that: pivot tuples contain number of their attributes in t_tid.ip_posid. Free 13th bit of t_info is used for indicating that. This facility will simplify further support of index suffix truncation. The changes of above are backward-compatible, pg_upgrade doesn't need special handling of B-tree indexes for that. Bump catalog version Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes, David Rowley, Alexander Korotkov Discussion: https://email@example.com
Support parallel btree index builds. To make this work, tuplesort.c and logtape.c must also support parallelism, so this patch adds that infrastructure and then applies it to the particular case of parallel btree index builds. Testing to date shows that this can often be 2-3x faster than a serial index build. The model for deciding how many workers to use is fairly primitive at present, but it's better than not having the feature. We can refine it as we get more experience. Peter Geoghegan with some help from Rushabh Lathia. While Heikki Linnakangas is not an author of this patch, he wrote other patches without which this feature would not have been possible, and therefore the release notes should possibly credit him as an author of this feature. Reviewed by Claudio Freire, Heikki Linnakangas, Thomas Munro, Tels, Amit Kapila, me. Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
On 1st of April, Teodor Sigaev committed patch:
Bloom index contrib module Module provides new access method. It is actually a simple Bloom filter implemented as pgsql's index. It could give some benefits on search with large number of columns. Module is a single way to test generic WAL interface committed earlier. Author: Teodor Sigaev, Alexander Korotkov Reviewers: Aleksander Alekseev, Michael Paquier, Jim Nasby
On 2nd of September, Teodor Sigaev committed patch:
Allow usage of huge maintenance_work_mem for GIN build. Currently, in-memory posting list during GIN build process is limited 1GB because of using repalloc. The patch replaces call of repalloc to repalloc_huge. It increases limit of posting list from 180 millions (1GB / sizeof(ItemPointerData)) to 4 billions limited by maxcount/count fields in GinEntryAccumulator and subsequent calls. Check added. Also, fix accounting of allocatedMemory during build to prevent integer overflow with maintenance_work_mem > 4GB. Robert Abraham <firstname.lastname@example.org> with additions by me