On 7th of November, Alvaro Herrera committed this patch:
BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information.

For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax.

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas. Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas. Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.
The information in the commit log is pretty simple to understand. BRIN is a new index type – one that should have much lower overhead than a B-tree (though possibly slower to search).
Things that should be considered:
- time to create index (create index on some number of rows)
- time to update index (update some number of rows in table that is already indexed)
- size of index
- time to query using the index and = operator
- time to query using the index and between operator
To test it I prepared two tables:
$ CREATE TABLE for_btree ( id int8, payload text );
CREATE TABLE

$ INSERT INTO for_btree (id, payload)
    SELECT i, repeat('depesz', 100) FROM generate_series(1, 1000000) i;
INSERT 0 1000000

$ CREATE TABLE for_brin ( id int8, payload text );
CREATE TABLE

$ INSERT INTO for_brin (id, payload)
    SELECT i, repeat('depesz', 100) FROM generate_series(1, 1000000) i;
INSERT 0 1000000
Both tables are ~650MB in size. First, I'll create the index on each table, three times in a row, and use the best time.
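The post doesn't show the index creation commands verbatim; they would have looked something like this (the index names are my own):

```sql
-- Assumed commands; index names are hypothetical:
CREATE INDEX i_btree ON for_btree USING btree (id);
CREATE INDEX i_brin  ON for_brin  USING brin  (id);
```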
Best time for btree: 626.859 ms.
Best time for brin: 208.754 ms.
Updates – I'll simply update 30% of the rows by changing id to 1000000 + id, like:
UPDATE for_btree SET id = 1000000 + id WHERE id < 300000;
Best time for btree: 8398.461 ms.
Best time for brin: 1398.711 ms.
Size of btree index: 28MB.
Size of brin index: 64kB. (more on it later)
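For reference, the index sizes can be checked with pg_relation_size(); something like this, assuming the hypothetical index names from above:

```sql
-- Index names are my assumption, the post doesn't name them:
SELECT pg_size_pretty(pg_relation_size('i_btree'));
SELECT pg_size_pretty(pg_relation_size('i_brin'));
```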
And now searching. I'll check for a specific id (654321) and a 5% range of rows (600000 … 650000), with queries:
$ SELECT COUNT(*) FROM TABLE WHERE id = 654321::int8;
$ SELECT COUNT(*) FROM TABLE WHERE id BETWEEN 600000::int8 AND 650000::int8;
(note: I had to add ::int8, as apparently int8 columns and plain integer literals don't mix well now…)
- Btree = – 0.356 ms
- Btree between – 9.574 ms
- Brin = – 17.386 ms
- Brin between – 21.090 ms
So it looks like BRIN is the clear winner in size, creation time, and update time, but searches using it are slower than with a B-tree.
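This matches the commit message: BRIN doesn't support normal index scans, only lossy bitmap scans over whole page ranges, so every matching range has to be rechecked tuple by tuple. One can confirm which plan is used by running, for example:

```sql
-- Should show a Bitmap Heap Scan on top of a Bitmap Index Scan on the BRIN index:
EXPLAIN ANALYZE
SELECT COUNT(*) FROM for_brin WHERE id BETWEEN 600000::int8 AND 650000::int8;
```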
One additional thing – BRIN is configurable. You can make it more or less detailed via the pages_per_range storage parameter. The smaller the value, the larger the index will be – and possibly the faster, but updates on it will take longer.
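Setting the parameter is done at index creation time, via the WITH clause; a sketch, again with my hypothetical index name:

```sql
-- pages_per_range defaults to 128; lower values mean finer-grained summaries:
DROP INDEX i_brin;
CREATE INDEX i_brin ON for_brin USING brin (id) WITH (pages_per_range = 100);
```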
| pages_per_range | index size | create time | update time | search (=) time | search (between) time |
|----------------:|-----------:|------------:|------------:|----------------:|----------------------:|
|           10000 |      24 kB |  369.234 ms | 8567.034 ms |       87.606 ms |            154.182 ms |
|            1000 |      24 kB |  373.150 ms | 8812.391 ms |       96.339 ms |            133.546 ms |
|             100 |      40 kB |  360.555 ms | 8867.043 ms |       98.941 ms |            131.126 ms |
|              10 |     296 kB |  392.038 ms | 8619.480 ms |       99.834 ms |            132.327 ms |
|               1 |    2800 kB |  629.255 ms | 8600.095 ms |      114.882 ms |            147.765 ms |
To be honest, I don't quite understand why searches don't get faster with the larger (more detailed) indexes; maybe it's due to the specific distribution of the data.
Anyway – the index does what it was supposed to do: stay small, and speed up (some) searches. Cool stuff.