September 28th, 2011 by depesz

Some time ago I wrote a blogpost about why index might not be used.

While that post seemed to be well received (top link from depesz.com on reddit), it doesn't answer another question – what index to create for a given situation.

I'll try to cover this question now.

First of all, before we can even start thinking about details, we need to know more about what we're dealing with. That is – indexes.

What is an index?

Once, we used this question as a starting point in job interviews (I was on the hiring "committee"). The biggest surprise was that very few people could answer it. Most thought of an index as "a numeric column in a table that has incrementing, unique values".

I am not big on encyclopedic definitions, so let me just write down what I understand an index to be:

An index is a data structure, external to the table, that can be used to speed up searches for values stored in the table.

In PostgreSQL, we currently have 4 different kinds of indexes, and of these, I will discuss only one. These four are:

  • B-Tree – the most common type of index, and the default. This is the one I'll talk about.
  • Hash – a potentially interesting index type, but with a fatal (for me) flaw of not being WAL-logged – more details
  • GiST – Generalized Search Tree – these are custom indexes for specialized datatypes, and while they are great, most DBAs don't deal with them day to day (in my opinion)
  • GIN – Generalized Inverted Index – another kind of custom index, mostly useful for TSearch.

Of all these, I will focus entirely on B-Tree. The reason is very simple – when you're using specialized datatypes and/or text search, you usually know what you're doing, so this blogpost wouldn't help you. I'd like to help people who are at a loss about what's what and how to deal with indexing. And the default is B-Tree.

B-Tree indexes can usefully be imagined as sorted lists.

Let's assume you have a table with 1 million integers. Checking whether a given integer exists in the table would require scanning the whole table (in the worst case).

On the other hand – if, aside from the table, you also keep a sorted list of those integers (with each element in this index/list also containing a pointer to where the row with this integer is in the table) – finding a given integer becomes trivial.

Let's see a simple example:

Table data:

  Row # | column a (integer)
      1 |            8238320
      2 |            7321114
      3 |            3866827
      4 |            4875272
      5 |            6465694
      6 |             308890
      7 |               1568
      8 |            6477632
      9 |            3518089
     10 |             703321

Index data:

  Index row # | column a (integer) | Table row #
            1 |               1568 |           7
            2 |             308890 |           6
            3 |             703321 |          10
            4 |            3518089 |           9
            5 |            3866827 |           3
            6 |            4875272 |           4
            7 |            6465694 |           5
            8 |            6477632 |           8
            9 |            7321114 |           2
           10 |            8238320 |           1

Now, when we want to read the table record "WHERE column_a = 3866827" – we don't need to scan the whole table, we can just check the index.

Since the index data is sorted, we can (this is a simplification, but it's good enough to understand the idea):

  • start with middle element in index
  • check if value in given element in index is the one we're looking for
  • if it is – we're done.
  • if the found value is smaller than the value we're looking for – search in the 2nd half of index (from current place to end) – i.e. repeat all steps, including getting middle element, but on subset of index
  • if the found value is larger than the value we're looking for – search in the 1st half of index
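The steps above are essentially a binary search. Here is a minimal sketch in Python (a toy model, not PostgreSQL internals – the "index" is just a sorted list of (value, table row) pairs, using the example data from above):

```python
# Binary search over a toy "index": a sorted list of (value, table_row) pairs.
def index_lookup(index, needle):
    lo, hi = 0, len(index) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # start with the middle element
        value, table_row = index[mid]
        if value == needle:           # found it - we're done
            return table_row
        elif value < needle:          # too small - search the 2nd half
            lo = mid + 1
        else:                         # too large - search the 1st half
            hi = mid - 1
    return None                       # value is not in the table

# The index data from the example above: (column a, table row #)
index = [(1568, 7), (308890, 6), (703321, 10), (3518089, 9),
         (3866827, 3), (4875272, 4), (6465694, 5), (6477632, 8),
         (7321114, 2), (8238320, 1)]

print(index_lookup(index, 3866827))   # 3
print(index_lookup(index, 42))        # None
```

Each iteration halves the remaining search space, so even for 1 million entries only about 20 comparisons are needed.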

This is all nice and good, but it shows only one part of a very simple index – a single-column index on an integer value. And searching using the "=" operator (find rows where a column equals something).

But our indexes can do much more.

Please note that once we finally find the row with value 3866827 – we can very easily find the rows smaller or larger than it – we don't have to do any sorting, because the data is already sorted.

Let me show you this using real queries:

$ create table test as
    select cast( random() * 1000000000 as integer) as i
    from generate_series(1,1000000);
SELECT 1000000

This created the test table. Now we can see that searching for a value in it takes time, regardless of whether the value exists:

$ explain analyze select * from test where i = 361243617;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..16925.00 rows=1 width=4) (actual time=0.015..96.439 rows=1 loops=1)
   Filter: (i = 361243617)
 Total runtime: 96.461 ms
(3 rows)

or not:

$ explain analyze select * from test where i = -1;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..16925.00 rows=1 width=4) (actual time=98.658..98.658 rows=0 loops=1)
   Filter: (i = (-1))
 Total runtime: 98.680 ms
(3 rows)

But when I add an index:

$ create index idx on test (i );
CREATE INDEX

Situation changes:

$ explain analyze select * from test where i = 361243617;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Index Scan using idx on test  (cost=0.00..8.38 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1)
   Index Cond: (i = 361243617)
 Total runtime: 0.032 ms
(3 rows)
$ explain analyze select * from test where i = -1;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Index Scan using idx on test  (cost=0.00..8.38 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=1)
   Index Cond: (i = (-1))
 Total runtime: 0.030 ms
(3 rows)

This showed the basics. Now, let's see sorting without sorting:

$ explain analyze select * from test where i >= 361243617 order by i asc limit 10;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.56 rows=10 width=4) (actual time=0.019..0.086 rows=10 loops=1)
   ->  Index Scan using idx on test  (cost=0.00..35886.74 rows=638546 width=4) (actual time=0.017..0.082 rows=10 loops=1)
         Index Cond: (i >= 361243617)
 Total runtime: 0.108 ms
(4 rows)

Please note the lack of a "Sort" node. Without the index, it would look like:

$ explain analyze select * from test where i >= 361243617 order by i asc limit 10;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=30723.75..30723.77 rows=10 width=4) (actual time=192.021..192.023 rows=10 loops=1)
   ->  Sort  (cost=30723.75..32320.11 rows=638546 width=4) (actual time=192.019..192.021 rows=10 loops=1)
         Sort Key: i
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on test  (cost=0.00..16925.00 rows=638546 width=4) (actual time=0.012..129.238 rows=638256 loops=1)
               Filter: (i >= 361243617)
 Total runtime: 192.053 ms
(7 rows)

We can see that the Seq Scan on the test table returned 638256 rows, which had to be sorted (well, top-N sorted, which is a bit different from a full sort, but that's not relevant for now).
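The "top-N heapsort" mentioned in the plan keeps only a small heap of the N best rows seen so far, instead of ordering all 638 thousand of them. A rough Python equivalent of the two strategies, using the standard heapq module (the row counts here are made up to mirror the example):

```python
import heapq
import random

random.seed(0)
rows = [random.randrange(1_000_000_000) for _ in range(638_256)]

# Full sort: order every matching row, then take the first 10.
full = sorted(r for r in rows if r >= 361243617)[:10]

# Top-N heapsort: keep only the 10 smallest matches seen so far,
# in a heap - much less work when N is tiny compared to the row count.
top_n = heapq.nsmallest(10, (r for r in rows if r >= 361243617))

print(top_n == full)   # True - both strategies return the same rows
```

Both return the same answer; the heap version just never materializes the full sorted result.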

Exactly the same thing applies in any case where we can define > and < operators on data.

One note here. When using "LIMIT x OFFSET y" – please note that the scan has to fetch x+y rows and then just skip the first y. So getting even 10 rows, with an offset of 1000000, will be slow despite the index. For more discussion of this, including a chart, please check my previous post.
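The OFFSET behaviour can be sketched like this (a toy model, not PostgreSQL code) – the scan still walks over offset + limit index entries and merely discards the first offset of them, so the work grows with the offset:

```python
def limit_offset(sorted_values, limit, offset):
    """Simulate LIMIT/OFFSET over an index scan: count how many
    entries must be walked to produce the requested rows."""
    fetched = 0
    out = []
    for v in sorted_values:       # walk index entries in order
        fetched += 1
        if fetched > offset:      # skip the first `offset` rows...
            out.append(v)
            if len(out) == limit:
                break
    return out, fetched           # ...but we still had to fetch them

values = list(range(1_000_000))
rows, fetched = limit_offset(values, limit=10, offset=500_000)
print(fetched)   # 500010 entries walked to return just 10 rows
```

The 10 returned rows cost half a million entry visits – which is why large offsets are slow even with a perfect index.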

Some data doesn't have sensible greater/smaller operators – for example – ltree or hstore. Sometimes the definition of comparison is debatable.

Anyway.

As we can see, we can easily search for = , < or > using our simple index.

But what happens in more complex situations? With more columns?

Let's create new test situation:

$ create table test as
    select cast( random() * 100000 as integer) as i, cast( random() * 100 as integer ) as j
    from generate_series(1,1000000);
SELECT 1000000

Sample data:

$ select * from test limit 20;
   i   | j
-------+----
 63748 | 36
 37462 | 96
 25753 | 24
 89899 | 21
 26357 | 49
 67950 | 83
  4226 | 91
 51442 | 32
 29346 | 41
 86067 | 65
 47286 | 18
 55357 | 94
 11806 | 45
   663 | 91
 89186 | 11
  1200 | 53
 46942 | 39
 48841 | 73
 62640 | 39
 93524 | 89
(20 rows)

Now, our indexing options dramatically expand:

  • We can use two single-column indexes
  • We can use index on (i,j)
  • We can use index on (j,i)

Plus – sometimes it just so happens that while j has 100 different values, we care only about rows with j = 10. Or, a bit more complex – j < 10.

Before PostgreSQL 8.1, when scanning a single table, PostgreSQL could use at most one index. Luckily, today we have so-called "bitmap index scans", which means we can use multiple indexes. Let's see how it works:

$ create index idx_i on test (i);
CREATE INDEX
$ create index idx_j on test (j);
CREATE INDEX

And now:

$ explain analyze select * from test where i = 63748 and j = 36;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=187.98..282.72 rows=25 width=8) (actual time=2.290..2.290 rows=1 loops=1)
   Recheck Cond: ((j = 36) AND (i = 63748))
   ->  BitmapAnd  (cost=187.98..187.98 rows=25 width=0) (actual time=2.277..2.277 rows=0 loops=1)
         ->  Bitmap Index Scan on idx_j  (cost=0.00..93.86 rows=5000 width=0) (actual time=1.760..1.760 rows=9979 loops=1)
               Index Cond: (j = 36)
         ->  Bitmap Index Scan on idx_i  (cost=0.00..93.86 rows=5000 width=0) (actual time=0.009..0.009 rows=9 loops=1)
               Index Cond: (i = 63748)
 Total runtime: 2.321 ms
(8 rows)

As you can see both indexes were used.

What is bitmap scanning, really?

PostgreSQL gets the list of pages that may contain matching rows from index 1, and the list of pages that may contain matching rows from index 2, and then fetches only the pages that were returned by both indexes.

So, let's assume that index idx_j reported that matching rows will be on pages:

  • 1
  • 10
  • 11
  • 99

and idx_i returned pages

  • 7
  • 11
  • 31

By doing a bitmap "AND" operation on these lists, we know that only page #11 needs to be loaded from disk to check for matching rows.
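A minimal sketch of that page-level AND, using Python sets in place of PostgreSQL's real bitmaps (the page numbers are the ones from the example above):

```python
# Pages that, according to each index, may contain matching rows.
pages_from_idx_j = {1, 10, 11, 99}
pages_from_idx_i = {7, 11, 31}

# Bitmap "AND": only pages reported by both indexes need to be read
# from disk and rechecked against the actual conditions.
pages_to_read = pages_from_idx_j & pages_from_idx_i
print(sorted(pages_to_read))   # [11]
```

This is also why the plan has a "Recheck Cond" – a page may contain non-matching rows too, so each fetched page is filtered again.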

This is all great, but a dedicated index will, most likely, be better. Let's see:

$ create index idx_ij on test (i, j);
CREATE INDEX
 
$ explain analyze select * from test where i = 63748 and j = 36;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Index Scan using idx_ij on test  (cost=0.00..8.38 rows=1 width=8) (actual time=0.030..0.031 rows=1 loops=1)
   Index Cond: ((i = 63748) AND (j = 36))
 Total runtime: 0.053 ms
(3 rows)

The time is much better. The reason is very simple – the index scan returns only 1 row, so no further work has to be done beyond fetching the correct row from the table.

Will the order of columns matter?

$ create index idx_ji on test (j, i);
CREATE INDEX
 
$ explain analyze select * from test where i = 63748 and j = 36;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Index Scan using idx_ji on test  (cost=0.00..8.38 rows=1 width=8) (actual time=0.017..0.018 rows=1 loops=1)
   Index Cond: ((j = 36) AND (i = 63748))
 Total runtime: 0.043 ms
(3 rows)

Time is mostly the same.

But. The order of columns in an index will matter when we want to use an operator other than "=" for one of the columns.

Which might be (for example) sorting.

To test this nicely, I used a larger table:

create table test as
    select
        cast( random() * 15000 as integer ) as i,
        cast( random() * 15000 as integer ) as j
    from generate_series(1,200000000);

Let's get some base stats:

$ select min(i), count(distinct i), max(i), min(j), count(distinct j), max(j), count(*) from test;
-[ RECORD 1 ]--
min   | 0
count | 15001
max   | 15000
min   | 0
count | 15001
max   | 15000
count | 200000000

On this (test) table I created two indexes, and then I made two copies of the table (test_ij and test_ji) with multicolumn indexes.

Let's say we want to get the rows with the highest i for some given j.

Query is simple:

select * from test where j = 50 order by i desc limit 3;

So, let's see how it will work with different indexes. First, two indexes:

$ explain analyze select * from test where j = 50 order by i desc limit 3;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=47594.60..47594.61 rows=3 width=8) (actual time=285.532..285.533 rows=3 loops=1)
   ->  Sort  (cost=47594.60..47627.13 rows=13014 width=8) (actual time=285.531..285.531 rows=3 loops=1)
         Sort Key: i
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on test  (cost=267.04..47426.40 rows=13014 width=8) (actual time=7.416..283.018 rows=13516 loops=1)
               Recheck Cond: (j = 50)
               Rows Removed by Index Recheck: 2190542
               ->  Bitmap Index Scan on idx_j  (cost=0.00..263.79 rows=13014 width=0) (actual time=5.549..5.549 rows=13516 loops=1)
                     Index Cond: (j = 50)
 Total runtime: 285.562 ms
(10 rows)

Here we see a different kind of bitmap scan, but that's irrelevant. Only one index was used. It returned 13516 rows, which had to be sorted to return 3 rows.

Now, let's see the case of an index on (i, j):

$ explain analyze select * from test_ij where j = 50 order by i desc limit 3;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..856.38 rows=3 width=8) (actual time=1.180..2.577 rows=3 loops=1)
   ->  Index Scan Backward using idx_ij on test_ij  (cost=0.00..3745802.69 rows=13122 width=8) (actual time=1.180..2.575 rows=3 loops=1)
         Index Cond: (j = 50)
 Total runtime: 2.602 ms
(4 rows)

Now, that's fast.

So, let's see the final index – on (j, i):

$ explain analyze select * from test_ji where j = 50 order by i desc limit 3;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..12.01 rows=3 width=8) (actual time=0.022..0.026 rows=3 loops=1)
   ->  Index Scan Backward using idx_ji on test_ji  (cost=0.00..52188.94 rows=13039 width=8) (actual time=0.021..0.024 rows=3 loops=1)
         Index Cond: (j = 50)
 Total runtime: 0.050 ms
(4 rows)

It's 50 times faster!

Why? The reason is very simple. Please note that with the index on (i, j), a backward scan gives you descending i, and within each i, descending j.

Which means that to find 3 matching rows, Pg had to:

  • get the max i
  • check if there is a row with this i and j = 50
  • get the next largest i
  • repeat

On the other hand, when the index is on (j, i), PostgreSQL quickly finds j = 50, and then, within it, just reads the top 3 entries from the index.
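The difference can be simulated with sorted Python tuples standing in for the two indexes (a toy model with a smaller j range – real B-Trees don't scan linearly, but the number of entries visited is what matters here):

```python
import bisect
import random

random.seed(0)
rows = [(random.randrange(15000), random.randrange(100))
        for _ in range(100_000)]

# Index on (i, j): to get the top-3 i for j = 50, walk backward from
# the largest i and filter on j - possibly visiting many entries.
idx_ij = sorted(rows)
found, visited_ij = [], 0
for i, j in reversed(idx_ij):
    visited_ij += 1
    if j == 50:
        found.append((i, j))
        if len(found) == 3:
            break

# Index on (j, i): jump straight to the end of the j = 50 range and
# read the last 3 entries - only 3 entries visited.
idx_ji = sorted((j, i) for i, j in rows)
end = bisect.bisect_right(idx_ji, (50, float("inf")))
top3 = idx_ji[end - 3:end]   # already the 3 largest i for j = 50
visited_ji = 3

print(visited_ij, visited_ji)
```

Running this shows the (i, j) scan visiting far more entries than the 3 needed with (j, i) – the same effect the two EXPLAIN outputs above demonstrate.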

We can see the problem clearly by running the same query, but for more rows:

Index in “wrong" order:

$ explain analyze select * from test_ij where j = 50 order by i desc limit 1000;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..285459.74 rows=1000 width=8) (actual time=1.648..426.232 rows=1000 loops=1)
   ->  Index Scan Backward using idx_ij on test_ij  (cost=0.00..3745802.69 rows=13122 width=8) (actual time=1.647..425.964 rows=1000 loops=1)
         Index Cond: (j = 50)
 Total runtime: 426.342 ms
(4 rows)

And when index is on (j, i):

$ explain analyze select * from test_ji where j = 50 order by i desc limit 1000;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4002.53 rows=1000 width=8) (actual time=0.017..0.950 rows=1000 loops=1)
   ->  Index Scan Backward using idx_ji on test_ji  (cost=0.00..52188.94 rows=13039 width=8) (actual time=0.017..0.809 rows=1000 loops=1)
         Index Cond: (j = 50)
 Total runtime: 1.031 ms
(4 rows)

Another approach might be to use so-called partial indexes.

Partial indexes are indexes that cover not all rows, but just some subset of the rows in a table.

For example, let's assume we care only about rows with j = 50. We can do:

create index partial_i on test (i) where j = 50;

And now:

$ explain analyze select * from test where j = 50 order by i desc limit 1000;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4000.74 rows=1000 width=8) (actual time=0.016..1.058 rows=1000 loops=1)
   ->  Index Scan Backward using partial_i on test  (cost=0.00..52065.70 rows=13014 width=8) (actual time=0.016..0.896 rows=1000 loops=1)
 Total runtime: 1.151 ms
(3 rows)

The biggest selling point of partial indexes is that they are cheaper. For example, let's consider sizes:

  relname  | pg_size_pretty
-----------+----------------
 idx_i     | 4284 MB
 idx_j     | 4284 MB
 idx_ij    | 4284 MB
 idx_ji    | 4284 MB
 partial_i | 312 kB

The reason is very simple – fewer rows to index means a smaller index. This also makes all updates of indexed values faster.

An additional benefit of partial indexes is that you can have partial uniqueness. A typical example is a unique username, but only for active users, which can be done trivially with:

CREATE UNIQUE INDEX ui_username ON users ( username ) WHERE status = 'active';

Let's imagine we want to search using two inequality operations. Like:

select * from test where i > 14900 and j < 100;

Times to run it with various indexes:

$ explain analyze select * from test where i > 14900 and j < 100;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=48191.55..79652.52 rows=8493 width=8) (actual time=532.949..14410.958 rows=8789 loops=1)
   Recheck Cond: ((j < 100) AND (i > 14900))
   Rows Removed by Index Recheck: 119778817
   ->  BitmapAnd  (cost=48191.55..48191.55 rows=8493 width=0) (actual time=531.835..531.835 rows=0 loops=1)
         ->  Bitmap Index Scan on idx_j  (cost=0.00..23892.09 rows=1292521 width=0) (actual time=240.616..240.616 rows=1327963 loops=1)
               Index Cond: (j < 100)
         ->  Bitmap Index Scan on idx_i  (cost=0.00..24294.96 rows=1314237 width=0) (actual time=257.556..257.556 rows=1327155 loops=1)
               Index Cond: (i > 14900)
 Total runtime: 14411.996 ms
(9 rows)
$ explain analyze select * from test_ij where i > 14900 and j < 100;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_ij  (cost=28280.35..62276.90 rows=9211 width=8) (actual time=64.067..96.207 rows=8789 loops=1)
   Recheck Cond: ((i > 14900) AND (j < 100))
   ->  Bitmap Index Scan on idx_ij  (cost=0.00..28278.05 rows=9211 width=0) (actual time=62.483..62.483 rows=8789 loops=1)
         Index Cond: ((i > 14900) AND (j < 100))
 Total runtime: 96.904 ms
(5 rows)
$ explain analyze select * from test_ji where i > 14900 and j < 100;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_ji  (cost=30580.85..65451.73 rows=9460 width=8) (actual time=62.709..92.195 rows=8789 loops=1)
   Recheck Cond: ((j < 100) AND (i > 14900))
   ->  Bitmap Index Scan on idx_ji  (cost=0.00..30578.49 rows=9460 width=0) (actual time=61.127..61.127 rows=8789 loops=1)
         Index Cond: ((j < 100) AND (i > 14900))
 Total runtime: 92.673 ms
(5 rows)

Here we see that the times with two-column indexes are not that bad, but this is mostly because we have a very low number of distinct values. If the number were higher, the time would increase dramatically.

Using normal B-Tree indexes, there is no simple solution for this. We could try to add functional indexes on (for example) (i/1000, j/1000) and modify the queries accordingly.

For such cases, the most sensible approach is to use GiST indexes, for example like this:

create index gist_idx on test using gist (box(point(i,j), point(i,j)));

This looks a bit complex, so let's deconstruct it.

The index is created not on columns i,j, but rather on a geometric box. Geometric boxes (rectangles) are defined by two corners – points. The first point is located (on an x/y grid) at (i,j), and the second point also at (i,j) – so this is basically a rectangle with zero width and height, located at a specific point.

Why such a complex thing? The reason is simple – PG's GiST indexes let us search for geometries quite fast. Like this:

$ explain analyze
SELECT * FROM test
    WHERE box( point(i, j), point(i, j) ) && '(14901,0),(999999999,99)'::box;
                                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=50535.67..1007566.98 rows=1000000 width=8) (actual time=38.130..52.140 rows=8789 loops=1)
   Recheck Cond: (box(point((i)::double precision, (j)::double precision), point((i)::double precision, (j)::double precision)) && '(999999999,99),(14901,0)'::box)
   ->  Bitmap Index Scan on gist_idx_2  (cost=0.00..50285.67 rows=1000000 width=0) (actual time=37.367..37.367 rows=8789 loops=1)
         Index Cond: (box(point((i)::double precision, (j)::double precision), point((i)::double precision, (j)::double precision)) && '(999999999,99),(14901,0)'::box)
 Total runtime: 52.576 ms
(5 rows)
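What the && (overlaps) test does here can be sketched in a few lines of Python – each row becomes a zero-size box at (i, j), and the query box encodes "i > 14900 and j < 100" (the query uses the corners 14901 and 99 because box overlap is bounds-inclusive and the columns are non-negative integers):

```python
def point_in_box(i, j, x1, y1, x2, y2):
    """A zero-size box at (i, j) overlaps the box (x1,y1)-(x2,y2)
    exactly when the point lies inside it (bounds inclusive)."""
    return x1 <= i <= x2 and y1 <= j <= y2

# Query box from the example: i in [14901, 999999999], j in [0, 99] -
# equivalent to "i > 14900 and j < 100" for non-negative integers.
box = (14901, 0, 999999999, 99)

print(point_in_box(14950, 42, *box))   # True  - matches the WHERE clause
print(point_in_box(14900, 42, *box))   # False - i is not > 14900
print(point_in_box(14950, 100, *box))  # False - j is not < 100
```

The GiST index can answer this two-dimensional containment test directly, which is exactly what the pair of B-Tree range conditions struggled with.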

So, to wrap up:

  • Indexes will let you quickly find rows based on an "=" condition
  • Indexes can quickly return rows from a given point forward or backward (>= 1000 or <= 900)
  • Indexes can't easily "skip" rows (offset x)
  • When creating an index on 2 columns – put first the column that will be used with the "=" condition
  • Consider and/or test the usage of partial indexes
  • If all else fails – there are functional indexes, B-Tree and otherwise, which can still help.

9 comments:

  2. # Alexander Korotkov
    Sep 28, 2011

    Since 9.0 we can build GiST indexes just on points. The expression would be a bit simpler; however, the index size is the same.

  3. Sep 28, 2011

    @Alexander:
    thanks a lot, missed that.

  4. # Leonardo
    Sep 28, 2011

    can you elaborate on why the gist index is faster in that case?

  5. Sep 28, 2011

    @Leonardo:
    GiST indexes for geometric types internally emulate a structure called an "R-tree" – http://en.wikipedia.org/wiki/Rtree – which is very fast.
    Details – sorry, way over my skill level.
    The drawback of B-Tree here is that while finding all rows with "i > 14900" is simple, adding the condition "j < 100" is actually quite complex – mostly because there is no correlation between those two columns.

  6. # Alexander Korotkov
    Sep 28, 2011

    R-tree is faster than B-tree for two- and more-dimensional range searches when selectivity in the various dimensions is similar. When the selectivity of one dimension is much smaller than the others, B-tree is faster than R-tree.
    If different indexes (B-tree and GiST) are present, we have to select a possible index in the query. It would be nice if the planner could select the index for a particular search, but that seems to be a hard problem.

  7. Sep 28, 2011

    @Alexander: Solution is actually very simple, but there is strong (and totally unreasonable, in my opinion) resistance against it. The solution are hints.

  8. # Leonardo
    Sep 29, 2011

    http://www.sai.msu.su/~megera/wiki/plantuner

    this can turn on / off indexes (never used it myself though)

  9. # Manav Goel
    Oct 10, 2011

    Thanks for very interesting and useful post. I have couple of questions though.

    Consider a use case that requires searching on either i, or j, or both (i,j).
    An index on (i,j) will work both for searching on just i and on (i,j).
    But for searching on j alone it will not work. So what is more prudent in this case – an index on j alone plus an index on (i,j), or indexes on (j,i) and (i,j)?

    Will searching on i with an index on (i,j) be slow compared to the case when the index is only on i?

  10. Oct 10, 2011

    @Manav:
    assuming both conditions use “=” operator, I would create indexes on:
    - (i,j)
    - (j)

    As for searching using "i=.." with an index on (i,j) – it will be slightly slower than using an index on (i), but usually the difference is negligible.
