Hypothetical indexes – select * from depesz;

There exists an extension to PostgreSQL, which lets you use hypothetical indexes.

What are there? That's simple – these are indexes that don't really exist. So what good are they?

Let's see.

First, I will need some test data. That will be relatively simple:

$ create table test_data (
    id serial primary key,
    some_timestamp timestamptz,
    some_integer int4,
    other_integer int4,
    another_integer int4
);
$ insert into test_data (some_timestamp, some_integer, other_integer, another_integer)
    select
        now() - random() * '10 years'::interval,
        random() * 10,
        random() * 100,
        random() * 1000
    from
        generate_series(1,100000000);

100 million records, but the table is relatively small:

\dt+ test_data
                      List of relations
 Schema |   Name    | Type  | Owner  |  Size   | Description 
--------+-----------+-------+--------+---------+-------------
 public | test_data | table | depesz | 5746 MB | 
(1 row)

Now, let's assume we're considering adding index on (some_integer, other_integer), but we'd like to know if it would be used for queries that look for other_integer only.

Normally, I'd have to, on some side table/database:

$ create index test_index on test_data (some_integer, other_integer);
CREATE INDEX
Time: 243084.367 ms
 
$ explain select * from test_data where other_integer = 50;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Seq Scan on test_data  (cost=0.00..1985295.00 rows=933401 width=24)
   Filter: (other_integer = 50)
(2 rows)
 
Time: 19.342 ms

The answer is “no, it will not use such index" – but the index creation was long, and pretty expensive in terms of IO and CPU.

Thanks to hypopg, I should be able to get the same answer in fraction of the time. So let's see how it works (I of course dropped this test index).

To install, I need to:

sudo apt-get install pgxnclient
pgxn install –unstable hypopg

First step install pgxn tool, which is then used to install actual utility.

To have it working, I have to have certain packages installed (basically pg headers), but this should be trivial to do by appropriate “apt-get install" or “yum install".

Installation took couple of seconds. Afterwards, I can:

$ create extension hypopg;
CREATE EXTENSION

Which loads the code to my test database.

Now, sanity check that test_data table is without this additional index:

\d test_data
                                      Table "public.test_data"
     Column      |           Type           |                       Modifiers                        
-----------------+--------------------------+--------------------------------------------------------
 id              | integer                  | not null default nextval('test_data_id_seq'::regclass)
 some_timestamp  | timestamp with time zone | 
 some_integer    | integer                  | 
 other_integer   | integer                  | 
 another_integer | integer                  | 
Indexes:
    "test_data_pkey" PRIMARY KEY, btree (id)

OK. With this in place, I can create the hypothetical index. This is done using function call:

$ select hypopg_create_index('create index on test_data (some_integer, other_integer)');
                    hypopg_create_index                    
-----------------------------------------------------------
 (16719,<16719>btree_test_data_some_integer_other_integer)
(1 row)

This took ~ 0.5s.

Now, I can try the explain again:

$ explain select * from test_data where other_integer = 50;
                              QUERY PLAN                              
----------------------------------------------------------------------
 Seq Scan on test_data  (cost=0.00..1985295.10 rows=1210000 width=24)
   Filter: (other_integer = 50)
(2 rows)

Index is not used. But perhaps I did something wrong? let's try to make index what will definitely be used:

$ select hypopg_create_index('create index on test_data (some_integer, other_integer, another_integer)');
                           hypopg_create_index                           
-------------------------------------------------------------------------
 (16721,<16721>btree_test_data_some_integer_other_integer_another_integ)
(1 row)
 
$ explain select * from test_data where some_integer = 5 and other_integer = 50 and another_integer = 500;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using <16721>btree_test_data_some_integer_other_integer_another_integ on test_data  (cost=0.07..464.07 rows=115 width=24)
   Index Cond: ((some_integer = 5) AND (other_integer = 50) AND (another_integer = 500))
(2 rows)

Hypopg works (from what I can tell) by modifying system catalog (information about indexes), but only for current backend. So whatever you're testing in your connection – it will not influence other, “normal" connections.

Also, as soon as you close your connection – the hypothetical indexes you created are gone. Of course you can also drop them manually (by calling hypopg_drop_index() function), but I would say that generally there is no point in doing it – since it all will be cleaned as soon as you'll disconnect.

All in all, it looks great, and while in a lot of cases it's trivial to tell if the index will be used, for all the not-so-obvious cases, hypopg will be great tool.

It would be nice if it actually works when i run the query. I tried it with the both types of index and the results were not acceptable.

hypo- index:

explain ANALYZE VERBOSE select * from test_data where some_integer = 5 and other_integer = 50 and another_integer = 500;
+—————————————————————————————————————————-+
| QUERY PLAN |
+—————————————————————————————————————————-+
| Seq Scan on pg_temp_2.test_data (cost=0.00..2485295.00 rows=13 width=24) (actual time=710.865..10830.540 rows=91 loops=1) |
| Output: id, some_timestamp, some_integer, other_integer, another_integer |
| Filter: ((test_data.some_integer = 5) AND (test_data.other_integer = 50) AND (test_data.another_integer = 500)) |
| Rows Removed by Filter: 99999909 |
| Planning time: 0.070 ms |
| Execution time: 10830.618 ms |
+—————————————————————————————————————————-+

Standard index
EXPLAIN ANALYZE VERBOSE select * from test_data where some_integer = 5 and other_integer = 50 and another_integer = 500;
+————————————————————————————————————————————+
| QUERY PLAN |
+————————————————————————————————————————————+
| Index Scan using test_index on pg_temp_2.test_data (cost=0.57..56.86 rows=13 width=24) (actual time=1.961..8.133 rows=91 loops=1) |
| Output: id, some_timestamp, some_integer, other_integer, another_integer |
| Index Cond: ((test_data.some_integer = 5) AND (test_data.other_integer = 50) AND (test_data.another_integer = 500)) |
| Planning time: 0.091 ms |
| Execution time: 8.172 ms |
+————————————————————————————————————————————+

5 thoughts on “Hypothetical indexes”

VdP says:

2015-09-08 at 11:03

Nice 🙂 How does PG react when you actually run a query that tries to use an hypothetical index ?
Ay says:

2015-09-08 at 11:32

It would be nice if it actually works when i run the query. I tried it with the both types of index and the results were not acceptable.

hypo- index:

explain ANALYZE VERBOSE select * from test_data where some_integer = 5 and other_integer = 50 and another_integer = 500;
+—————————————————————————————————————————-+
| QUERY PLAN |
+—————————————————————————————————————————-+
| Seq Scan on pg_temp_2.test_data (cost=0.00..2485295.00 rows=13 width=24) (actual time=710.865..10830.540 rows=91 loops=1) |
| Output: id, some_timestamp, some_integer, other_integer, another_integer |
| Filter: ((test_data.some_integer = 5) AND (test_data.other_integer = 50) AND (test_data.another_integer = 500)) |
| Rows Removed by Filter: 99999909 |
| Planning time: 0.070 ms |
| Execution time: 10830.618 ms |
+—————————————————————————————————————————-+

Standard index
EXPLAIN ANALYZE VERBOSE select * from test_data where some_integer = 5 and other_integer = 50 and another_integer = 500;
+————————————————————————————————————————————+
| QUERY PLAN |
+————————————————————————————————————————————+
| Index Scan using test_index on pg_temp_2.test_data (cost=0.57..56.86 rows=13 width=24) (actual time=1.961..8.133 rows=91 loops=1) |
| Output: id, some_timestamp, some_integer, other_integer, another_integer |
| Index Cond: ((test_data.some_integer = 5) AND (test_data.other_integer = 50) AND (test_data.another_integer = 500)) |
| Planning time: 0.091 ms |
| Execution time: 8.172 ms |
+————————————————————————————————————————————+
depesz says:

2015-09-08 at 12:48

I’m not sure how to respond to your questions. Pg cannot run queries using hypothetical indexes because they don’t exist. So when you actually run a query, for example by using explain analyze, it’s being run as if the index didn’t exist. Because it doesn’t.
Thomas says:

2015-09-08 at 13:33

Yes, the hypothetical indexes only exists when a simple EXPLAIN is called. If you use ANALYZE, the hypothetical indexes are simply skipped.
In fact, the hypothetical are not directly stored in the catalog but in memory. A hook on explain adds them to the planner’s list of indexes when called with a simple EXPLAIN. See the hypo_get_relation_info_hook function for more details.
vdp says:

2015-09-08 at 15:59

Ah that’s neat, explain statements and normal statements get a different view of the list of indexes, so there’s no error, thanks.

Comments are closed.