One of the things people learn is that adding indexes isn't free. All write operations (insert, update, delete) become slower – after all, they have to update the indexes too.
But realistically – how much slower?
Full tests should involve lots of operations, on realistic data, but I just wanted to see some basic info. So I figured I'd just look at the speed of inserting data (well, COPYing data), and try to extract some knowledge from it…
For this test I used Pg 18, and a very simple, though purposely wide-ish, table:
```
=$ create table overhead ( id int8, c1 int8, c2 int8, c3 int8, … c20 int8 );
```
So, 21 columns, each int8, all simple, and easy to follow.
Now, I needed some data. Luckily I can trivially generate lots of random numbers. The thing is that I wanted the tests to be comparable, so I pregenerated input data. One million rows made with this script:
```ruby
#!/usr/bin/env ruby
# frozen_string_literal: true

File.open("test.data", "w") do |f|
  1000000.times do |i|
    cols = [i + 1]
    20.times do
      cols << rand(1000000)
    end
    f.write cols.join("\t") + "\n"
  end
end
```
Each c* column got a random value in the range 0..999999, and id was simply a sequence from 1 to one million. Easy peasy.
Then I made 22 tests:
- With no indexes
- With just unique index on id
- With unique index on id, and normal on c1
- … id, c1, c2
- With 21 indexes: unique on id, and 20 indexes on all c* columns
Each test did:
- drop test table
- recreate it, empty
- add indexes for this test
- time loading data
Not really all-encompassing, but I think it's a good start.
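For reference, a single test iteration (here, the variant with 3 indexes) looked more or less like this – a sketch, the exact commands I ran may have differed slightly:

```
=$ drop table if exists overhead;
=$ create table overhead ( id int8, c1 int8, c2 int8, … c20 int8 );  -- same definition as above
=$ create unique index on overhead (id);
=$ create index on overhead (c1);
=$ create index on overhead (c2);
=$ \timing
=$ \copy overhead from test.data
```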
After all tests I got some initial data.
First, let's look at sizes of datasets.
The table itself, without indexes, takes ~191MB. Each index is slightly different, but their sizes are all in the range of 21 to 25MB. So, the total size in the easiest case, with no indexes, was 191MB, and with all indexes – 689MB.
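If you want to reproduce the size numbers, standard size functions will do – something along these lines (a sketch, not necessarily the exact query I ran):

```
=$ select pg_size_pretty( pg_table_size('overhead') )          as table_size,
          pg_size_pretty( pg_indexes_size('overhead') )        as indexes_size,
          pg_size_pretty( pg_total_relation_size('overhead') ) as total_size;
```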
Now, let's see the times. Since I suspect the times will be related to total size, I will also include load speed: total size of table with indexes divided by load time:
| Index count | Total size: table + indexes (MB) | Time to load (ms) | Load speed (kB/s) |
|---|---|---|---|
| 0 | 190.8 | 2499.748 | 78,139.08 |
| 1 | 212.2 | 3551.826 | 61,176.42 |
| 2 | 236.2 | 6475.024 | 37,353.38 |
| 3 | 259.7 | 9606.983 | 27,680.70 |
| 4 | 284.4 | 12,922.897 | 22,535.50 |
| 5 | 307.7 | 16,780.612 | 18,776.91 |
| 6 | 332.4 | 19,768.661 | 17,217.55 |
| 7 | 355.9 | 23,122.382 | 15,763.43 |
| 8 | 379.3 | 27,197.468 | 14,279.27 |
| 9 | 402.7 | 30,903.665 | 13,344.95 |
| 10 | 427.3 | 35,031.484 | 12,489.34 |
| 11 | 451.1 | 38,324.698 | 12,051.76 |
| 12 | 474.7 | 41,829.256 | 11,621.72 |
| 13 | 498.2 | 46,624.808 | 10,941.30 |
| 14 | 521.2 | 49,759.981 | 10,725.57 |
| 15 | 544.8 | 53,908.810 | 10,349.03 |
| 16 | 569.1 | 57,908.788 | 10,062.72 |
| 17 | 593.8 | 61,966.138 | 9,812.84 |
| 18 | 617.2 | 65,023.413 | 9,720.31 |
| 19 | 641.1 | 69,257.173 | 9,478.41 |
| 20 | 664.8 | 73,977.954 | 9,202.63 |
| 21 | 688.5 | 77,731.259 | 9,069.40 |
(the speed relates to the loaded size – table plus indexes – not the size of the input file, which is somewhat counter-intuitive, but I wanted to check how they relate).
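As a quick sanity check of the math: in the no-index row, 190.8 MB is roughly 195,400 kB, and 195,400 kB divided by ~2.5 s gives ~78,160 kB/s – which matches the first row of the table, up to rounding of the reported size.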
So, we have some data. What do we see?
- speed definitely decreases
- the decrease is not linear with the total size of table + indexes – it's worse
- with a ~3.6x increase in size, the per-kilobyte load speed dropped ~8.6 times (and the total load time grew ~31 times)
Let's test two more things to get a better idea of what's going on.
First, just one index, but very wide – on 10 columns (c1..c10):
- this new index size is 129MB
- total size of table, with this index: 319MB
- time to load data: 6,512.281 ms
- speed of loading: 50,202.99 kB/s
So, it was actually significantly faster than with 10 separate indexes. Of course the two cases solve different problems, so it's not a real apples-to-apples comparison.
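For clarity, the wide index in this test was created more or less like this (my reconstruction based on the description above):

```
=$ create index on overhead (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10);
```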
Now, let's see what will happen if I make the indexes partial. I will redo the test with all 21 indexes, but (with the exception of the unique one on id) each of them will index just 5% of rows, using a where column <= 50000 condition.
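In other words, the index definitions for this run looked roughly like this (again, a sketch based on the description, not a verbatim copy of the script):

```
=$ create unique index on overhead (id);
=$ create index on overhead (c1) where c1 <= 50000;
=$ create index on overhead (c2) where c2 <= 50000;
…
=$ create index on overhead (c20) where c20 <= 50000;
```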
Test data:
- Each of the c* indexes is now only ~1.1MB
- total size of table, with all indexes: 235MB
- time to load data: 6,916.112 ms
- speed of loading: 34,851.95 kB/s
Time/speed is almost the same as in the case with 2 indexes, which kinda makes sense – those two indexes held 1 million ids (in the id index) and 1 million c1 values. In the newest test we had 21 indexes, but 20 of them contained only ~50k rows each.
So, there we go. We have some data. From my point of view it means that we can add indexes without worrying overly about performance, as long as:
- the index significantly improves some common operation
- your db runs this operation (the one being optimized) much more often than it writes to the table that gets the new index
- partial indexes are your friends – they are smaller, and faster to maintain
- using a multicolumn index seems to be better than multiple single-column indexes, but, of course, it can't be used for queries that filter only on trailing (in the index definition) columns – see the sketch below
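To illustrate the last two points with a made-up example (table and column names are purely hypothetical):

```
=$ -- partial index: small and cheap to maintain, but usable only for queries
=$ -- whose WHERE clause implies status = 'pending'
=$ create index on orders (created_at) where status = 'pending';

=$ -- multicolumn index: can serve filters on (a) and on (a, b),
=$ -- but not on b alone
=$ create index on t (a, b);
```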
And finally, a WARNING – all of this applies to btree indexes only. Other index types would require different tests, and some of them are very expensive to update (GIN, for example).
So, while I think it's worthwhile to add a specialized, partial, btree index to speed up a query that happens 100 times a minute, adding more and more indexes, especially expensive ones, will eventually lead to problems.
Now, you might want to say: OK, but how do I know if the index I have in mind is expensive? That's simple: make a test table, load 100k rows, truncate it, add the index, and load the same rows again.
To show you a simple example:
```
=$ \timing
=$ create table test_gin (x text);
=$ \copy test_gin from texts.copy
=$ truncate test_gin;
=$ create index on test_gin using gin ( to_tsvector('english', x) );
=$ \copy test_gin from texts.copy
```
texts.copy is a file with 100k lines, where each line is ~2500 characters of random English words.
So, sizing information:
- table size, on its own, but with TOAST: 270MB
- index size: 69 MB
- total size, with index: 339MB
Now, without the index, it took 6,928.514 ms to load the data. With the index, though, the time jumped to 109,804.609 ms. So, total data size increased by only ~25%, but the load time increased over 15 times.
So, your mileage may vary, do your tests, but don't be afraid to add a small btree index every now and then 🙂