One of the things people learn is that adding indexes isn't free. All write operations (insert, update, delete) become slower – after all, they have to update the indexes too.
But realistically – how much slower?
Full tests should involve lots of operations, on realistic data, but I just wanted to see some basic info. So I figured I'd just look at the speed of inserting data (well, COPYing data), and try to extract some knowledge from it…
For this test I used Pg 18, and a very simple, though purposely wide-ish, table:
```
=$ create table overhead ( id int8, c1 int8, c2 int8, c3 int8, … c20 int8 );
```
So, 21 columns, each int8, all simple, and easy to follow.
Now, I needed some data. Luckily I can trivially generate lots of random numbers. The thing is that I wanted the tests to be comparable, so I pregenerated input data. One million rows made with this script:
```ruby
#!/usr/bin/env ruby
# frozen_string_literal: true

File.open("test.data", "w") do |f|
  1000000.times do |i|
    cols = [i + 1]
    20.times do
      cols << rand(1000000)
    end
    f.write cols.join("\t") + "\n"
  end
end
```
Each c* column got a random value in the range 0..999999, and id was simply a sequence from 1 to one million. Easy peasy.
Then I made 22 tests:
- With no indexes
- With just unique index on id
- With unique index on id, and normal on c1
- … id, c1, c2
- With 21 indexes: unique on id, and 20 indexes on all c* columns
Each test did:
- drop test table
- recreate it, empty
- add indexes for this test
- time loading data
Not really all-encompassing, but I think it's a good start.
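For reference, a single test iteration (here, the variant with 3 indexes) looked more or less like this – a sketch, the exact commands I ran may have differed slightly:

```
=$ drop table if exists overhead;
=$ create table overhead ( id int8, c1 int8, c2 int8, … c20 int8 );  -- same definition as above
=$ create unique index on overhead (id);
=$ create index on overhead (c1);
=$ create index on overhead (c2);
=$ \timing
=$ \copy overhead from test.data
```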
After all tests I got some initial data.
First, let's look at sizes of datasets.
The table itself, without indexes, takes ~191MB. Each index is slightly different, but their sizes are all in the range of 21 to 25MB. So, the total size in the easiest case, with no indexes, was 191MB, and with all indexes – 689MB.
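If you want to reproduce the size numbers, standard size functions will do – something along these lines (a sketch, not necessarily the exact query I ran):

```
=$ select pg_size_pretty( pg_table_size('overhead') )          as table_size,
          pg_size_pretty( pg_indexes_size('overhead') )        as indexes_size,
          pg_size_pretty( pg_total_relation_size('overhead') ) as total_size;
```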
Now, let's see the times. Since I suspect the times will be related to total size, I will also include load speed: total size of table with indexes divided by load time:
| Index count | Total size: table + indexes (MB) | Time to load (ms) | Load speed (kB/s) |
|---|---|---|---|
| 0 | 190.8 | 2499.748 | 78,139.08 |
| 1 | 212.2 | 3551.826 | 61,176.42 |
| 2 | 236.2 | 6475.024 | 37,353.38 |
| 3 | 259.7 | 9606.983 | 27,680.70 |
| 4 | 284.4 | 12,922.897 | 22,535.50 |
| 5 | 307.7 | 16,780.612 | 18,776.91 |
| 6 | 332.4 | 19,768.661 | 17,217.55 |
| 7 | 355.9 | 23,122.382 | 15,763.43 |
| 8 | 379.3 | 27,197.468 | 14,279.27 |
| 9 | 402.7 | 30,903.665 | 13,344.95 |
| 10 | 427.3 | 35,031.484 | 12,489.34 |
| 11 | 451.1 | 38,324.698 | 12,051.76 |
| 12 | 474.7 | 41,829.256 | 11,621.72 |
| 13 | 498.2 | 46,624.808 | 10,941.30 |
| 14 | 521.2 | 49,759.981 | 10,725.57 |
| 15 | 544.8 | 53,908.810 | 10,349.03 |
| 16 | 569.1 | 57,908.788 | 10,062.72 |
| 17 | 593.8 | 61,966.138 | 9,812.84 |
| 18 | 617.2 | 65,023.413 | 9,720.31 |
| 19 | 641.1 | 69,257.173 | 9,478.41 |
| 20 | 664.8 | 73,977.954 | 9,202.63 |
| 21 | 688.5 | 77,731.259 | 9,069.40 |
(the speed relates to the loaded size – table plus indexes – not the size of the input file, which is somewhat counter-intuitive, but I wanted to check how they relate).
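As a quick sanity check of the math: in the no-index row, 190.8 MB is roughly 195,400 kB, and 195,400 kB divided by ~2.5 s gives ~78,160 kB/s – which matches the first row of the table, up to rounding of the reported size.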
So, we have some data. What do we see?
- speed definitely decreases
- the decrease is not linear with the total size of table + indexes – it's worse
- with a ~3.6x increase in size, the per-kilobyte load speed dropped ~8.6 times (and the total load time grew ~31 times)
Let's test two more things to get a better idea of what's going on.
First, just one index, but very wide – on 10 columns (c1..c10):
- this new index size is 129MB
- total size of table, with this index: 319MB
- time to load data: 6,512.281 ms
- speed of loading: 50,202.99 kB/s
So, it was actually significantly faster than with 10 separate indexes. Of course the two cases solve different problems, so it's not a real apples-to-apples comparison.
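For clarity, the wide index in this test was created more or less like this (my reconstruction based on the description above):

```
=$ create index on overhead (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10);
```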
Now, let's see what will happen if I make the indexes partial. I will redo the test with all 21 indexes, but (with the exception of the unique one on id) each of them will index just 5% of rows, using a where column <= 50000 condition.
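In other words, the index definitions for this run looked roughly like this (again, a sketch based on the description, not a verbatim copy of the script):

```
=$ create unique index on overhead (id);
=$ create index on overhead (c1) where c1 <= 50000;
=$ create index on overhead (c2) where c2 <= 50000;
…
=$ create index on overhead (c20) where c20 <= 50000;
```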
Test data:
- Each of the c* indexes is now only ~1.1MB
- total size of table, with all indexes: 235MB
- time to load data: 6,916.112 ms
- speed of loading: 34,851.95 kB/s
Time/speed is almost the same as in the case with 2 indexes, which kinda makes sense – those two indexes held 1 million ids (in the id index) and 1 million c1 values. In the newest test we had 21 indexes, but 20 of them contained only ~50k rows each.
So, there we go. We have some data. From my point of view it means that we can add indexes without worrying overly about performance, as long as:
- the index significantly improves some common operation
- your db runs this operation (the one being optimized) much more often than it writes to the table that gets the new index
- partial indexes are your friends – they are smaller, and faster to maintain
- using a multicolumn index seems to be better than multiple single-column indexes, but, of course, it can't be used for queries that filter only on trailing (in the index definition) columns – see the sketch below
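To illustrate the last two points with a made-up example (table and column names are purely hypothetical):

```
=$ -- partial index: small and cheap to maintain, but usable only for queries
=$ -- whose WHERE clause implies status = 'pending'
=$ create index on orders (created_at) where status = 'pending';

=$ -- multicolumn index: can serve filters on (a) and on (a, b),
=$ -- but not on b alone
=$ create index on t (a, b);
```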
And finally, a WARNING – all of this applies to btree indexes only. Other index types would require different tests, and some of them are very expensive to update (GIN, for example).
So, while I think it's worthwhile to add a specialized, partial, btree index to speed up a query that happens 100 times a minute, adding more and more indexes, especially expensive ones, will eventually lead to problems.
Now, you might want to say: OK, but how do I know if the index I have in mind is expensive? That's simple: make a test table, load 100k rows, truncate it, add the index, and load the same rows again.
To show you a simple example:
```
=$ \timing
=$ create table test_gin (x text);
=$ \copy test_gin from texts.copy
=$ truncate test_gin;
=$ create index on test_gin using gin ( to_tsvector('english', x) );
=$ \copy test_gin from texts.copy
```
texts.copy is a file with 100k lines, where each line is ~2500 characters of random English words.
So, sizing information:
- table size, on its own, but with TOAST: 270MB
- index size: 69 MB
- total size, with index: 339MB
Now, without the index, it took 6,928.514 ms to load the data. With the index, though, the time jumped to 109,804.609 ms. So, total data size increased by only ~25%, but the load time increased over 15 times.
So, your mileage may vary, do your tests, but don't be afraid to add a small btree index every now and then 🙂