December 6th, 2014 by depesz | Tags: , , , , , , | 8 comments »
Did it help? If yes - maybe you can help me?

The general knowledge is that numerics are slower than integers/float, but offer precision and ranges that are better.

While I understand what is slow, I don't really know how much slower numerics are. So let's test it.

There are couple of things to consider:

  • table size (related to disk usage per column)
  • speed of inserts
  • speed of indexing
  • searching using equality operator
  • searching using range operator
  • doing some math on the values

Table size. Initial thinking is that int2 column would make the table 4 times smaller than int8. But is it true?

First, let's consider very simplistic table:

create table test_type (
    some_column type
);

where type is one of:

  • int2
  • int4
  • int8
  • float4
  • float8
  • numeric
  • numeric(10,0) (in this case table name will be test_numeric10)
  • numeric(20,0) (in this case table name will be test_numeric20)

In each of these tables, let's insert 1 million rows. In all but int2 case these will be unique, but int2 range is too small to fit 1 million unique values, so I'll end up with duplicates:

insert into test_int2 select i % 32767 from generate_series(1,1000000) i;
insert into test_int4 select i from generate_series(1,1000000) i;
insert into test_int8 select i from generate_series(1,1000000) i;
insert into test_float4 select i from generate_series(1,1000000) i;
insert into test_float8 select i from generate_series(1,1000000) i;
insert into test_numeric select i from generate_series(1,1000000) i;
insert into test_numeric10 select i from generate_series(1,1000000) i;
insert into test_numeric20 select i from generate_series(1,1000000) i;

Table sizes:

select relname, pg_size_pretty( pg_table_size(oid) )
from pg_class where relname ~ '^test_';
    relname     | pg_size_pretty 
----------------+----------------
 test_float4    | 35 MB
 test_float8    | 35 MB
 test_int2      | 35 MB
 test_int4      | 35 MB
 test_int8      | 35 MB
 test_numeric   | 35 MB
 test_numeric10 | 35 MB
 test_numeric20 | 35 MB
(8 rows)

Surprise, surprise.

But, let's redo the test, this time with 2 columns. Each table will get “some_column2", of the same datatype it had before. and all the data will be reinserted, to both columns (tables will get dropped and recreated to avoid any bloat). How will that change?

insert into test_int2 select i % 32767, i % 32767 from generate_series(1,1000000) i;
insert into test_int4 select i, i from generate_series(1,1000000) i;
insert into test_int8 select i, i from generate_series(1,1000000) i;
insert into test_float4 select i, i from generate_series(1,1000000) i;
insert into test_float8 select i, i from generate_series(1,1000000) i;
insert into test_numeric select i, i from generate_series(1,1000000) i;
insert into test_numeric10 select i, i from generate_series(1,1000000) i;
insert into test_numeric20 select i, i from generate_series(1,1000000) i;

and the sizes now:

select relname, pg_size_pretty( pg_table_size(oid) )
from pg_class where relname ~ '^test_';
    relname     | pg_size_pretty 
----------------+----------------
 test_float4    | 35 MB
 test_float8    | 42 MB
 test_int2      | 35 MB
 test_int4      | 35 MB
 test_int8      | 42 MB
 test_numeric   | 42 MB
 test_numeric10 | 42 MB
 test_numeric20 | 42 MB
(8 rows)

Yeah. So, basically what we are seeing here is effect of two things:

  • alignment – data is aligned to certain number of bytes. In my case – 8 (that's why int2 and int4 take as much as int8, but when I added next column – int8 grew and int2/int4 didn't). This means that if something is using less than 8 bytes, then it will be “padded" to 8 bytes. It doesn't mean that each value will take 8 bytes – it's a bit more complex than that, and if you're looking for specific details – I guess sources have definitive answers
  • what a lot of people don't seem to remember – each row contains more than what you just see. There are system columns. And these take 26 bytes. Per row.

So, as far as data size is concerned – there is difference, but it's not really all that big.

What about inserting speed? I'll exclude int2 from the tests, as its smaller range requires limiting range, and this makes the test more complicated (i.e. there are additional operations to do).

So, what about the other datatypes?

I inserted 1 million rows into each of the tables, between tests doing “truncate" on the table, and repeated the process 10 times to get sensible average. Results:

table time more than fastest
test_int4 1285.952ms 0.51%
test_int8 1304.867ms 1.99%
test_float4 1284.393ms 0.39%
test_float8 1279.404ms 0.00%
test_numeric 1533.078ms 19.83%
test_numeric10 1553.848ms 21.45%
test_numeric20 1550.829ms 21.21%

That's pretty interesting. I also ran the test, where there was unique index on each of the tables (only on one column). Results for the same insert test with index in place:

table time more than fastest
test_int4 2716.123ms 0.00%
test_int8 2927.931ms 7.80%
test_float4 3079.118ms 13.36%
test_float8 3208.430ms 18.13%
test_numeric 3915.411ms 44.15%
test_numeric10 3925.478ms 44.53%
test_numeric20 3909.128ms 43.92%

Clearly indexing has its toll on larger datatypes. Up to ~ 50% slower than int4.

So, we know now, what about writes. Let's see how searches perform.

For equality searching, I will search, using 10 different queries for values (in indexed column): 1, 100000, 200000, 300000 … 1000000.

Since the searches are fast, I repeated each 1000 times. Averages:

table time more than fastest
test_int4 0.205ms 0.00%
test_int8 0.205ms 0.08%
test_float4 0.210ms 2.50%
test_float8 0.210ms 2.39%
test_numeric 0.208ms 1.68%
test_numeric10 0.207ms 1.19%
test_numeric20 0.207ms 0.73%

For me it means that there is no difference.

Now range queries. I'll check queries that return 10 rows, 1000 rows and 10000 rows.

All queries were in form of:

select * from table where some_column between ... and ...

First the 10 row:

table time more than fastest
test_int4 0.235ms 0.49%
test_int8 0.233ms 0.00%
test_float4 0.249ms 6.81%
test_float8 0.249ms 6.75%
test_numeric 0.248ms 6.29%
test_numeric10 0.247ms 5.66%
test_numeric20 0.245ms 5.02%

1000 rows:

test_int4 : 1.294ms (+ 0.00%)
test_int8 : 1.373ms (+ 6.05%)
test_float4 : 2.537ms (+ 96.03%)
test_float8 : 2.561ms (+ 97.86%)
test_numeric : 1.528ms (+ 18.04%)
test_numeric10 : 1.510ms (+ 16.65%)
test_numeric20 : 1.505ms (+ 16.26%)
table time more than fastest

And the final, 10k row test:

table time more than fastest
test_int4 8.401ms 0.00%
test_int8 9.366ms 11.49%
test_float4 17.959ms 113.77%
test_float8 18.196ms 116.60%
test_numeric 8.954ms 6.59%
test_numeric10 10.072ms 19.89%
test_numeric20 10.106ms 20.30%

That's actually pretty interesting – searching using floats is slower than using numerics?!

And now for some math. I ran a test, which was running commands like:

select 12::int2 * 13::int2

Decided to test 3 operators: addition, multiplication and modulo.

Results for addition test (123 + 456):

table time more than fastest
int4 0.162ms 0.00%
int8 0.165ms 2.18%
float4 0.168ms 4.11%
float8 0.168ms 4.03%
numeric 0.168ms 3.76%
numeric(10,0) 0.177ms 9.38%
numeric(20,0) 0.176ms 8.92%

Multiplication (123 * 99):

table time more than fastest
int4 0.159ms 0.00%
int8 0.163ms 2.25%
float4 0.166ms 4.31%
float8 0.166ms 4.17%
numeric 0.165ms 4.02%
numeric(10,0) 0.174ms 9.62%
numeric(20,0) 0.173ms 9.12%

Modulo (12345 % 9876). This test couldn't be done with floats, as there is no “float % float" operator:

table time more than fastest
int4 0.177ms 0.00%
int8 0.180ms 1.82%
numeric 0.183ms 3.56%
numeric(10,0) 0.193ms 8.77%
numeric(20,0) 0.193ms 8.74%

So, what does it all tell us? It all depends on use case. Indexing is biggest problem, but math itself, or searching is not that big. Whether it's OK for your situation – you have to assess yourself.

UPDATE

Dim from irc asked me to check aggregates too. So I did.

Results for select sum(some_column):

table time more than fastest
test_float4 63.163 ms + 10.28%
test_float8 62.684 ms + 9.44%
test_int2 57.685 ms + 0.71%
test_int4 57.276 ms + 0.00%
test_int8 152.288 ms + 165.88%
test_numeric 118.889 ms + 107.57%
test_numeric10 118.179 ms + 106.33%
test_numeric20 118.100 ms + 106.20%

and select avg(some_column):

table time more than fastest
test_float4 69.261 ms + 12.53%
test_float8 69.909 ms + 13.58%
test_int2 62.018 ms + 0.76%
test_int4 61.551 ms + 0.00%
test_int8 153.370 ms + 149.17%
test_numeric 120.552 ms + 95.86%
test_numeric10 120.645 ms + 96.01%
test_numeric20 120.597 ms + 95.93%

This is pretty interesting. Especially how bad int8 seems to be …

  1. 8 comments

  2. # edo
    Dec 7, 2014

    1. what about disk usage on table+indexes?
    2. may be range test result would be better with explicit type specification like “between 1::float8 and 100::float8”?

  3. # did
    Dec 8, 2014

    hi
    In my understanding search is not slower with float, ‘displaying’ float is slow, eg if you run explain analyse float select is faster than numeric.

  4. Dec 8, 2014

    how about comparing int8 vs char(8) and varchar(8).
    since they are using the same size of memory and disk how about the performance ?

  5. # sftf
    Dec 8, 2014

    “This is pretty interesting. Especially how bad int8 seems to be …”
    Maybe 32-bitness of tested PostgreSQL leads to such results?
    How about the same on 64-bit PostgreSQL?

  6. # marian
    Dec 8, 2014

    What’s postgres version, 64 or 32 bit?

  7. Dec 8, 2014

    ndwijaya:
    Isn’t that enough: http://www.depesz.com/2012/06/07/123-vs-depesz-what-is-faster/

  8. Dec 8, 2014

    @sftf and @marian:

    It was (and is) 64 bit on 64bit platform – 9.5devel on 64bit Linux.

  9. Dec 20, 2014

    Nice to actually see some quantifiable speed differences. I wonder why sum/avg is so slow for int8 and if older versions of postgres give the same result. I also hadn’t expected to see quite the insert penalty for numeric, is that mainly going to be down to varlena overhead/size?

Leave a comment