Recently(ish) I had a conversation on one of PostgreSQL support chats (IRC, Slack, or Discord) about efficient storage of JSON data, which compression to use, which datatype.
Unrelated to this, some people (at least two over the last year or so) said they weren't sure whether PostgreSQL optimizes storage across rows within a column, for example by storing JSON attribute names once per column instead of once per value.
Decided to investigate…
First, let's look at compression. By default my PostgreSQL is using:
```
=$ show default_toast_compression;
 default_toast_compression
───────────────────────────
 pglz
(1 row)
```
This is the older compression format. Since PostgreSQL 14 I can change it to use LZ4.
Simple table for testing:
```
=$ create table compression_test (test_name text, text_pglz text, text_lz4 text);
=$ alter table compression_test alter column text_pglz set compression pglz;
=$ alter table compression_test alter column text_lz4 set compression lz4;
=$ \d+ compression_test
                                  Table "public.compression_test"
  Column    | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
------------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 test_name  | text |           |          |         | extended |             |              |
 text_pglz  | text |           |          |         | extended | pglz        |              |
 text_lz4   | text |           |          |         | extended | lz4         |              |
Access method: heap
```
In there I loaded:
- four random strings, each consisting of characters: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz, with lengths of: 5, 505, 1005, and 1505 characters.
- ten documents from the PostgreSQL documentation, picked so that they differ in size by at least 400 characters, starting with legalnotice.html (2065 characters) and ending with pgfreespacemap.html (6779 characters).
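For reference, the random strings can be generated with a short sketch like this (a Python equivalent of what I used; only the alphabet and the lengths come from the description above):

```python
import random
import string

# 62-character alphabet: digits + upper- and lower-case letters
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def random_string(length: int) -> str:
    """Random string of `length` characters from the 62-char alphabet."""
    return "".join(random.choice(ALPHABET) for _ in range(length))

# The four test strings loaded into compression_test
samples = {f"random:{n}": random_string(n) for n in (5, 505, 1005, 1505)}
```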
Let's see how much disk space is used by each of these:
```
=$ SELECT test_name,
          length(text_pglz) AS text_length,
          pg_column_compression(text_pglz) AS pglz_comp,
          pg_column_size(text_pglz) AS pglz_size,
          format('%7s %%', round((100.0 * pg_column_size(text_pglz)) / length(text_pglz), 2)) AS pglz_ratio,
          pg_column_compression(text_lz4) AS lz4_comp,
          pg_column_size(text_lz4) AS lz4_size,
          format('%6s %%', round((100.0 * pg_column_size(text_lz4)) / length(text_lz4), 2)) AS lz4_ratio
   FROM compression_test
   ORDER BY text_length;
            test_name            | text_length | pglz_comp | pglz_size | pglz_ratio | lz4_comp | lz4_size | lz4_ratio
---------------------------------+-------------+-----------+-----------+------------+----------+----------+-----------
 random:5                        |           5 | [null]    |         6 |  120.00 %  | [null]   |        6 | 120.00 %
 random:505                      |         505 | [null]    |       509 |  100.79 %  | [null]   |      509 | 100.79 %
 random:1005                     |        1005 | [null]    |      1005 |  100.00 %  | [null]   |     1009 | 100.40 %
 random:1505                     |        1505 | [null]    |      1505 |  100.00 %  | [null]   |     1509 | 100.27 %
 legalnotice.html                |        2065 | [null]    |      2071 |  100.29 %  | lz4      |     1630 |  78.93 %
 release-prior.html              |        2655 | pglz      |      1214 |   45.73 %  | lz4      |     1302 |  49.04 %
 sql-dropuser.html               |        3188 | pglz      |      1462 |   45.86 %  | lz4      |     1624 |  50.94 %
 pltcl-procnames.html            |        3670 | pglz      |      1811 |   49.35 %  | lz4      |     1970 |  53.68 %
 spi-spi-getvalue.html           |        4216 | pglz      |      1771 |   42.01 %  | lz4      |     2020 |  47.91 %
 catalog-pg-db-role-setting.html |        4715 | pglz      |      1811 |   38.41 %  | lz4      |     2088 |  44.28 %
 supported-platforms.html        |        5235 | pglz      |      2584 |   49.36 %  | lz4      |     2839 |  54.23 %
 datatype-money.html             |        5749 | pglz      |      2814 |   48.95 %  | lz4      |     3066 |  53.33 %
 sql-dropsubscription.html       |        6289 | pglz      |      2838 |   45.13 %  | lz4      |     3056 |  48.59 %
 pgfreespacemap.html             |        6779 | pglz      |      3134 |   46.23 %  | lz4      |     3248 |  47.91 %
(14 rows)
```
Couple of interesting notes:
- For small texts, compression doesn't kick in. This is because of the default storage settings, which don't try to compress small values.
- The size discrepancy in the pglz column when compression is not actually used (pglz_comp is null) is simply the overhead of storing the string length: a 1-byte header for short values (up to 126 bytes of data), switching to a 4-byte header after that.
- There is an interesting discrepancy for legalnotice.html: it's not compressed, yet the size difference is 6 bytes (2071 - 2065). Why? Not sure, to be honest. I thought it was because of switching to external storage, but that had already happened for random:1005. It's not a big difference, so I'd say it doesn't really matter.
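The length-header overhead can be sketched numerically. This is a simplified model (it assumes no compression and ignores TOAST and alignment details, and the 126-byte short-header limit is my assumption), but it matches the uncompressed rows above:

```python
def stored_size(length: int) -> int:
    """Approximate on-disk size of an uncompressed text value:
    a 1-byte varlena header for short values, 4 bytes otherwise."""
    SHORT_LIMIT = 126  # assumed max data bytes that fit the 1-byte header
    return length + (1 if length <= SHORT_LIMIT else 4)

# Matches the pglz_size column for the uncompressed random strings:
print(stored_size(5))    # 6
print(stored_size(505))  # 509
```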
OK. So we see compression details. Specifically, I see that pglz compresses a bit better, but I know that lz4 is simply faster.
How much faster? Let's see. I loaded 16MB of text (all of the html files from the PostgreSQL docs, concatenated) into my test table. Then ran:
```
=$ select length(text_pglz) from compression_test;
=$ select length(text_lz4) from compression_test;
```
with \timing, multiple times, and took the best time. Results:
- pglz: 37.705 ms (+47% relative to lz4)
- lz4: 25.560 ms (-32% relative to pglz)
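The percentages are simply the two best times compared against each other; a quick arithmetic check:

```python
pglz_ms = 37.705
lz4_ms = 25.560

pglz_overhead = (pglz_ms / lz4_ms - 1) * 100  # how much slower pglz is
lz4_saving = (lz4_ms / pglz_ms - 1) * 100     # how much faster lz4 is

print(round(pglz_overhead, 1))  # 47.5
print(round(lz4_saving, 1))     # -32.2
```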
OK. Compression is covered. Now, how about data storage? The first thing to check is trivial: does PostgreSQL optimize away simple things like key names? Let's make two simple tables:
```
=$ create table json_key_optimization_test ( test_json json );
=$ create table jsonb_key_optimization_test ( test_jsonb jsonb );
```
Now, I need to generate some random, non-trivially sized json objects. Will use this approach:
```
=$ CREATE temp TABLE sample_keys AS
       SELECT get_random_string(2) AS k1,
              get_random_string(2) AS k2,
              get_random_string(2) AS k3,
              get_random_string(2) AS k4,
              get_random_string(2) AS k5;
=$ CREATE temp TABLE sample_jsons AS
       SELECT json_build_object(k1, k1, k2, k2, k3, k3, k4, k4, k5, k5) AS json_val,
              jsonb_build_object(k1, k1, k2, k2, k3, k3, k4, k4, k5, k5) AS jsonb_val
       FROM sample_keys;
SELECT 1
=$ select * from sample_jsons \gx
-[ RECORD 1 ]-----------------------------------------------------------------
json_val  | {"Mn" : "Mn", "ff" : "ff", "fh" : "fh", "Tw" : "Tw", "k9" : "k9"}
jsonb_val | {"Mn": "Mn", "Tw": "Tw", "ff": "ff", "fh": "fh", "k9": "k9"}
```
For the real test, will just make the strings longer, let's say 500 characters each. That should be enough.
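For illustration, here is a Python sketch of the same construction: five random 500-character keys, each used as its own value, mirroring json_build_object(k1, k1, …, k5, k5) above (assuming get_random_string draws from the 62-character alphabet used earlier):

```python
import json
import random
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def random_string(length: int) -> str:
    return "".join(random.choice(ALPHABET) for _ in range(length))

# Five random 500-character keys, each used as its own value
keys = [random_string(500) for _ in range(5)]
doc = {k: k for k in keys}

# 5 * (500 + 500 + 4 quotes + 2 for ': ') + 4 * 2 for ', ' + 2 braces = 5040
print(len(json.dumps(doc)))  # 5040
```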
Let's see table sizes after inserting single such row:
```
=$ SELECT pg_table_size('json_key_optimization_test'::regclass) AS json_table_size,
          pg_table_size('jsonb_key_optimization_test'::regclass) AS jsonb_table_size;
 json_table_size | jsonb_table_size
-----------------+------------------
           73728 |            81920
(1 row)
```
Tested also sizes of generated values:
- JSON: length(test_json::text) is 5045 bytes, and pg_column_size is 2878
- JSONB: length(test_jsonb::text) is 5040 bytes, and pg_column_size is 2865
Which shows that jsonb seems to be marginally smaller in such an extreme case.
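The 5-byte difference between the two text lengths (5045 vs 5040) appears to come purely from spacing: json keeps json_build_object's output verbatim, which puts `" : "` between key and value, while jsonb's canonical text form uses `": "`. That's one extra space per pair, times five pairs. A quick check (json.dumps' default separators happen to match jsonb's canonical form):

```python
import json

doc = {"k%d" % i: "v%d" % i for i in range(5)}

jsonb_text = json.dumps(doc)                           # '": "' between key and value
json_text = json.dumps(doc, separators=(", ", " : "))  # '" : "', like json_build_object

print(len(json_text) - len(jsonb_text))  # 5
```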
Great. Now, for the fun part. Will insert 19 more rows, gathering sizes after each insert. Results (all sizes in bytes):
| rows | JSON size | JSON per row | JSONB size | JSONB per row |
|---|---|---|---|---|
| 1 | 32,768 | 32,768 | 32,768 | 32,768 |
| 2 | 32,768 | 16,384 | 32,768 | 16,384 |
| 3 | 65,536 | 21,845 | 65,536 | 21,845 |
| 4 | 65,536 | 16,384 | 65,536 | 16,384 |
| 5 | 65,536 | 13,107 | 65,536 | 13,107 |
| 6 | 73,728 | 12,288 | 73,728 | 12,288 |
| 7 | 73,728 | 10,532 | 73,728 | 10,532 |
| 8 | 73,728 | 9,216 | 73,728 | 9,216 |
| 9 | 81,920 | 9,102 | 81,920 | 9,102 |
| 10 | 81,920 | 8,192 | 81,920 | 8,192 |
| 11 | 90,112 | 8,192 | 90,112 | 8,192 |
| 12 | 90,112 | 7,509 | 90,112 | 7,509 |
| 13 | 90,112 | 6,931 | 90,112 | 6,931 |
| 14 | 98,304 | 7,021 | 98,304 | 7,021 |
| 15 | 98,304 | 6,553 | 98,304 | 6,553 |
| 16 | 98,304 | 6,144 | 98,304 | 6,144 |
| 17 | 106,496 | 6,264 | 106,496 | 6,264 |
| 18 | 106,496 | 5,916 | 106,496 | 5,916 |
| 19 | 114,688 | 6,036 | 114,688 | 6,036 |
| 20 | 114,688 | 5,734 | 114,688 | 5,734 |
So, the size increase is not linear, which might suggest some optimization. But the average size per row is still significantly larger than the size of a single value. So I assume there are no cross-row optimizations, and the nonlinear scaling simply comes from uneven filling of table/TOAST pages.
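A quick sanity check on these numbers: after 20 rows, the average cost per row is still roughly twice the pg_column_size of a single json value measured earlier, so clearly nothing is being shared across rows:

```python
total_size = 114_688    # table size after 20 rows, from the table above
rows = 20
single_value = 2_878    # pg_column_size of one json value, measured earlier

per_row = total_size / rows
print(round(per_row))                    # 5734
print(round(per_row / single_value, 2))  # 1.99
```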
Now, let's look at space usage of JSON vs. JSONB. In the example above we've seen that jsonb values are slightly smaller, both on disk and as length(val::text). Is it always the case?
Let's start with very simple json value:
```
=$ create table size_comparison (json_val json, jsonb_val jsonb);
=$ with s as materialized ( select 1::int4 as v )
   insert into size_comparison
   select to_json(v), to_jsonb(v) from s;
=$ SELECT *,
          pg_column_size( json_val ) AS json_size,
          pg_column_size( jsonb_val ) AS jsonb_size
   FROM size_comparison;
 json_val | jsonb_val | json_size | jsonb_size
----------+-----------+-----------+------------
 1        | 1         |         2 |         17
(1 row)
```
As you can see, in the case of a single-digit integer, jsonb uses MUCH more space. How about different values?
| Value | JSON size (bytes) | JSONB size (bytes) |
|---|---|---|
| `["Joe", "Elon", "Jeff", "Mia", "Jenna", "Louis"]` | 49 | 53 |
| `[12, 56, 64, 123, 2, 489]` | 26 | 77 |
| `{"isActive": true, "createdAt": "2022-05-15T08:30:00Z", "lastLogin": "2025-01-10T12:45:00Z"}` | 93 | 95 |
| ~ 1.5kB json | 828 | 976 |
| ~ 4.5kB json | 1666 | 2066 |
So, it looks like JSON values use slightly less space. Personally, I think jsonb's extra space is a cost worth paying, given the enormous functionality increase.
Finally, one last thing I want to touch on: speed of parsing values. Recently someone somewhere commented that parsing JSON values was a huge part of inserting data into tables. In their case, if I recall correctly, they were inserting json values of ~50kB.
So, let's try. I downloaded a huge JSON, and extracted 50kB out of it. Then I made a simple file that contained the same json, as a single line, repeated 4100 times, so that the full dataset was ~200MB.
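Building such a test file is trivial; here's a sketch (the repeat count matches the description; the function name and the ~50kB document itself are placeholders, byte counts assume ASCII json):

```python
def build_test_file(doc: str, path: str, repeats: int = 4100) -> int:
    """Write `doc` as one line, `repeats` times; return total bytes written."""
    line = doc.strip() + "\n"
    with open(path, "w", encoding="utf-8") as out:
        for _ in range(repeats):
            out.write(line)
    return len(line) * repeats

# A ~50kB line repeated 4100 times gives roughly 200MB:
# 50_000 * 4100 = 205_000_000 bytes
```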
Then I made six separate tables:
```
=$ create table copy_text_pglz  (v text  compression pglz);
=$ create table copy_json_pglz  (v json  compression pglz);
=$ create table copy_jsonb_pglz (v jsonb compression pglz);
=$ create table copy_text_lz4   (v text  compression lz4);
=$ create table copy_json_lz4   (v json  compression lz4);
=$ create table copy_jsonb_lz4  (v jsonb compression lz4);
```
And then ran, for each table:
```
truncate copy_text_pglz;
checkpoint;
\timing on
\copy copy_text_pglz from test-200mb.json
checkpoint;
\timing off
select pg_table_size('copy_text_pglz'::regclass);
```
Then I summed time used by \copy and the immediate checkpoint afterwards.
Each test was repeated 6 times, and best time picked. Results:
| Datatype: | Compression: | Time to load (ms): | Table size (MB): |
|---|---|---|---|
| text | lz4 | 779.105 | 54 |
| json | lz4 | 1,155.428 | 54 |
| jsonb | lz4 | 2,186.738 | 65 |
| text | pglz | 2,498.587 | 65 |
| json | pglz | 2,917.188 | 67 |
| jsonb | pglz | 4,418.037 | 73 |
It clearly shows that, while jsonb has some non-trivial overhead over json (and json, in turn, over plain text), the overhead of pglz is much greater.
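To quantify that from the table: for plain text, loading with pglz is over 3x slower than with lz4, while the json-parsing overhead (json vs. text, both with lz4) is comparatively modest:

```python
# Best load times from the table above, in milliseconds
times_ms = {
    ("text", "lz4"): 779.105,
    ("json", "lz4"): 1155.428,
    ("text", "pglz"): 2498.587,
}

pglz_penalty = times_ms[("text", "pglz")] / times_ms[("text", "lz4")]
json_penalty = times_ms[("json", "lz4")] / times_ms[("text", "lz4")]

print(round(pglz_penalty, 1))  # 3.2
print(round(json_penalty, 1))  # 1.5
```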
So, these are writes.
How about reads? First, let's just assume I want to export the whole thing, using:
=$ select v from some_table;
Tested it using explain (analyze, serialize binary), and compared execution time (which includes serialization). Results are somewhat surprising:
| Datatype: | Compression: | Time to select (ms): |
|---|---|---|
| text | lz4 | 74.374 |
| json | lz4 | 90.509 |
| text | pglz | 120.669 |
| json | pglz | 131.659 |
| jsonb | lz4 | 609.387 |
| jsonb | pglz | 656.079 |
So, to store json data in the database and export it as one big data dump, assuming you trust your inputs, just use the text datatype with lz4 compression.
This doesn't look like a sensible use of json in a database, though. Normally people want to select rows based on some parts of the json, or extract information, or aggregate, or whatever. And for this, jsonb has no real competition.
To give you some perspective, I timed a query that returns the number of elements in the stored array.
Queries looked like this:
```
=$ select jsonb_array_length(v) from copy_jsonb_lz4;
=$ select jsonb_array_length(v) from copy_jsonb_pglz;
=$ select jsonb_array_length(v::jsonb) from copy_text_lz4;
=$ select jsonb_array_length(v::jsonb) from copy_text_pglz;
=$ select json_array_length(v) from copy_json_lz4;
=$ select json_array_length(v) from copy_json_pglz;
=$ select json_array_length(v::json) from copy_text_lz4;
=$ select json_array_length(v::json) from copy_text_pglz;
```
For text, I had to first cast to json or jsonb, so there are two tests for each text-based table.
Results aren't surprising:
| Datatype: | Compression: | Time to select (ms): |
|---|---|---|
| jsonb | lz4 | 57.420 |
| jsonb | pglz | 112.459 |
| json | lz4 | 396.273 |
| json | pglz | 432.536 |
| text::json | lz4 | 730.255 |
| text::json | pglz | 785.524 |
| text::jsonb | lz4 | 1,600.397 |
| text::jsonb | pglz | 1,640.635 |
Uff, that took a while to research, test, and write up. Hope you'll find it useful in some way 🙂