Bloat removal without table swapping

Some time ago I wrote about my favorite method of bloat removal. Around one year earlier, I wrote about another idea for bloat removal. This older idea was great – it didn't involve usage of triggers, overhead on all writes, table swapping. It had just one small, tiny, minuscule little issue. It was unbearably slow.

My idea was explored by Nathan Thom, but his blogpost disappeared.

Recently, Sergey Konoplev wrote to me about his tool, that he wrote using the same idea – updating rows to move them to other pages. So I decided that I have to check it.

Obviously first I needed some test data. To avoid autovacuum cleaning bloat, I disabled it, and then created the table:

$ CREATE TABLE test (
    id int4,
    int_1 int4,
    int_2 int4,
    int_3 int4,
    ts_1 timestamptz,
    ts_2 timestamptz,
    ts_3 timestamptz,
    text_1 text,
    text_2 text,
    text_3 text
);
CREATE TABLE

Of course empty table is of no use. So let's add some data:

$ INSERT INTO test
SELECT
    i,
    CAST(random() * 10000000 AS int4),
    CAST(random()*10000000 AS int4),
    CAST(random()*10000000 AS int4),
    now() - '2 years'::INTERVAL * random(),
    now() - '2 years'::INTERVAL * random(),
    now() - '2 years'::INTERVAL * random(),
    repeat('text_1 ', CAST(10 + random() * 100 AS int4)),
    repeat('text_2 ', CAST(10 + random() * 100 AS int4)),
    repeat('text_2 ', CAST(10 + random() * 100 AS int4))
FROM generate_series(1, 1000000) i;

When data was loaded, I added some indexes to emulate real world:

$ ALTER TABLE test ADD PRIMARY KEY (id);
ALTER TABLE
 
$ CREATE UNIQUE INDEX i1 ON test (int_1, int_2);
CREATE INDEX
 
$ CREATE INDEX i2 ON test (int_2);
CREATE INDEX
 
$ CREATE INDEX i3 ON test (int_3, ts_1);
CREATE INDEX
 
$ CREATE INDEX i4 ON test (ts_2);
CREATE INDEX
 
$ CREATE INDEX i5 ON test (ts_3);
CREATE INDEX
 
$ CREATE INDEX i6 ON test (text_1);
CREATE INDEX

Afterwards the table looked like this:

$ \d test
              TABLE "public.test"
 COLUMN |           TYPE           | Modifiers 
--------+--------------------------+-----------
 id     | INTEGER                  | NOT NULL
 int_1  | INTEGER                  | 
 int_2  | INTEGER                  | 
 int_3  | INTEGER                  | 
 ts_1   | TIMESTAMP WITH TIME zone | 
 ts_2   | TIMESTAMP WITH TIME zone | 
 ts_3   | TIMESTAMP WITH TIME zone | 
 text_1 | text                     | 
 text_2 | text                     | 
 text_3 | text                     | 
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)
    "i1" UNIQUE, btree (int_1, int_2)
    "i2" btree (int_2)
    "i3" btree (int_3, ts_1)
    "i4" btree (ts_2)
    "i5" btree (ts_3)
    "i6" btree (text_1)

Sizes of table and indexes:

  relname  | relkind |  SIZE
-----------+---------+---------
 test      | r       | 1325 MB
 i6        | i       | 241 MB
 i3        | i       | 30 MB
 i1        | i       | 21 MB
 i2        | i       | 21 MB
 i4        | i       | 21 MB
 i5        | i       | 21 MB
 test_pkey | i       | 21 MB
(8 ROWS)

Now, let's create some bloat. For example – let's remove 95% of the table:

$ DELETE FROM test WHERE random() < 0.95;
DELETE 950335

Now, two (not so quick) vacuums, and size re-check:

  relname  | relkind |  SIZE
-----------+---------+---------
 test      | r       | 1325 MB
 i6        | i       | 242 MB
 i3        | i       | 30 MB
 i1        | i       | 21 MB
 i2        | i       | 21 MB
 i4        | i       | 21 MB
 i5        | i       | 21 MB
 test_pkey | i       | 21 MB
(8 ROWS)

Of course the sizes didn't really change. So let's move to test of the pgcompact tool.

=$ git clone https://github.com/grayhemp/pgtoolkit.git
Cloning INTO 'pgtoolkit'...
remote: Counting objects: 1925, done.
remote: Compressing objects: 100% (277/277), done.
remote: Total 1925 (delta 1220), reused 1915 (delta 1210)
Receiving objects: 100% (1925/1925), 306.22 KiB | 327 KiB/s, done.
Resolving deltas: 100% (1220/1220), done.

Since README suggests it, I'll install pgstattuple:

$ CREATE extension pgstattuple;
CREATE EXTENSION

Now, I can see how the script works:

=$ export PERL5LIB=/home/depesz/pgtoolkit/lib
 
=$ pgtoolkit/bin/pgcompact --help
Name:
    pgcompact - PostgreSQL bloat reducing tool.
 
Usage:
    pgcompact [OPTION...]
 
    General options:
        [-?mV] [(-q | -v LEVEL)]
 
    Connection options:
        [-h HOST] [-p PORT] [-U USER] [-W PASSWD] [-P PATH]
 
    Targeting options:
        (-a | -d DBNAME...) [-n SCHEMA...] [-t TABLE...] [-D DBNAME...] [-N
        SCHEMA...] [-T TABLE...]
 
    Behavioural options:
        [-IRLSCrsfu] [-c PAGES] [-e SECONDS] [-E RATIO] [-o COUNT] [-x
        PAGES] [-y RATIO] [-z SECONDS]

OK. Let's skip the details, and move forward to the usage:

=$ time pgtoolkit/bin/pgcompact -v info -d depesz -t test -p 5930 2>&1 | tee pgcompact.output
 
Tue Oct 15 15:59:43 2013 INFO Database connection method: psql.
Tue Oct 15 15:59:43 2013 depesz INFO Created environment.
Tue Oct 15 15:59:43 2013 depesz INFO Statictics calculation method: pgstattuple.
Tue Oct 15 15:59:43 2013 depesz, public.test INFO Vacuum initial: 169614 pages left, duration 0.359 seconds.
Tue Oct 15 15:59:44 2013 depesz, public.test INFO Bloat statistics with pgstattuple: duration 0.384 seconds.
Tue Oct 15 15:59:44 2013 depesz, public.test NOTICE Statistics: 169614 pages (218154 pages including toasts and indexes), approximately 94.64% (160521 pages) can be compacted reducing the size by 1254 MB.
Tue Oct 15 15:59:44 2013 depesz, public.test INFO Update by column: id.
Tue Oct 15 15:59:44 2013 depesz, public.test INFO Set pages/round: 10.
Tue Oct 15 15:59:44 2013 depesz, public.test INFO Set pages/vacuum: 3393.
Tue Oct 15 15:59:59 2013 depesz, public.test INFO Cleaning in average: 806.4 pages/second (0.012 seconds per 10 pages).
Tue Oct 15 16:00:00 2013 depesz, public.test INFO Vacuum routine: 166209 pages left, duration 0.680 seconds.
...
Tue Oct 15 16:12:16 2013 depesz, public.test INFO Set pages/vacuum: 556.
Tue Oct 15 16:12:16 2013 depesz, public.test INFO Set pages/round: 9.
Tue Oct 15 16:12:17 2013 depesz, public.test INFO Vacuum final: can not clean 171 pages, 8893 pages left, duration 0.185 seconds.
Tue Oct 15 16:12:43 2013 depesz, public.test INFO Analyze final: duration 25.963 second.
Tue Oct 15 16:12:43 2013 depesz, public.test INFO Bloat statistics with pgstattuple: duration 0.029 seconds.
Tue Oct 15 16:12:43 2013 depesz, public.test NOTICE Processing complete.
Tue Oct 15 16:12:43 2013 depesz, public.test NOTICE Processing results: 8893 pages left (57392 pages including toasts and indexes), size reduced by 1256 MB (1256 MB including toasts and indexes) in total.
Tue Oct 15 16:12:43 2013 depesz NOTICE Processing complete.
Tue Oct 15 16:12:43 2013 depesz NOTICE Processing results: size reduced by 1256 MB (1256 MB including toasts and indexes) in total.
Tue Oct 15 16:12:43 2013 NOTICE Processing complete: 0 retries from 10.
Tue Oct 15 16:12:43 2013 NOTICE Processing results: size reduced by 1256 MB (1256 MB including toasts and indexes) in total, 1256 MB (1256 MB) depesz.
Tue Oct 15 16:12:43 2013 depesz INFO Dropped environment.
 
real    13m0.881s
user    0m11.860s
sys     0m2.188s

Sizes afterwards:

$ SELECT relname, relkind, pg_size_pretty(pg_table_size(oid)) AS SIZE FROM pg_class WHERE relname ~ '^(i[0-9]|test|test_pkey)$' ORDER BY pg_relation_size(oid) DESC;
  relname  | relkind |  SIZE  
-----------+---------+--------
 i6        | i       | 242 MB
 test      | r       | 70 MB
 i3        | i       | 30 MB
 i1        | i       | 21 MB
 i2        | i       | 21 MB
 i4        | i       | 21 MB
 i5        | i       | 21 MB
 test_pkey | i       | 21 MB
(8 ROWS)

All in all – it works pretty good. The 800 pages per second means 6.5MB/s. Maybe not the fastest thing on earth, but faster than I expected.

One thing to keep in mind – it works by doing a lot of updates in the table. So, if your table has “ON UPDATE" triggers, you have to put some caution into using the tool (and check if your triggers shouldn't be fixed, not to do anything if the update doesn't change any data in row).

The tool has some rough edges (hardcoded default port for example), but generally – it surprised me how well it behaved. Great work Sergey, thanks a lot.

11 thoughts on “Bloat removal without table swapping”

  1. I was thinking of you when I read the release of this tool. I tried it as well and it worked great.

    Is stopping autovacuum mandatory?

  2. Neat idea in general.

    However, I’m somewhat dubious about how well it would work on an actual busy production database. Have you done any tests with a concurrent workload and autovacuum on?

  3. @Kustodian

    You do not need to stop anything at all to use pgcompact.

    @Josh Berkus

    I successfully use pgcompact on about 40 production servers for more then 1 year. Some of the servers are with 10k+ queries per seconds in avg. The tool adjusts its pace to the current load of the server automatically to not affect user queries. The main idea of the tool was to put it to crontab and forget about bloat.

    @Depesz

    I would like to add a little bit to the article:

    1. If you specify ‘–reindex’ it will reindex bloated indexes as well, including PKs and UKs.
    2. You don’t need to specify a table name, just ‘-d dbname’ (one or more time) or ‘-a’ for all databases the cluster, the tool will automatically find all the tables that have bloat and compact them.
    3. You can use ‘./fatpack/pgcompact’ instead of ‘./bin/pgcompact’, because the former one has all the dependencies packed in, in other words it fully autonomous and you do not need to specify PERL5LIB for it.
    4. You do not need to be wary about ‘ON UPDATE’ triggers because the tool sets ‘session_replication_role’ to ‘replica’ so user defined triggers will not work.
    5. You can ‘pgcompact –man’ to see the full manual with all the parameters and notes.

  4. @Depesz, @Josh

    And another thing I would like to highlight. You can decrease the ‘–delay-ratio’ parameter below and get it several times faster.

    -E RATIO
    –delay-ratio RATIO
    A dynamic part of the delay between rounds is calculated as previous-round-time * delay-ratio. By default 2.

    The thing is that it intentionally waits between rounds to lower the load it produces. So if you put ‘–delay-ratio 1’ it will be around 9 minutes instead of 13.

    You could also increase the parameter below to get speed benefits.

    -c PAGES
    –max-pages-per-round PAGES
    An upper threshold of pages to process per round. By default it is 10.

  5. what about work with skytools 3.1.x (master, slave).
    was it tested?

  6. Was running this command with the –reindex option against a test database and it failed with the following error:

    “ERROR: cannot drop constraint on table because other objects depend on it”

    I think this is a pretty awesome tool to help reduce bloat I cannot just let it churn away on the whole database as a result without tables/schemas that have these type of constraints.

    I presume there is no way around this issue?

  7. @Dave:

    I can’t really tell what’s happening, because I don’t know the table schema. Truth be told, I never used it with –reindex, I generally use it only for whole table – table and indexes.

  8. Will this tool work on bloated pg_largeobject table? and with ~90% of disc size is occupied? without any issue.

    i.e. Situation when
    Total storage : 4TB
    pg_largeobject actual size : 1.5TB
    Bloated size : 3.7TB

    Or any other effective mechanism without adding storage.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.