Waiting for PostgreSQL 19 – Introduce the REPACK command

On 10th of March 2026, Álvaro Herrera committed patch:

Introduce the REPACK command
 
REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command.  Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.
 
We retain those older commands, but de-emphasize them in the
documentation, in favor of REPACK; the difference between VACUUM FULL
and CLUSTER (namely, the fact that tuples are written in a specific
ordering) is neatly absorbed as two different modes of REPACK.
 
This allows us to introduce further functionality in the future that
works regardless of whether an ordering is being applied, such as (and
especially) a concurrent mode.
 
Author: Antonin Houska <ah@cybertec.at>
Reviewed-by: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Robert Treat <rob@xzilla.net>
Reviewed-by: Euler Taveira <euler@eulerto.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Reviewed-by: jian he <jian.universality@gmail.com>
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/202507262156.sb455angijk6@alvherre.pgsql

I'm VERY excited.

Bloat is a thing that happens, and while we have a good(-ish) solution, it's not installed by default, so some people are somewhat afraid to install and use it, and instead opt for VACUUM FULL, which, to put it lightly, I'm not really fond of.

Now, it seems that we will get built-in REPACK.

At the moment it doesn't have any kind of support for running concurrently, so it behaves like VACUUM FULL in terms of locking, but I hope that it will get a concurrent mode sooner or later.

Anyway, let's see how it looks and works.

=$ \h repack
Command:     REPACK
Description: rewrite a table to reclaim disk space
Syntax:
REPACK [ ( option [, ...] ) ] [ table_and_columns [ USING INDEX [ index_name ] ] ]
REPACK [ ( option [, ...] ) ] USING INDEX
 
where option can be one of:
 
    VERBOSE [ boolean ]
    ANALYZE [ boolean ]
 
and table_and_columns is:
 
    table_name [ ( column_name [, ...] ) ]
 
URL: https://www.postgresql.org/docs/devel/sql-repack.html

Seems pretty simple.

One thing not visible here is that we also get a new progress view: pg_stat_progress_repack.

So, let's see how it works.

Made myself a test table by copying all plans from explain.depesz.com, and added some indexes:

=$ create table to_repack as select * from plans;
SELECT 1898650
 
=$ alter table to_repack add primary key (id);
ALTER TABLE
 
=$ create index on_entered_on on to_repack (entered_on);
CREATE INDEX
 
=$ create index who_added on to_repack (added_by);
CREATE INDEX
 
=$ select pg_relation_size(oid), pg_table_size(oid), pg_total_relation_size(oid) from pg_class  where relname = 'to_repack';
 pg_relation_size │ pg_table_size │ pg_total_relation_size
──────────────────┼───────────────┼────────────────────────
       1516544000 │    8327667712 │             8426242048
(1 row)
 
=$ select relname, pg_relation_size(oid) from pg_class  where relname = any('{to_repack_pkey,on_entered_on,who_added}');
    relname     │ pg_relation_size
────────────────┼──────────────────
 to_repack_pkey │         42672128
 on_entered_on  │         42672128
 who_added      │         13230080
(3 rows)

So, we have an ~8.4GB table, out of which ~8.3GB is table data, and the rest is the three indexes.

Now, let's remove a significant portion of the table:

=$ delete from to_repack where random() < 0.8;

That's roughly 80% of the rows gone.
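Of course, deleting the rows doesn't make the table file any smaller on disk; the rows just become dead tuples. You can see them (approximately, and assuming autovacuum hasn't gotten to the table yet) in the standard statistics view:

```sql
-- Dead tuple counts after the DELETE; numbers are estimates maintained
-- by the statistics collector, and autovacuum may already have removed some.
SELECT n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'to_repack';
```

This is exactly the space that REPACK will reclaim by rewriting the table.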

And now, let's see REPACK in action. I'll try to repack using an index:

=$ repack (verbose, analyze) to_repack using index on_entered_on;
INFO:  repacking "public.to_repack" using sequential scan and sort
INFO:  "public.to_repack": found 1271577 removable, 379276 nonremovable row versions in 185125 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 4.78 s, system: 3.75 s, elapsed: 10.54 s.
INFO:  analyzing "public.to_repack"
INFO:  "to_repack": scanned 30000 of 36910 pages, containing 308722 live rows and 0 dead rows; 30000 rows in sample, 379831 estimated total rows
INFO:  finished analyzing table "depesz_explain.public.to_repack"
I/O timings: read: 0.228 ms, write: 9.901 ms
avg read rate: 1070.669 MB/s, avg write rate: 0.571 MB/s
buffer usage: 209 hits, 30013 reads, 16 dirtied
WAL usage: 140 records, 8 full page images, 116542 bytes, 45928 full page image bytes, 0 buffers full
system usage: CPU: user: 0.18 s, system: 0.02 s, elapsed: 0.21 s
REPACK
Time: 13142.040 ms (00:13.142)

Sizes afterwards:

=$ select pg_relation_size(oid), pg_table_size(oid), pg_total_relation_size(oid) from pg_class  where relname = 'to_repack';
 pg_relation_size │ pg_table_size │ pg_total_relation_size
──────────────────┼───────────────┼────────────────────────
        302366720 │    1639038976 │             1658806272
(1 row)
 
=$ select relname, pg_relation_size(oid) from pg_class  where relname = any('{to_repack_pkey,on_entered_on,who_added}');
    relname     │ pg_relation_size
────────────────┼──────────────────
 to_repack_pkey │          8544256
 on_entered_on  │          8544256
 who_added      │          2678784
(3 rows)

Clearly significantly less.
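For before/after comparisons like this, it's easier on the eyes to wrap the sizes in pg_size_pretty() — a standard PostgreSQL function:

```sql
-- Human-readable sizes for the table (main fork, with TOAST, and with indexes).
SELECT relname,
       pg_size_pretty(pg_relation_size(oid))       AS main_fork,
       pg_size_pretty(pg_table_size(oid))          AS with_toast,
       pg_size_pretty(pg_total_relation_size(oid)) AS with_indexes
FROM pg_class
WHERE relname = 'to_repack';
```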

Interestingly, repacking with an index marked it as the clustering index:

=$ \d to_repack
                           Table "public.to_repack"
      Column      │           Type           │ Collation │ Nullable │ Default
──────────────────┼──────────────────────────┼───────────┼──────────┼─────────
 id               │ text                     │           │ not null │
 plan             │ text                     │           │          │
 entered_on       │ timestamp with time zone │           │          │
 is_public        │ boolean                  │           │          │
 is_anonymized    │ boolean                  │           │          │
 title            │ text                     │           │          │
 delete_key       │ text                     │           │          │
 is_deleted       │ boolean                  │           │          │
 added_by         │ text                     │           │          │
 optimization_for │ text                     │           │          │
 query            │ text                     │           │          │
 comments         │ text                     │           │          │
 entered_from     │ inet                     │           │          │
Indexes:
    "to_repack_pkey" PRIMARY KEY, btree (id)
    "on_entered_on" btree (entered_on) CLUSTER
    "who_added" btree (added_by)

which will become important in a moment.
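The CLUSTER marker that \d shows is stored in the indisclustered column of the pg_index catalog, so you can also query it directly:

```sql
-- Which index (if any) is marked as the clustering index for to_repack.
SELECT indexrelid::regclass AS index_name, indisclustered
FROM pg_index
WHERE indrelid = 'to_repack'::regclass;
```

A bare REPACK to_repack USING INDEX will use whichever index has this flag set.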

So, we reclaimed a bunch of space, and reordered the table. In the meantime, I was selecting data from pg_stat_progress_repack every 500ms, and it looked like this:

  pid  │ datid │    datname     │  relid  │ command │       phase       │ repack_index_relid │ heap_tuples_scanned │ heap_tuples_written │ heap_blks_total │ heap_blks_scanned │ index_rebuild_count
───────┼───────┼────────────────┼─────────┼─────────┼───────────────────┼────────────────────┼─────────────────────┼─────────────────────┼─────────────────┼───────────────────┼─────────────────────
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ initializing      │                  0 │                   0 │                   0 │               0 │                 0 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ initializing      │                  0 │                   0 │                   0 │               0 │                 0 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ seq scanning heap │                  0 │              109869 │                   0 │          185125 │             53443 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ seq scanning heap │                  0 │              195323 │                   0 │          185125 │             95250 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ seq scanning heap │                  0 │              261789 │                   0 │          185125 │            127634 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ seq scanning heap │                  0 │              348370 │                   0 │          185125 │            170018 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │               13439 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │               59872 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              100397 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              138061 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              166251 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              197003 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              217909 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              238271 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              255559 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              278474 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              296842 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              318301 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              338170 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              356051 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ writing new heap  │                  0 │              379276 │              373255 │          185125 │            185125 │                   0
 50721 │ 16386 │ depesz_explain │ 1455363 │ REPACK  │ rebuilding index  │                  0 │              379276 │              379276 │          185125 │            185125 │                   0

Each line above was a single row in the view; I've just shown them as a single table to make it easier to see what was happening.
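If you want to watch it yourself, a simple approach (querying the columns shown above) is to poll the view from a second psql session:

```sql
-- Run in a second session while REPACK is working.
SELECT phase,
       heap_blks_scanned,
       heap_blks_total,
       heap_tuples_written
FROM pg_stat_progress_repack;
-- then repeat it every half second with psql's \watch:
-- \watch 0.5
```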

So, at the moment, we have the following possible ways to run it:

  • REPACK USING INDEX; – this will repack all tables that have an index marked as the clustering index, ordering the data by that index.
  • REPACK to_repack; – will repack the table without any ordering, basically the equivalent of VACUUM FULL.
  • REPACK to_repack USING INDEX; – will repack the table using the index marked as the clustering index.
  • REPACK to_repack USING INDEX to_repack_pkey; – will repack the table ordered by the primary key index, and will mark this index as the clustering index.

For each of these, you can provide a list of options, which is currently limited to ANALYZE and VERBOSE. VERBOSE will print some information about the run, and ANALYZE will run ANALYZE on the table after repacking. Which is generally what you want, as the data ordering might have changed, and this can influence the planner.

If you enable ANALYZE, you can also use these syntaxes:

  • REPACK (analyze) to_repack (title, optimization_for) – will repack the table and run analyze, but only on the two mentioned columns.
  • REPACK (analyze) to_repack (title, optimization_for) USING INDEX on_entered_on – will repack the table in the order of the entered_on column, mark the on_entered_on index as the clustering index, and run analyze on the two mentioned columns.

All in all, at the moment it's not earth-shattering. But I see it as the first, necessary step towards concurrent repacking, which will definitely be db-world-changing 🙂

Amazing work, thanks a lot to everyone involved.
