Waiting for PostgreSQL 12 – Report progress of CREATE INDEX operations

On 2nd of April 2019, Alvaro Herrera committed patch:

Report progress of CREATE INDEX operations
 
 
This uses the progress reporting infrastructure added by ,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.
 
There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific.  The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index creation uses; other index AMs, which are much simpler in their
building procedures, have simplistic reporting only, but that seems
sufficient, at least for non-concurrent builds.
 
The index-AM-agnostic part is fairly complete, providing insight into
the CONCURRENTLY wait phases as well as block-based progress during the
index validation table scan.  (The index validation index scan requires
patching each AM, which has not been included here.)
 
Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada
Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql

Well, explanation is rather irrelevant, so let's just make some test data, and see how it works:

=$ CREATE TABLE test (
    id serial PRIMARY KEY,
    payload text
);
CREATE TABLE
 
=$ copy test (payload) FROM program 'ruby -e "10000000.times { puts (0...50).map { (97 + rand(26)).chr }.join}"'
COPY 10000000

Now, in one session I'll run:

=$ CREATE INDEX q ON test (payload);

and in another:

=$ SELECT now()::TIME(0), a.query, p.phase, p.blocks_total, p.blocks_done, p.tuples_total, p.tuples_done FROM pg_stat_progress_create_index p JOIN pg_stat_activity a ON p.pid = a.pid \watch 2

Progress data visible:

21:29:10 │ create index q on test (payload); │ building index: scanning table         │       103093 │       11808 │            0 │           0
21:29:12 │ create index q on test (payload); │ building index: scanning table         │       103093 │       35696 │            0 │           0
21:29:14 │ create index q on test (payload); │ building index: scanning table         │       103093 │       59613 │            0 │           0
21:29:16 │ create index q on test (payload); │ building index: scanning table         │       103093 │       83361 │            0 │           0
21:29:18 │ create index q on test (payload); │ building index: scanning table         │       103093 │      103093 │            0 │           0
21:29:20 │ create index q on test (payload); │ building index: scanning table         │       103093 │      103093 │            0 │           0
21:29:22 │ create index q on test (payload); │ building index: loading tuples in tree │            0 │           0 │     10000000 │     4303927
21:29:24 │ create index q on test (payload); │ building index: loading tuples in tree │            0 │           0 │     10000000 │    10000000

Nice. There are more columns in the table, but were irrelevant for my example:

=$ SELECT * FROM pg_stat_progress_create_index;
─[ RECORD 1 ]──────┬───────────────────────────────
pid                │ 14296
datid              │ 17564
datname            │ depesz
relid              │ 17681
index_relid        │ 0
phase              │ building INDEX: scanning TABLE
lockers_total      │ 0
lockers_done       │ 0
current_locker_pid │ 0
blocks_total       │ 103093
blocks_done        │ 23989
tuples_total       │ 0
tuples_done        │ 0
partitions_total   │ 0
partitions_done    │ 0

These will be useful in more complicated examples, like with concurrent connections, or partitions.

This is AMAZING. Thanks a lot to all involved.

2 thoughts on “Waiting for PostgreSQL 12 – Report progress of CREATE INDEX operations”

  1. Thanks for highlighting this feature.

    Most of the other columns are used by CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY.

    In particular, the “lockers_done”, total, and current PID, I believe will be very useful for people with high-concurrency servers who sometimes wonder what is a C.I.C. waiting on, since it has to wait for concurrent transactions at various points in the process, and it’s not easy to figure out what, how many are done, how many to wait are still running, etc.

    Also, we now have REINDEX CONCURRENTLY in pg12, which will wait at even more points during the process. REINDEX will also appear in the CREATE INDEX progress view, thanks to a later commit by Peter Eisentraut.

  2. Thanks for demonstrating the advantage of the path with example!

Comments are closed.