On 20th of January 2020, Amit Kapila committed patch:
Allow vacuum command to process indexes in parallel. This feature allows the vacuum to leverage multiple CPUs in order to process indexes. This enables us to perform index vacuuming and index cleanup with background workers. This adds a PARALLEL option to VACUUM command where the user can specify the number of workers that can be used to perform the command which is limited by the number of indexes on a table. Specifying zero as a number of workers will disable parallelism. This option can't be used with the FULL option. Each index is processed by at most one vacuum process. Therefore parallel vacuum can be used when the table has at least two indexes. The parallel degree is either specified by the user or determined based on the number of indexes that the table has, and further limited by max_parallel_maintenance_workers. The index can participate in parallel vacuum iff it's size is greater than min_parallel_index_scan_size. Author: Masahiko Sawada and Amit Kapila Mahendra Singh and Sergei Kornilov Discussion: https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com
Description is pretty long, so let's see how it really works.
First, I will need some sample table, with several indexes:
=$ CREATE TABLE test ( id INT4 GENERATED ALWAYS AS IDENTITY PRIMARY KEY, some_int INT4, some_timestamp TIMESTAMPTZ, other_int INT4, other_timestamp TIMESTAMPTZ ); CREATE TABLE =$ INSERT INTO test (some_int, some_timestamp, other_int, other_timestamp) SELECT random() * 500000000, '2000-01-01'::DATE + '20 years'::INTERVAL * random(), random() * 500000000, '1970-01-01'::DATE + '20 years'::INTERVAL * random() FROM generate_series(1,100000000) i; INSERT 0 100000000
OK, so we have some rows, which look like:
=$ SELECT * FROM test LIMIT 10; id | some_int | some_timestamp | other_int | other_timestamp ----+-----------+-----------------------------+-----------+----------------------------- 1 | 179275930 | 2010-01-28 23:02:57.0048+01 | 31223069 | 1975-10-21 03:30:13.536+01 2 | 119009254 | 2019-05-01 14:40:19.5168+02 | 390536066 | 1974-02-18 16:12:31.7952+01 3 | 153965899 | 2010-04-26 00:36:46.3968+02 | 109395281 | 1985-10-09 01:30:36.0288+01 4 | 123106154 | 2006-06-28 18:59:21.0624+02 | 399537003 | 1982-12-08 21:13:32.5056+01 5 | 338157258 | 2006-11-04 07:21:34.7328+01 | 487378393 | 1975-02-14 05:59:28.7232+01 6 | 108837322 | 2006-10-10 04:07:35.2704+02 | 53539283 | 1987-05-17 00:59:49.5744+02 7 | 434671405 | 2011-04-09 00:21:43.4304+02 | 374841058 | 1980-05-13 17:12:37.1808+02 8 | 407587896 | 2013-08-02 15:26:41.3376+02 | 180180561 | 1985-01-06 04:22:03.6768+01 9 | 450852732 | 2008-10-27 18:14:00.4992+01 | 81128068 | 1975-06-26 01:55:29.8848+01 10 | 306987401 | 2013-12-17 08:39:19.1232+01 | 28668776 | 1970-08-01 15:43:29.0208+01 (10 ROWS)
Now, let's add some indexes:
=$ CREATE INDEX i1 ON test (some_int); CREATE INDEX =$ CREATE INDEX i2 ON test (some_timestamp); CREATE INDEX =$ CREATE INDEX i3 ON test (other_int); CREATE INDEX =$ CREATE INDEX i4 ON test (other_timestamp); CREATE INDEX =$ CREATE INDEX i5 ON test (some_int, some_timestamp); CREATE INDEX =$ CREATE INDEX i6 ON test (other_int, other_timestamp); CREATE INDEX
So, I have 7 indexes, and the table itself:
=$ SELECT c.relname, c.relkind, pg_size_pretty(pg_relation_size(c.oid)) FROM pg_class c WHERE c.relname = 'test' OR c.oid IN ( SELECT i.indexrelid FROM pg_index i WHERE i.indrelid = 'test'::regclass ); relname | relkind | pg_size_pretty -----------+---------+---------------- test | r | 5744 MB test_pkey | i | 2142 MB i1 | i | 2142 MB i2 | i | 2142 MB i3 | i | 2142 MB i4 | i | 2142 MB i5 | i | 3004 MB i6 | i | 3004 MB (8 ROWS)
With this in place, and disabled autovacuum I ran:
=$ DELETE FROM test WHERE random() < 0.5;
So that indexes (and table) will have some cleanup to be done.
Then I ran vacuum without any parallelization:
=$ vacuum (verbose ON, analyze ON, parallel 0) test;
Afterwards I recreated the whole thing, and ran:
=$ SET max_parallel_maintenance_workers = 2; =$ vacuum (verbose ON, analyze ON) test;
and finally, after another recreate:
=$ SET max_parallel_maintenance_workers = 8; =$ vacuum (verbose ON, analyze ON) test;
Logs showed, for sequential vacuum:
=$ vacuum (verbose ON, analyze ON, parallel 0) test; psql:test1.sql:2: INFO: vacuuming "public.test" ... CPU: USER: 663.45 s, system: 87.05 s, elapsed: 1505.04 s. psql:test1.sql:2: INFO: analyzing "public.test" psql:test1.sql:2: INFO: "test": scanned 30000 OF 735295 pages, containing 2040581 live ROWS AND 0 dead ROWS; 30000 ROWS IN sample, 50014300 estimated total ROWS VACUUM TIME: 1505238.738 ms (25:05.239)
for vacuum with 2 workers:
=$ SET max_parallel_maintenance_workers = 2; SET =$ vacuum (verbose ON, analyze ON) test; psql:test2.sql:3: INFO: vacuuming "public.test" psql:test2.sql:3: INFO: launched 2 parallel vacuum workers FOR INDEX vacuuming (planned: 2) ... CPU: USER: 119.29 s, system: 43.63 s, elapsed: 694.13 s. psql:test2.sql:3: INFO: analyzing "public.test" psql:test2.sql:3: INFO: "test": scanned 30000 OF 735295 pages, containing 2039828 live ROWS AND 0 dead ROWS; 30000 ROWS IN sample, 49995844 estimated total ROWS VACUUM TIME: 694336.035 ms (11:34.336)
and when I picked 8 workers:
=$ SET max_parallel_maintenance_workers = 8; SET =$ vacuum (verbose ON, analyze ON) test; psql:test3.sql:3: INFO: vacuuming "public.test" psql:test3.sql:3: INFO: launched 6 parallel vacuum workers FOR INDEX vacuuming (planned: 6) CPU: USER: 134.24 s, system: 51.37 s, elapsed: 776.12 s. psql:test3.sql:3: INFO: analyzing "public.test" psql:test3.sql:3: INFO: "test": scanned 30000 OF 735295 pages, containing 2040985 live ROWS AND 0 dead ROWS; 30000 ROWS IN sample, 50024202 estimated total ROWS VACUUM TIME: 776326.118 ms (12:56.326)
It looks that it would need a bit more tuning to run with more workers, but in my test case – I have only 7 indexes, so there is not going to be big gain.
Also – speed seemed to be best with 2 workers, but this could be related to some side load on the server. In any way – vacuum is much faster when handling indexes in parallel.
Great stuff, thanks a lot to all involved.