Waiting for 9.5 – vacuumdb: enable parallel mode

On 23rd of January, Alvaro Herrera committed patch:

vacuumdb: enable parallel mode
This mode allows vacuumdb to open several server connections to vacuum
or analyze several tables simultaneously.
Author: Dilip Kumar.  Some reworking by Álvaro Herrera
Reviewed by: Jeff Janes, Amit Kapila, Magnus Hagander, Andres Freund

This is great for multi-table vacuums/analyzes. Previously, when running “vacuumdb some_database" you got all tables vacuumed in some kind of order.

We can see it, by creating some test tables:

for a in $( pwgen -A0 6 20 )
    psql -c "create table test_$a as select generate_series(1, 1000000)::int4 as i"

It did create for me these tables:

$ \d
           List OF relations
 Schema |    Name     | TYPE  | Owner  
 public | test_aeshae | TABLE | depesz
 public | test_aijoox | TABLE | depesz
 public | test_eeyies | TABLE | depesz
 public | test_eicaeb | TABLE | depesz
 public | test_ieveov | TABLE | depesz
 public | test_isoexu | TABLE | depesz
 public | test_jaidaf | TABLE | depesz
 public | test_meijie | TABLE | depesz
 public | test_ohceec | TABLE | depesz
 public | test_okaeph | TABLE | depesz
 public | test_ooghap | TABLE | depesz
 public | test_oojuxo | TABLE | depesz
 public | test_sheeru | TABLE | depesz
 public | test_shofae | TABLE | depesz
 public | test_thaepu | TABLE | depesz
 public | test_uuyohl | TABLE | depesz
 public | test_uyuque | TABLE | depesz
 public | test_wainie | TABLE | depesz
 public | test_yiegah | TABLE | depesz
 public | test_zeecie | TABLE | depesz
(20 ROWS)

If I'll run the vacuumdb now, normally, it will show me:

=$ vacuumdb -e
vacuumdb: vacuuming DATABASE "depesz"

I.e. just “vacuum" was run, without any table names – it will run database-wide vacuum that will work in one backend, and do all tables sequentially.

I could have influenced order by doing something like:

vacuumdb -e -t test_zeecie -t test_yiegah -t test_wainie -t test_uyuque -t test_uuyohl -t test_thaepu -t test_shofae -t test_sheeru -t test_oojuxo -t test_ooghap -t test_okaeph -t test_ohceec -t test_meijie -t test_jaidaf -t test_isoexu -t test_ieveov -t test_eicaeb -t test_eeyies -t test_aijoox -t test_aeshae
vacuumdb: vacuuming database "depesz"
VACUUM test_zeecie;
VACUUM test_yiegah;
VACUUM test_wainie;
VACUUM test_uyuque;
VACUUM test_uuyohl;
VACUUM test_thaepu;
VACUUM test_shofae;
VACUUM test_sheeru;
VACUUM test_oojuxo;
VACUUM test_ooghap;
VACUUM test_okaeph;
VACUUM test_ohceec;
VACUUM test_meijie;
VACUUM test_jaidaf;
VACUUM test_isoexu;
VACUUM test_ieveov;
VACUUM test_eicaeb;
VACUUM test_eeyies;
VACUUM test_aijoox;
VACUUM test_aeshae;

But, all tables were vacuumed in order anyway.

Now, I can add -j option, and it will parallelize processing:

$ vacuumdb -e -j8
SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns
 WHERE relkind IN ('r', 'm') AND c.relnamespace = ns.oid
 ORDER BY c.relpages DESC;
VACUUM public.test_ohceec;
VACUUM public.test_zeecie;
VACUUM public.test_jaidaf;
VACUUM public.test_aeshae;
VACUUM public.test_ieveov;
VACUUM public.test_yiegah;
VACUUM public.test_shofae;
VACUUM public.test_thaepu;
VACUUM public.test_sheeru;
VACUUM public.test_uuyohl;
VACUUM public.test_eicaeb;
VACUUM public.test_ooghap;
VACUUM public.test_oojuxo;
VACUUM public.test_okaeph;
VACUUM public.test_meijie;
VACUUM public.test_aijoox;
VACUUM public.test_eeyies;
VACUUM public.test_isoexu;
VACUUM public.test_uyuque;
VACUUM public.test_wainie;
VACUUM pg_catalog.pg_proc;
VACUUM pg_catalog.pg_depend;
VACUUM pg_catalog.pg_attribute;
VACUUM pg_catalog.pg_description;
VACUUM pg_catalog.pg_statistic;
VACUUM pg_catalog.pg_operator;
VACUUM pg_catalog.pg_rewrite;
VACUUM pg_catalog.pg_type;
VACUUM pg_catalog.pg_class;
VACUUM information_schema.sql_features;
VACUUM pg_catalog.pg_amop;
VACUUM pg_catalog.pg_amproc;
VACUUM pg_catalog.pg_conversion;
VACUUM pg_catalog.pg_opclass;
VACUUM pg_catalog.pg_index;
VACUUM pg_catalog.pg_aggregate;
VACUUM pg_catalog.pg_collation;
VACUUM pg_catalog.pg_opfamily;
VACUUM pg_catalog.pg_cast;
VACUUM pg_catalog.pg_ts_config_map;
VACUUM pg_catalog.pg_language;
VACUUM pg_catalog.pg_authid;
VACUUM pg_catalog.pg_constraint;
VACUUM pg_catalog.pg_am;
VACUUM pg_catalog.pg_namespace;
VACUUM pg_catalog.pg_database;
VACUUM pg_catalog.pg_tablespace;
VACUUM pg_catalog.pg_pltemplate;
VACUUM pg_catalog.pg_shdepend;
VACUUM pg_catalog.pg_shdescription;
VACUUM pg_catalog.pg_ts_config;
VACUUM pg_catalog.pg_ts_dict;
VACUUM pg_catalog.pg_ts_parser;
VACUUM pg_catalog.pg_ts_template;
VACUUM pg_catalog.pg_extension;
VACUUM pg_catalog.pg_range;
VACUUM information_schema.sql_implementation_info;
VACUUM information_schema.sql_languages;
VACUUM information_schema.sql_parts;
VACUUM information_schema.sql_sizing;
VACUUM information_schema.sql_packages;
VACUUM information_schema.sql_sizing_profiles;
VACUUM pg_catalog.pg_seclabel;
VACUUM pg_catalog.pg_default_acl;
VACUUM pg_catalog.pg_policy;
VACUUM pg_catalog.pg_foreign_table;
VACUUM pg_catalog.pg_foreign_server;
VACUUM pg_catalog.pg_foreign_data_wrapper;
VACUUM pg_catalog.pg_auth_members;
VACUUM pg_catalog.pg_db_role_setting;
VACUUM pg_catalog.pg_event_trigger;
VACUUM pg_catalog.pg_trigger;
VACUUM pg_catalog.pg_largeobject_metadata;
VACUUM pg_catalog.pg_inherits;
VACUUM pg_catalog.pg_attrdef;
VACUUM pg_catalog.pg_enum;
VACUUM pg_catalog.pg_shseclabel;
VACUUM pg_catalog.pg_user_mapping;
VACUUM pg_catalog.pg_largeobject;

As we can see – it initially gets list of tables to vacuum, sorted by relpages – which is basically size of the table.

And then vacuums these in parallel, using many backends. In pg_stat_activity it looks like:

$ SELECT pid, query FROM pg_stat_activity WHERE pid <> pg_backend_pid()
 pid  |           query            
 5981 | VACUUM public.test_ohceec;
 5982 | VACUUM public.test_zeecie;
 5983 | VACUUM public.test_jaidaf;
 5984 | VACUUM public.test_aeshae;
 5985 | VACUUM public.test_ieveov;
 5986 | VACUUM public.test_yiegah;
 5987 | VACUUM public.test_shofae;
 5988 | VACUUM public.test_thaepu;
(8 ROWS)
Nice. Starting WITH largest TABLES FIRST IS great. Thanks guys.