Waiting for 9.5 – vacuumdb: enable parallel mode

On 23rd of January, Alvaro Herrera committed patch:

vacuumdb: enable parallel mode
 
This mode allows vacuumdb to open several server connections to vacuum
or analyze several tables simultaneously.
 
Author: Dilip Kumar.  Some reworking by Álvaro Herrera
Reviewed by: Jeff Janes, Amit Kapila, Magnus Hagander, Andres Freund

This is great for multi-table vacuums/analyzes. Previously, when running “vacuumdb some_database" you got all tables vacuumed in some kind of order.

We can see it, by creating some test tables:

for a in $( pwgen -A0 6 20 )
do
    psql -c "create table test_$a as select generate_series(1, 1000000)::int4 as i"
done

It did create for me these tables:

$ \d
           List of relations
 Schema |    Name     | Type  | Owner  
--------+-------------+-------+--------
 public | test_aeshae | table | depesz
 public | test_aijoox | table | depesz
 public | test_eeyies | table | depesz
 public | test_eicaeb | table | depesz
 public | test_ieveov | table | depesz
 public | test_isoexu | table | depesz
 public | test_jaidaf | table | depesz
 public | test_meijie | table | depesz
 public | test_ohceec | table | depesz
 public | test_okaeph | table | depesz
 public | test_ooghap | table | depesz
 public | test_oojuxo | table | depesz
 public | test_sheeru | table | depesz
 public | test_shofae | table | depesz
 public | test_thaepu | table | depesz
 public | test_uuyohl | table | depesz
 public | test_uyuque | table | depesz
 public | test_wainie | table | depesz
 public | test_yiegah | table | depesz
 public | test_zeecie | table | depesz
(20 rows)

If I'll run the vacuumdb now, normally, it will show me:

=$ vacuumdb -e
vacuumdb: vacuuming database "depesz"
VACUUM;

I.e. just “vacuum" was run, without any table names – it will run database-wide vacuum that will work in one backend, and do all tables sequentially.

I could have influenced order by doing something like:

vacuumdb -e -t test_zeecie -t test_yiegah -t test_wainie -t test_uyuque -t test_uuyohl -t test_thaepu -t test_shofae -t test_sheeru -t test_oojuxo -t test_ooghap -t test_okaeph -t test_ohceec -t test_meijie -t test_jaidaf -t test_isoexu -t test_ieveov -t test_eicaeb -t test_eeyies -t test_aijoox -t test_aeshae
vacuumdb: vacuuming database "depesz"
VACUUM test_zeecie;
VACUUM test_yiegah;
VACUUM test_wainie;
VACUUM test_uyuque;
VACUUM test_uuyohl;
VACUUM test_thaepu;
VACUUM test_shofae;
VACUUM test_sheeru;
VACUUM test_oojuxo;
VACUUM test_ooghap;
VACUUM test_okaeph;
VACUUM test_ohceec;
VACUUM test_meijie;
VACUUM test_jaidaf;
VACUUM test_isoexu;
VACUUM test_ieveov;
VACUUM test_eicaeb;
VACUUM test_eeyies;
VACUUM test_aijoox;
VACUUM test_aeshae;

But, all tables were vacuumed in order anyway.

Now, I can add -j option, and it will parallelize processing:

$ vacuumdb -e -j8
SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns
 WHERE relkind IN ('r', 'm') AND c.relnamespace = ns.oid
 ORDER BY c.relpages DESC;
VACUUM public.test_ohceec;
VACUUM public.test_zeecie;
VACUUM public.test_jaidaf;
VACUUM public.test_aeshae;
VACUUM public.test_ieveov;
VACUUM public.test_yiegah;
VACUUM public.test_shofae;
VACUUM public.test_thaepu;
VACUUM public.test_sheeru;
VACUUM public.test_uuyohl;
VACUUM public.test_eicaeb;
VACUUM public.test_ooghap;
VACUUM public.test_oojuxo;
VACUUM public.test_okaeph;
VACUUM public.test_meijie;
VACUUM public.test_aijoox;
VACUUM public.test_eeyies;
VACUUM public.test_isoexu;
VACUUM public.test_uyuque;
VACUUM public.test_wainie;
VACUUM pg_catalog.pg_proc;
VACUUM pg_catalog.pg_depend;
VACUUM pg_catalog.pg_attribute;
VACUUM pg_catalog.pg_description;
VACUUM pg_catalog.pg_statistic;
VACUUM pg_catalog.pg_operator;
VACUUM pg_catalog.pg_rewrite;
VACUUM pg_catalog.pg_type;
VACUUM pg_catalog.pg_class;
VACUUM information_schema.sql_features;
VACUUM pg_catalog.pg_amop;
VACUUM pg_catalog.pg_amproc;
VACUUM pg_catalog.pg_conversion;
VACUUM pg_catalog.pg_opclass;
VACUUM pg_catalog.pg_index;
VACUUM pg_catalog.pg_aggregate;
VACUUM pg_catalog.pg_collation;
VACUUM pg_catalog.pg_opfamily;
VACUUM pg_catalog.pg_cast;
VACUUM pg_catalog.pg_ts_config_map;
VACUUM pg_catalog.pg_language;
VACUUM pg_catalog.pg_authid;
VACUUM pg_catalog.pg_constraint;
VACUUM pg_catalog.pg_am;
VACUUM pg_catalog.pg_namespace;
VACUUM pg_catalog.pg_database;
VACUUM pg_catalog.pg_tablespace;
VACUUM pg_catalog.pg_pltemplate;
VACUUM pg_catalog.pg_shdepend;
VACUUM pg_catalog.pg_shdescription;
VACUUM pg_catalog.pg_ts_config;
VACUUM pg_catalog.pg_ts_dict;
VACUUM pg_catalog.pg_ts_parser;
VACUUM pg_catalog.pg_ts_template;
VACUUM pg_catalog.pg_extension;
VACUUM pg_catalog.pg_range;
VACUUM information_schema.sql_implementation_info;
VACUUM information_schema.sql_languages;
VACUUM information_schema.sql_parts;
VACUUM information_schema.sql_sizing;
VACUUM information_schema.sql_packages;
VACUUM information_schema.sql_sizing_profiles;
VACUUM pg_catalog.pg_seclabel;
VACUUM pg_catalog.pg_default_acl;
VACUUM pg_catalog.pg_policy;
VACUUM pg_catalog.pg_foreign_table;
VACUUM pg_catalog.pg_foreign_server;
VACUUM pg_catalog.pg_foreign_data_wrapper;
VACUUM pg_catalog.pg_auth_members;
VACUUM pg_catalog.pg_db_role_setting;
VACUUM pg_catalog.pg_event_trigger;
VACUUM pg_catalog.pg_trigger;
VACUUM pg_catalog.pg_largeobject_metadata;
VACUUM pg_catalog.pg_inherits;
VACUUM pg_catalog.pg_attrdef;
VACUUM pg_catalog.pg_enum;
VACUUM pg_catalog.pg_shseclabel;
VACUUM pg_catalog.pg_user_mapping;
VACUUM pg_catalog.pg_largeobject;

As we can see – it initially gets list of tables to vacuum, sorted by relpages – which is basically size of the table.

And then vacuums these in parallel, using many backends. In pg_stat_activity it looks like:

$ select pid, query from pg_stat_activity where pid <> pg_backend_pid()
 pid  |           query            
------+----------------------------
 5981 | VACUUM public.test_ohceec;
 5982 | VACUUM public.test_zeecie;
 5983 | VACUUM public.test_jaidaf;
 5984 | VACUUM public.test_aeshae;
 5985 | VACUUM public.test_ieveov;
 5986 | VACUUM public.test_yiegah;
 5987 | VACUUM public.test_shofae;
 5988 | VACUUM public.test_thaepu;
(8 rows)
 
Nice. Starting with largest tables first is great. Thanks guys.