Which tables should be auto vacuumed or auto analyzed – UPDATE

Some time ago I wrote blogpost which showed how to list tables that should be autovacuumed or autoanalyzed.

Query in there had one important problem – it didn't take into account per-table settings.

Specifically – it only used system-wide values for:

  • autovacuum_analyze_scale_factor
  • autovacuum_analyze_threshold
  • autovacuum_vacuum_scale_factor
  • autovacuum_vacuum_threshold

but these can be also set per table using syntax like:

=$ ALTER TABLE t2 SET ( autovacuum_vacuum_scale_factor = 0.3, autovacuum_vacuum_threshold = 200 );

Let's fix it.

To get per-table settings you have to look into reloptions column in pg_class table.

Appropriate query that gets real values of factors and thresholds looks like:

  1. =$ SELECT
  2.     n.nspname,
  3.     c.relname,
  4.     c.oid AS relid,
  5.     c.reltuples,
  6.     s.n_dead_tup,
  7.     s.n_mod_since_analyze,
  8.     COALESCE(
  9.         (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_analyze_scale_factor=' ),
  10.         current_setting('autovacuum_analyze_scale_factor')
  11.     )::float8 AS autovacuum_analyze_scale_factor,
  12.     COALESCE(
  13.         (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_analyze_threshold=' ),
  14.         current_setting('autovacuum_analyze_threshold')
  15.     )::float8 AS autovacuum_analyze_threshold,
  16.     COALESCE(
  17.         (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_vacuum_scale_factor=' ),
  18.         current_setting('autovacuum_vacuum_scale_factor')
  19.     )::float8 AS autovacuum_vacuum_scale_factor,
  20.     COALESCE(
  21.         (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_vacuum_threshold=' ),
  22.         current_setting('autovacuum_vacuum_threshold')
  23.     )::float8 AS autovacuum_vacuum_threshold
  24. FROM
  25.     pg_class c
  26.     JOIN pg_namespace n ON c.relnamespace = n.oid
  27.     LEFT OUTER JOIN pg_stat_all_tables s ON c.oid = s.relid
  28. WHERE c.relkind = 'r'
  29. ORDER BY n.nspname, c.relname;

The magic to get appropriate values in in lines 8-23, but it's just 4 separate blocks, one for each variable. Let's look at one of them:

  1. COALESCE(
  2.     (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_analyze_scale_factor=' ),
  3.     current_setting('autovacuum_analyze_scale_factor')
  4. )::float8 AS autovacuum_analyze_scale_factor

SELECT in line 2 unpacks reloptions (it's array of texts), and gets only the row that starts with “autovacuum_analyze_scale_factor=“. Then, value is split using = character, and it returns second part – which is the value.

If reloptions do not contain autovacuum_analyze_scale_factor, then the whole subselect in line 2 will return NULL, which will then get passed to COALESCE, which will substitute global value of autovacuum_analyze_scale_factor using current_setting call.

Finally in line 4 I added cast to float8, so that the value will be useful for math.

In my case, with these tables:

 relname │                                                                            reloptions                                                                            
─────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 t1      │ {autovacuum_enabled=off,autovacuum_analyze_scale_factor=0.2,autovacuum_analyze_threshold=100}
 t2      │ {autovacuum_enabled=off,autovacuum_vacuum_scale_factor=0.3,autovacuum_vacuum_threshold=200}
 t3      │ {autovacuum_enabled=off,autovacuum_analyze_scale_factor=0.2,autovacuum_analyze_threshold=100,autovacuum_vacuum_scale_factor=0.3,autovacuum_vacuum_threshold=200}
 t4      │ [NULL]
(4 ROWS)

and these global settings:

              name               │ setting 
─────────────────────────────────┼─────────
 autovacuum_analyze_scale_factor │ 0.1
 autovacuum_analyze_threshold    │ 50
 autovacuum_vacuum_scale_factor  │ 0.2
 autovacuum_vacuum_threshold     │ 50
(4 ROWS)

The query from above shows:

 nspname │ relname │ relid │ reltuples │ n_dead_tup │ n_mod_since_analyze │ autovacuum_analyze_scale_factor │ autovacuum_analyze_threshold │ autovacuum_vacuum_scale_factor │ autovacuum_vacuum_threshold 
─────────┼─────────┼───────┼───────────┼────────────┼─────────────────────┼─────────────────────────────────┼──────────────────────────────┼────────────────────────────────┼─────────────────────────────
 public  │ t1      │ 51871010521034530.21000.250
 public  │ t2      │ 518810101281125290.1500.3200
 public  │ t3      │ 518910513601537610.21000.3200
 public  │ t4      │ 51901102401000.1500.250
(4 ROWS)

Now, with this in place, I can rewrite view from original post to:

=$ CREATE VIEW autovacuum_queue AS
WITH s AS (
    SELECT
    n.nspname,
    c.relname,
    c.oid AS relid,
    c.reltuples,
    s.n_dead_tup,
    s.n_mod_since_analyze,
    COALESCE(
        (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_analyze_scale_factor=' ),
        current_setting('autovacuum_analyze_scale_factor')
    )::float8 AS analyze_factor,
    COALESCE(
        (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_analyze_threshold=' ),
        current_setting('autovacuum_analyze_threshold')
    )::float8 AS analyze_threshold,
    COALESCE(
        (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_vacuum_scale_factor=' ),
        current_setting('autovacuum_vacuum_scale_factor')
    )::float8 AS vacuum_factor,
    COALESCE(
        (SELECT split_part(x, '=', 2) FROM unnest(c.reloptions) q (x) WHERE x ~ '^autovacuum_vacuum_threshold=' ),
        current_setting('autovacuum_vacuum_threshold')
    )::float8 AS vacuum_threshold
    FROM
        pg_class c
        JOIN pg_namespace n ON c.relnamespace = n.oid
        LEFT OUTER JOIN pg_stat_all_tables s ON c.oid = s.relid
    WHERE
        c.relkind = 'r'
), tt AS (
    SELECT
        nspname,
        relname,
        relid,
        n_dead_tup,
        n_mod_since_analyze,
        reltuples * vacuum_factor + vacuum_threshold AS v_threshold,
        reltuples * analyze_factor + analyze_threshold AS a_threshold
    FROM
        s
)
SELECT
    nspname,
    relname,
    relid,
    CASE WHEN n_dead_tup > v_threshold THEN 'VACUUM' ELSE '' END AS do_vacuum,
    CASE WHEN n_mod_since_analyze > a_threshold THEN 'ANALYZE' ELSE '' END AS do_analyze
FROM
    tt
WHERE
    n_dead_tup > v_threshold OR
    n_mod_since_analyze > a_threshold;

and with this, I can:

=$ SELECT *,
    pg_relation_size(relid),
    pg_total_relation_size(relid)
FROM autovacuum_queue
    ORDER BY pg_total_relation_size(relid) DESC
;
  nspname   │   relname    │ relid │ do_vacuum │ do_analyze │ pg_relation_size │ pg_total_relation_size 
────────────┼──────────────┼───────┼───────────┼────────────┼──────────────────┼────────────────────────
 public     │ t3           │ 51891 │ VACUUM    │ ANALYZE    │          715980811812864
 public     │ t2           │ 51881 │ VACUUM    │ ANALYZE    │          51281927987200
 public     │ t1           │ 51871 │ VACUUM    │ ANALYZE    │          46776327020544
 pg_catalog │ pg_statistic │  2619 │           │ ANALYZE    │           196608344064
(4 ROWS)

Hope it helps.

One thought on “Which tables should be auto vacuumed or auto analyzed – UPDATE”

  1. pg_statistic gets in there (for “ANALYZE”) — should be excluded, I guess.

Comments are closed.