Which tables should be auto vacuumed or auto analyzed?

Recently I was in a situation where autovacuum couldn't keep up with changes. To solve the problem I finally decided to manually vacuum analyze all tables (manual vacuum/analyze is faster than one ran by autovacuum daemon).

But it irritated me that I didn't have ready way to check which tables are waiting for autovacuum to work on them.

So, I wrote it.

First, let's start with basics. Autovacuum when figuring out whether it should do VACUUM and/or ANALYZE looks at these elements:

Config variable autovacuum_analyze_scale_factor
Config variable autovacuum_analyze_threshold
Config variable autovacuum_vacuum_scale_factor
Config variable autovacuum_vacuum_threshold
Value in reltuples column in pg_class table (which is estimate of number of rows updated by vacuum).
Value in n_dead_tup column in pg_stat_all_tables view (which is number of not yet vacuumed dead rows).
Value in n_mod_since_analyze column in pg_stat_all_tables view (which is number of modified rows since last analyze).

In my test case, config variables are:

=$ select name, setting
from pg_settings
where name in ('autovacuum_analyze_scale_factor', 'autovacuum_analyze_threshold', 'autovacuum_vacuum_scale_factor', 'autovacuum_vacuum_threshold')
order by name;
              name               | setting 
---------------------------------+---------
 autovacuum_analyze_scale_factor | 0.1
 autovacuum_analyze_threshold    | 50
 autovacuum_vacuum_scale_factor  | 0.2
 autovacuum_vacuum_threshold     | 50
(4 rows)

Rules are simple:

run vacuum if n_dead_tup is larger than reltuples * autovacuum_vacuum_scale_factor + autovacuum_vacuum_threshold
run analyze if n_mod_since_analyze is larger than reltuples * autovacuum_analyze_scale_factor + autovacuum_analyze_threshold

Let's do some math. Assuming I have table with 10,000 rows, it will get vacuumed if I delete (or update) 10000 * 0.2 + 50 = 2050 rows. And it will get analyzed if I update (or insert) 10000 * 0.1 + 50 = 1050 rows. Easy. Isn't it?

With math behind, let's see how the values really change. First some test tables:

=$ CREATE TABLE t1 ( id INT4 generated always as identity, payload TEXT, PRIMARY KEY (id));
=$ CREATE TABLE t2 ( id INT4 generated always as identity, payload TEXT, PRIMARY KEY (id));
=$ CREATE TABLE t3 ( id INT4 generated always as identity, payload TEXT, PRIMARY KEY (id));

Immediately after creation stats are obvious:

=$ select c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
from pg_class c left outer join pg_stat_all_tables s on c.oid = s.relid
where c.relname ~ '^t[123]$' order by 1;
 relname | reltuples | n_dead_tup | n_mod_since_analyze 
---------+-----------+------------+---------------------
 t1      |         0 |          0 |                   0
 t2      |         0 |          0 |                   0
 t3      |         0 |          0 |                   0
(3 rows)

OK. Let's load some data:

=$ copy t1 (payload) FROM '/usr/share/dict/words';
COPY 102401
 
=$ copy t2 (payload) FROM '/usr/share/dict/words';
COPY 102401
 
=$ copy t3 (payload) FROM '/usr/share/dict/words';
COPY 102401

Immediately afterwards, with autovacuum disabled, stats are a bit more interesting:

=$ select c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
from pg_class c left outer join pg_stat_all_tables s on c.oid = s.relid
where c.relname ~ '^t[123]$' order by 1;
 relname | reltuples | n_dead_tup | n_mod_since_analyze 
---------+-----------+------------+---------------------
 t1      |         0 |          0 |              102401
 t2      |         0 |          0 |                   0
 t3      |         0 |          0 |                   0
(3 rows)

Please note that reltuples are still 0 because this value is updated by vacuum.

Let's update some rows:

=$ UPDATE t1 SET payload = 'new ' || payload WHERE random() < 0.01;
UPDATE 1068
 
=$ UPDATE t2 SET payload = 'new ' || payload WHERE random() < 0.1;
UPDATE 10135
 
=$ UPDATE t3 SET payload = 'new ' || payload WHERE random() < 0.5;
UPDATE 51028
 
=$ select c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
from pg_class c left outer join pg_stat_all_tables s on c.oid = s.relid
where c.relname ~ '^t[123]$' order by 1;
 relname | reltuples | n_dead_tup | n_mod_since_analyze 
---------+-----------+------------+---------------------
 t1      |         0 |       1068 |              103469
 t2      |         0 |      10135 |               10135
 t3      |         0 |      51028 |               51028
(3 rows)

OK – we see now that n_dead_tup was increased to number of rows updated, and n_mod_since_analyze contains sum of rows inserted and rows updated.

So, finally – delete. But, to be able to see it clearly, first, Let's vacuum/analyze the tables, so counters will be zeroed:

=$ vacuum analyze t1, t2, t3;
VACUUM
 
=$ select c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
from pg_class c left outer join pg_stat_all_tables s on c.oid = s.relid
where c.relname ~ '^t[123]$' order by 1;
 relname | reltuples | n_dead_tup | n_mod_since_analyze 
---------+-----------+------------+---------------------
 t1      |    102401 |          0 |                   0
 t2      |    102401 |          0 |                   0
 t3      |    102401 |          0 |                   0
(3 rows)
 
=$ DELETE FROM t1 WHERE random() < 0.01;
DELETE 999
 
=$ DELETE FROM t2 WHERE random() < 0.1;
DELETE 10307
 
=$ DELETE FROM t3 WHERE random() < 0.5;
DELETE 51087
 
=$ select c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
from pg_class c left outer join pg_stat_all_tables s on c.oid = s.relid
where c.relname ~ '^t[123]$' order by 1;
 relname | reltuples | n_dead_tup | n_mod_since_analyze 
---------+-----------+------------+---------------------
 t1      |    102401 |        999 |                 999
 t2      |    102401 |      10307 |               10307
 t3      |    102401 |      51087 |               51087
(3 rows)

Great. Both n_dead_tup and n_mod_since_analyze were increased.

Now, let's write a query that will show us which tables are waiting for autovacuum to work on them.

First – to put all settings in single place, I'll write this CTE:

WITH s as (
    SELECT
        current_setting('autovacuum_analyze_scale_factor')::float8  as  analyze_factor,
        current_setting('autovacuum_analyze_threshold')::float8     as  analyze_threshold,
        current_setting('autovacuum_vacuum_scale_factor')::float8   as  vacuum_factor,
        current_setting('autovacuum_vacuum_threshold')::float8      as  vacuum_threshold
)

Then, based on this I can get list of all tables, with information about whether I should vacuum or analyze:

```
=$ with s as (
```
```
    SELECT
```

        current_setting('autovacuum_analyze_scale_factor')::float8  as  analyze_factor,

        current_setting('autovacuum_analyze_threshold')::float8     as  analyze_threshold,

        current_setting('autovacuum_vacuum_scale_factor')::float8   as  vacuum_factor,

        current_setting('autovacuum_vacuum_threshold')::float8      as  vacuum_threshold

```
)
```
```
SELECT
```
```
    n.nspname,
```
```
    c.relname,
```
```
    c.oid as relid,
```
```
    t.n_dead_tup,
```
```
    t.n_mod_since_analyze,
```

    c.reltuples * s.vacuum_factor + s.vacuum_threshold as v_threshold,

    c.reltuples * s.analyze_factor + s.analyze_threshold as a_threshold

```
FROM
```
```
    s,
```
```
    pg_class c
```

    join pg_namespace n on c.relnamespace = n.oid

    join pg_stat_all_tables t on c.oid = t.relid

```
WHERE
```
```
    c.relkind = 'r'
```
```
    AND c.relname ~ '^t[123]$'
```
```
ORDER BY c.relname;
```

 nspname | relname | relid | n_dead_tup | n_mod_since_analyze | v_threshold | a_threshold

---------+---------+-------+------------+---------------------+-------------+-------------

 public  | t1      | 17626 |        999 |                 999 |     20530.2 |     10290.1

 public  | t2      | 17636 |      10307 |              112708 |     20530.2 |     10290.1

 public  | t3      | 17646 |      51087 |              153488 |     20530.2 |     10290.1

```
(3 rows)
```

I included condition (line 23) for relname to avoid showing me all tables in this database when just debugging the query. After it will be done, condition c.relname ~ ‘^t[123]$' will be removed.

Anyway – we have tables, their n_dead_tup, n_mod_since_analyze and thresholds for vacuum and analyze. So,let's show it in nicer way:

=$ with s as (
    SELECT
        current_setting('autovacuum_analyze_scale_factor')::float8  as  analyze_factor,
        current_setting('autovacuum_analyze_threshold')::float8     as  analyze_threshold,
        current_setting('autovacuum_vacuum_scale_factor')::float8   as  vacuum_factor,
        current_setting('autovacuum_vacuum_threshold')::float8      as  vacuum_threshold
), tt as (
    SELECT
        n.nspname,
        c.relname,
        c.oid as relid,
        t.n_dead_tup,
        t.n_mod_since_analyze,
        c.reltuples * s.vacuum_factor + s.vacuum_threshold as v_threshold,
        c.reltuples * s.analyze_factor + s.analyze_threshold as a_threshold
    FROM
        s,
        pg_class c
        join pg_namespace n on c.relnamespace = n.oid
        join pg_stat_all_tables t on c.oid = t.relid
    WHERE
        c.relkind = 'r'
)
SELECT
    nspname,
    relname,
    relid,
    case when n_dead_tup > v_threshold THEN 'VACUUM' ELSE '' END as do_vacuum,
    case when n_mod_since_analyze > a_threshold THEN 'ANALYZE' ELSE '' END as do_analyze
FROM
    tt
WHERE
    n_dead_tup > v_threshold OR
    n_mod_since_analyze > a_threshold
ORDER BY nspname, relname;
 nspname | relname | relid | do_vacuum | do_analyze 
---------+---------+-------+-----------+------------
 public  | t2      | 17636 |           | ANALYZE
 public  | t3      | 17646 | VACUUM    | ANALYZE
(2 rows)

This query can be stored as a view:

=$ create view autovacuum_queue as
with s as (
    SELECT
        current_setting('autovacuum_analyze_scale_factor')::float8  as  analyze_factor,
        current_setting('autovacuum_analyze_threshold')::float8     as  analyze_threshold,
        current_setting('autovacuum_vacuum_scale_factor')::float8   as  vacuum_factor,
        current_setting('autovacuum_vacuum_threshold')::float8      as  vacuum_threshold
), tt as (
    SELECT
        n.nspname,
        c.relname,
        c.oid as relid,
        t.n_dead_tup,
        t.n_mod_since_analyze,
        c.reltuples * s.vacuum_factor + s.vacuum_threshold as v_threshold,
        c.reltuples * s.analyze_factor + s.analyze_threshold as a_threshold
    FROM
        s,
        pg_class c
        join pg_namespace n on c.relnamespace = n.oid
        join pg_stat_all_tables t on c.oid = t.relid
    WHERE
        c.relkind = 'r'
)
SELECT
    nspname,
    relname,
    relid,
    case when n_dead_tup > v_threshold THEN 'VACUUM' ELSE '' END as do_vacuum,
    case when n_mod_since_analyze > a_threshold THEN 'ANALYZE' ELSE '' END as do_analyze
FROM
    tt
WHERE
    n_dead_tup > v_threshold OR
    n_mod_since_analyze > a_threshold;
CREATE VIEW

And now, I can see this, with (possibly) some additional info:

=$ select *,
    pg_relation_size(relid),
    pg_total_relation_size(relid)
from autovacuum_queue
    order by pg_total_relation_size(relid) desc
;
 nspname | relname | relid | do_vacuum | do_analyze | pg_relation_size | pg_total_relation_size 
---------+---------+-------+-----------+------------+------------------+------------------------
 public  | t3      | 17646 | VACUUM    | ANALYZE    |          7143424 |               11804672
 public  | t2      | 17636 |           | ANALYZE    |          5128192 |                8011776
(2 rows)

Hope you'll find it useful.

4 thoughts on “Which tables should be auto vacuumed or auto analyzed?”

Nikolay Samokhvalov says:

2020-01-30 at 02:00

Great!!

I tend to feel, this kind of thing should be provided by Postgres itself.

Some time ago I also needed this. It turned out that colleagues from Avito.ru already built something some time ago, so I took their work and slightly improved, here it is: https://gitlab.com/snippets/1889668. It has various details, but it doesn’t deal with autoanalyze part.

I also find it useful to put this into monitoring – for example, for GitLab.com it is exported to Prometheus: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/8494#note_251487637.

Observing the size of “wait queue”, keeping in mind the value of autovacuum_max_workers, can be helpful for autovacuum tuning.
Gowtham Raj says:

2020-01-30 at 07:51

Thanks for this amazing query. I ran this query couple of times and noticed that it was not showing any output, however pg_stat_all_tables recorded autovacuum happening. Then I found that i had set the autovacuum settings at table level.
Does this query cover that too ? Is my under standing correct ? please correct me If i am wrong.
depesz says:

2020-01-30 at 13:27

@Gowtham:

No, the query doesn’t handle per-table settings. Mostly because I didn’t think of it while writing the query. It should be possible to add it, though.
depesz says:

2020-01-30 at 13:52

@Nikolay:

For monitoring I prefer to simply check number of concurrent backends doing autovacuum (simple “query ~ ‘^autovacuum:'” is enough) and compare it against autovacuum_max_workers. If the numbers are the same for extended period of time – there is a problem.

Comments are closed.