December 20th, 2011 by depesz | Tags: , , , , | 3 comments »
Did it help? If yes - maybe you can help me?

If you have production DB servers, chances are you're running variant of these queries:

SELECT sum(pg_relation_size(oid)) from pg_class where relkind = 'i'
SELECT sum(pg_relation_size(oid)) from pg_class where relkind = 'r'

To get summarized size of tables and/or indexes in your database (for example for graphing purposes).

This (getting pg_relation_size for rows in pg_class) has one problem – it can lock, or it can fail.

Let's see those problems. First – locking.

We will need two psql sessions, and in them:

  • Session #1: create table z (i int4);
  • Session #1: begin;
  • Session #1: drop table z;
  • Session #2: SELECT sum(pg_relation_size(oid)) from pg_class where relkind = ‘r';

and we can see that session #2 hangs until Session #1 will commit or rollback.

If it will rollback – we'll get the size. But if Session #1 will commit, in Session #2 we'll get:

ERROR:  could not open relation with OID 18695

Of course the OID can be different.

That far from good.

So, since I encountered this problem, I thought that there has to be better solution – perhaps not as elegant, but at the very least not breaking every now and then.

Today, I finally got fed up with alert mails about broken queries, and wrote this:

SELECT
    c.oid::regclass as relation_name,
    c.relkind as relation_kind,
    x.file_size as relation_size
FROM
    pg_class c
    join (
        SELECT
            regexp_replace( file_name, '[^0-9].*', '') as filenode,
            sum( ( pg_stat_file( path || '/' || file_name ) ).size ) as file_size
        FROM
            (
                SELECT
                    path || '/' || file_name as path,
                    pg_ls_dir( path || '/' || file_name) as file_name
                FROM
                    (
                        SELECT
                            location as path,
                            pg_ls_dir( location ) as file_name
                        FROM
                            (
                                SELECT
                                    'pg_tblspc/' || t ||
                                    case
                                        when current_setting( 'server_version_num' )::INT4 >= 90000 THEN
                                        '/' || (
                                            SELECT q
                                            FROM pg_ls_dir('pg_tblspc/' || t) as q
                                            WHERE (pg_stat_file( 'pg_tblspc/' || t||'/'||q)).isdir='true'
                                                AND q like 'PG_' || regexp_replace( current_setting('server_version'), E'^([0-9]+\\.[0-9]+).*$', E'\\1') || '%'
                                        )
                                    ELSE
                                        ''
                                    END
                                    as location
                                FROM
                                    pg_ls_dir('pg_tblspc') as t
                                WHERE
                                    (pg_stat_file('pg_tblspc/' || t)).isdir = 'true'
                                union
                                SELECT 'base'
                            ) as dirs
                    ) as db_dirs
                WHERE
                    file_name = ( SELECT oid::TEXT FROM pg_database WHERE datname = current_database() )
            ) as file_list
        GROUP BY
            filenode
    ) as x ON c.relfilenode::TEXT = x.filenode
;

It's not nice. It doesn't have the charm of simple: select pg_relation_size(). But it works.

Result of this query is list of all relations, their types ( the most interesting are “r" – relation, “i" – index, “t" – toast table ), and size.

It works on 8.3+ (possibly on 8.2 too, but I didn't check).

In normal circumstances it's most likely slower than normal pg_relation_size() calls, but this has the benefit of not locking/failing. Which is a huge deal for me.

The major drawback of this solution is that it needs to be ran using superuser account (due to usage of pg_ls_dir() and pg_stat_file() functions). But that shouldn't be a big problem – after all it's just for monitoring.

  1. 3 comments

  2. # Keith
    Dec 20, 2011

    Put this query into a function and gave it SECURITY DEFINER and that allows non-superusers to get this data.

    Thanks Depesz!

  3. # Josh Kupershmidt
    Dec 21, 2011

    Hi Depesz,
    This is quite clever, thanks for sharing!

    I think the problem with the SELECT sum(…) query bailing out with “ERROR: could not open …” could be worked around by making a PL/pgSQL wrapper around the call to pg_total_relation_size(). Something like this:

    CREATE OR REPLACE FUNCTION
    no_err_pg_total_relation_size(v_relation regclass)
    RETURNS bigint AS
    $$
    DECLARE
    v_size bigint;
    BEGIN
    BEGIN
    SELECT pg_total_relation_size(v_relation)
    INTO v_size;
    EXCEPTION
    WHEN internal_error THEN
    RAISE WARNING ‘Caught error: %’, SQLERRM;
    return 0::bigint;
    END;

    RETURN v_size;
    END;

    $$ LANGUAGE plpgsql;

    Of course, this wouldn’t help with the blocking against a concurrent DROP TABLE or similar – your solution is necessary for that.

  4. # Tomas Vondra
    Dec 21, 2011

    Another option is not to use pg_relation_size at all, and use (relpages * 8192) or whatever block size you use. No, it’s not updated immediately (only when the table is analyzed), but in most cases this should not be an issue.

Leave a comment