February 7th, 2012 by depesz | Tags: , , , , , , , , | 1 comment »
Did it help? If yes - maybe you can help me?

On 26th of January, Magnus Hagander committed patch:

Add counters for number and size of temporary files used
for spill-to-disk queries for each database to the
pg_stat_database view.
 
Tomas Vondra, review by Magnus Hagander

As you perhaps know certain operations in PostgreSQL can create temporary files. I am not talking about temporary tables – these files are created when operation done by PostgreSQL backend needs to use more memory than it can use, due to limits of work_mem.

So, let's see how that counter works.

First, this is how pg_stat_database looks like now:

$ select * from pg_stat_database;
 datid │  datname  │ numbackends │ xact_commit │ xact_rollback │ blks_read │ blks_hit │ tup_returned │ tup_fetched │ tup_inserted │ tup_updated │ tup_deleted │ conflicts │ temp_files │ temp_bytes │ deadlocks │          stats_reset
───────┼───────────┼─────────────┼─────────────┼───────────────┼───────────┼──────────┼──────────────┼─────────────┼──────────────┼─────────────┼─────────────┼───────────┼────────────┼────────────┼───────────┼───────────────────────────────
 12039 │ template0 │           00000000000000[null]
 16387 │ depesz    │           2246226180637173824552610000002012-02-07 12:29:59.048977+01
 16388 │ jab       │           02880679041878772218386637651305153100002012-02-07 12:29:59.366588+01
 16389 │ pgdba     │           02250105655569964164256000002012-02-07 12:30:21.039997+01
 12047 │ postgres  │           0253014968417074317642718000002012-02-07 12:29:56.597627+01
     1 │ template1 │           02451104652670626164705000002012-02-07 12:29:56.606047+01
(6 rows)

The row is quite wide, so let's get just one row of data, in extended format:

$ select * from pg_stat_database where datname = 'depesz';
─[ RECORD 1 ]─┬──────────────────────────────
datid         │ 16387
datname       │ depesz
numbackends   │ 1
xact_commit   │ 3
xact_rollback │ 0
blks_read     │ 0
blks_hit      │ 12
tup_returned  │ 4
tup_fetched   │ 4
tup_inserted  │ 0
tup_updated   │ 0
tup_deleted   │ 0
conflicts     │ 0
temp_files    │ 0
temp_bytes    │ 0
deadlocks     │ 0
stats_reset   │ 2012-02-07 14:20:14.514258+01

My PostgreSQL is configured to use 1MB of RAM, so if I'll ask it to do something that fits in this amount of ram, it will not use temp files:

$ explain analyze select count(*) from (select random() as i from generate_series(1,1000) order by i) as x;
                                                           QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Aggregate  (cost=77.33..77.34 rows=1 width=0) (actual time=1.308..1.308 rows=1 loops=1)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=0) (actual time=1.099..1.244 rows=1000 loops=1)
         Sort Key: (random())
         Sort Method: quicksort  Memory: 71kB
         ->  Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=0.184..0.380 rows=1000 loops=1)
 Total runtime: 1.449 ms
(6 rows)

As you can see sort used only RAM, and so the temp_* counters are still at zero:

$ select temp_files, temp_bytes from pg_stat_database where datname = 'depesz';
 temp_files │ temp_bytes
────────────┼────────────
          00
(1 row)

But when I'll do something more demanding:

$ explain analyze select count(*) from (select random() as i from generate_series(1,20000) order by i) as x;
                                                            QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Aggregate  (cost=77.33..77.34 rows=1 width=0) (actual time=31.668..31.669 rows=1 loops=1)
   ->  Sort  (cost=62.33..64.83 rows=1000 width=0) (actual time=27.529..30.368 rows=20000 loops=1)
         Sort Key: (random())
         Sort Method: external merge  Disk: 352kB
         ->  Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=4.822..10.117 rows=20000 loops=1)
 Total runtime: 32.248 ms
(6 rows)
$ select temp_files, temp_bytes from pg_stat_database where datname = 'depesz';
 temp_files │ temp_bytes
────────────┼────────────
          2640448
(1 row)

It is interesting to see that temp_bytes is ~ 2x what explain analyze showed. I checked Pg logs, and saw:

LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp22791.1", size 360448
STATEMENT:  explain analyze select count(*) from (select random() as i from generate_series(1,20000) order by i) as x;
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp22791.0", size 280000
STATEMENT:  explain analyze select count(*) from (select random() as i from generate_series(1,20000) order by i) as x;
LOG:  duration: 32.506 ms  statement: explain analyze select count(*) from (select random() as i from generate_series(1,20000) order by i) as x;

So the amount that temp_* columns showed is correct, but explain analyze showed just part of it – my assumption is that PostgreSQL used first 352kB of temp file, and then rewrote it to 280kB of file, but at the same it did remote the 352 kB file – so explain shows peak usage, not total usage.

If I'll run the explain again, and check pg_stat_database one more time, I can see:

$ select temp_files, temp_bytes from pg_stat_database where datname = 'depesz';
 temp_files │ temp_bytes
────────────┼────────────
          41280896
(1 row)

Which clearly shows that values in these columns are always incrementing – rolling counter of number of created temp files, and total size of all created temp files.

So it is not a tool to see “what is the current state" – more along the lines of: get the data to some graphing tool, so you can see trends.

This information was previously available, in logs, but now we got easier and faster access to this data, which is always a good thing.

  1. One comment

  2. Feb 8, 2012

    I guess this is good… kind of feels like they decided to make it easier without making it better. I could see all kinds of poor assumptions being made based on the information given inside the database; I think it would have been better to add database identifying information into the logs.

Leave a comment