Waiting for 8.4 – pg_stat_statements

On 4th of January, Tom Lane committed patch by Takahiro Itagaki, which adds new contrib module – pg_stat_statements:

Log Message:
-----------
Add contrib/pg_stat_statements for server-wide tracking of statement execution
statistics.
 
Takahiro Itagaki

What is it for? Well, actually, it takes some work away from projects like pgFouine or my own analyze.pgsql.logs.pl.

To this day, when you want/need to see some query statistics you have to log all queries, and then run some software that will parse it, normalize queries, and run some kind of statistics on it.

Now, the log-parse part is no longer neccessary.

This is how it works.

First, you have to modify your postgresql.conf. Open it, and find line with shared_preload_libraries parameter. Add there pg_stat_statements, like this:

shared_preload_libraries = 'pg_stat_statements'     # (change requires restart)

As you can see in the comment, change here requires server restart. Before we will do it, let's add some more options to .conf file:

pg_stat_statements.max = 100
pg_stat_statements.track = top
pg_stat_statements.save = off

To make it work, we will also need to add “pg_stat_statements" to “custom_variable_classes" option, which is usually empty, but if you have one already defined, just add new one like this:

custom_variable_classes = 'depesz,pg_stat_statements'       # list of custom variable class names

Afterwards, we can restart PostgreSQL.

Now, tracking is enabled, but to view stats, we have to create appropriate functions and view, by loading pg_stat_statements.sql to any database:

# \i WORK/share/postgresql/contrib/pg_stat_statements.sql
SET
CREATE FUNCTION
CREATE FUNCTION
CREATE VIEW
GRANT
REVOKE

( of course your path might be different ).

So, let's see how it works. First, let's check if (just after connection) the stats are empty:

# SELECT * FROM pg_stat_statements;
 userid | dbid | query | calls | total_time | ROWS
--------+------+-------+-------+------------+------
(0 ROWS)

So, now let's repeat last query:

# SELECT * FROM pg_stat_statements;
 userid | dbid  |               query               | calls | total_time | ROWS
--------+-------+-----------------------------------+-------+------------+------
     10 | 16389 | SELECT * FROM pg_stat_statements; |     1 |   0.000131 |    0
(1 ROW)

Wow! It works.

Now, let's reset stats (select pg_stat_statements_reset();) and do some more tests:

(pgdba@[LOCAL]:5840) 15:42:41 [pgdba]
# SELECT 1 + 2;
 ?COLUMN?
----------
        3
(1 ROW)
 
(depesz@[LOCAL]:5840) 15:40:39 [depesz]
# SELECT 2 + 3;
 ?COLUMN?
----------
        5
(1 ROW)
 
(depesz@[LOCAL]:5840) 15:43:13 [depesz]
# SELECT COUNT(*) FROM pg_class WHERE relkind = 'r';
 COUNT
-------
    50
(1 ROW)

And, how do stats look like now?

# SELECT * FROM pg_stat_statements;
 userid | dbid  |                       query                        | calls | total_time | ROWS
--------+-------+----------------------------------------------------+-------+------------+------
  16384 | 16388 | SELECT COUNT(*) FROM pg_class WHERE relkind = 'r'; |     1 |   0.000271 |    1
     10 | 16389 | SELECT 1 + 2;                                      |     1 |    1.9e-05 |    1
  16384 | 16388 | SELECT 2 + 3;                                      |     1 |    2.2e-05 |    1
     10 | 16389 | SELECT pg_stat_statements_reset();                 |     1 |    3.3e-05 |    1
(4 ROWS)

Nice. And how does it work with prepared statements?

# SELECT pg_stat_statements_reset();
 pg_stat_statements_reset
--------------------------
 
(1 ROW)
 
# PREPARE x(int4, int4) AS SELECT $1 + $2;
PREPARE
# EXECUTE x(1,2);
 ?COLUMN?
----------
        3
(1 ROW)
 
# EXECUTE x(2,3);
 ?COLUMN?
----------
        5
(1 ROW)
 
(pgdba@[LOCAL]:5840) 15:45:54 [pgdba]
# PREPARE y(int4, int4) AS SELECT $1 + $2;
PREPARE
(pgdba@[LOCAL]:5840) 15:46:00 [pgdba]
# EXECUTE y(3,4);
 ?COLUMN?
----------
        7
(1 ROW)
 
(pgdba@[LOCAL]:5840) 15:46:05 [pgdba]
# SELECT * FROM pg_stat_statements;
 userid | dbid  |                  query                   | calls | total_time | ROWS
--------+-------+------------------------------------------+-------+------------+------
     10 | 16389 | PREPARE y(int4, int4) AS SELECT $1 + $2; |     1 |    1.7e-05 |    1
     10 | 16389 | SELECT pg_stat_statements_reset();       |     1 |    3.4e-05 |    1
     10 | 16389 | PREPARE x(int4, int4) AS SELECT $1 + $2; |     2 |    3.3e-05 |    2
(3 ROWS)

Interesting. It looks like “prepare" is shown to be called as many times, as it was executed. Besides this little point – it looks great.

Now. I configured pg_stat_statements to keep tabs on 100 different queries. What will happen after 100th query? Which one will be removed?

This simple one-liner will add 100 different queries:

( echo "SELECT pg_stat_statements_reset();"; FOR a IN $( seq 1 99 ); do echo "select $a;"; done ) | psql
# SELECT COUNT(*) FROM pg_stat_statements;
 COUNT
-------
   100
(1 ROW)

BUT the query: “select count(*) from pg_stat_statements" was also added. So some query had to be removed from pg_stat_statements. Or perhaps count(*) was not added to stats? Let's find out:

# SELECT * FROM pg_stat_statements ORDER BY query;
...

Apparently, pg_stat_statements will remove random record, with lowest number of calls. I.e. if there are 100 records, each with calls = 1 – you can't find out which row will be removed from stats when new query will arrive. But, generally it's not a big issue.

To sum it up. I think that functionality of the module would be greatly enhanced if it would store queries without parameters (instead of “select 2 + 3" -> “select $1 + $2", or something like this) – otherwise, with real-world databases, the buffer for queries will fill up too soon, and it will not “catch" the fact that “select * from table where id = 3" and “select * from table where id = 23" is practically the same.

But, since there are already tools for doing this kind of analysis, this new addition (pg_stat_statements) is welcome, as it can be used with smaller systems.

13 thoughts on “Waiting for 8.4 – pg_stat_statements”

  1. Thanks for explaining this, is very helpful… any hints about the impact in performance?

  2. @vcardenas:
    Well, I tested it for only couple of runs of pgbench, but it looks like there is practically none overhead. I.e. technically there has to be one, but the difference of speed of pgbench was bigger because of random fluctuations of load on the machine (my laptop) than from turning on pg_stat_statements.

  3. Hi,

    I really love your site, but the black backround is extremely hard to read.
    Why don’t you use a white background and black font as everybody else does?

    I’m glad I can turn off CSS using Firefox so I can at least read the full article without getting a headache

  4. @Thomas:
    hi,
    well, the most important reason is that I personally *hate* white backgrounds.

    Also – did you notice theme switcher in sidebar?

  5. depesz,
    theme switcher switches the theme but takes you back to the main page.
    and when you click article title, it takes you to the article but switches the theme back to the default.
    (debian/firefox3)

  6. I think you should add an extra timestamp column to your stats table so that you can remove the oldest records first instead of random ones. Moreover, this column could be very useful for stats info (e.g. how much time did elapse between the first and last statement of a block of statements?).

  7. @Ptomaine:
    sorry, but what “my” stats table?

    if you mean pg_stat_statements – it’s not mine – it’s postgresql, and the suggestion might be sensible to send to pg-general, or even pg-bugs mailing list.

  8. @Sergey Konoplev:
    Unfortunately it’s not the case. Please check this:

    create table x (i int4);

    insert into x (i) select generate_series(1,10);

    select * from x where i = 1;

    select * from x where i = 2;

    # SELECT * FROM pg_stat_statements;
     userid | dbid  |                      query                      | calls | total_time | ROWS
    --------+-------+-------------------------------------------------+-------+------------+------
      16384 | 16390 | SELECT * FROM x WHERE i = 1;                    |     1 |    5.6e-05 |    1
      16384 | 16390 | INSERT INTO x (i) SELECT generate_series(1,10); |     1 |   0.000294 |   10
      16384 | 16390 | SELECT * FROM x WHERE i = 2;                    |     1 |    4.4e-05 |    1
    (3 ROWS)

    it works in this was with prepared statements, but not with typed-in queries.

  9. well, i want use this interesting stat tool in 8.3.7, but only support on 8.4. How i can do this?

  10. @zhle521:
    upgrade postgresql to newer version.

    or extract the patch from git repository, and try to modify it so it will be possible to apply on 8.3 source tree.

  11. thks, i try, but too many dependencies on new version source,. So, i gave it up and found another solution to the problem.

  12. If you’re using a newer version of PostgreSQL, rather than the old-fashioned way of loading this extension:

    \i work/share/postgresql/contrib/pg_stat_statements.sql
    

    Use instead the new CREATE EXTENSION feature.

    CREATE EXTENSION pg_stat_statements
    

    This has tab-completion support and is a quick way to see what extensions are included in your build / package.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.