Logging queries – how?

One of the questions that pop up frequently on IRC is how to see queries are now executed on the server, and what queries were earlier.

Theoretically answer to this is simple – pg_stat_activity and log_min_duration_statement. Or log_statement. What is the difference? That's exactly why I'm writing this post.

First of all – to see what's happening in database right now – you can use:

$ SELECT * FROM pg_stat_activity ;
-[ RECORD 1 ]----+---------------------------------
datid            | 16398
datname          | depesz
procpid          | 28765
usesysid         | 16387
usename          | depesz
application_name | psql
client_addr      | [NULL]
client_port      | -1
backend_start    | 2010-07-21 15:44:08.017051+02
xact_start       | 2010-07-21 15:46:13.92332+02
query_start      | 2010-07-21 15:46:13.92332+02
waiting          | f
current_query    | SELECT * FROM pg_stat_activity ;

Exact description of what's what, you can find in the fine manual..

There couple of important things, though:

  • current_query truncates long queries to some size (don't recall how long are OK, and it's definitely larger than normally used queries, but the limit exists)
  • sometimes field ‘current_query' can contain string ‘‘ – this happens when you're querying pg_stat_activity from not superuser, and some queries are run by different user than you are logged as

Aside from these small issues, this is perfectly good way to view current state of system. Especially when adding where to filter out idling connections. My favorite query for this view is:

SELECT
    now() - query_start AS runtime,
    CASE WHEN waiting THEN 'WAIT' ELSE '' END AS wait,
    current_query
FROM
    pg_stat_activity
WHERE
    current_query <> '<IDLE>'
ORDER BY 1 DESC

It shows working queries, without idles, with their respective runtime, sorted from longest running, with information whether the query is waiting for something.

Now, what about seeing all queries?

Well, we need to use one of GUCs: log_statement or log_min_duration_statement.

First, let's see at log_statement. It can have 4 values: none, ddl, mod and all.

These mean:

  • none – none statements will be logged (by mechanisms of log_statement – they can still be logged by something else)
  • ddl – statements like create, alter and drop will be logged – i.e. all that change schema of database
  • mod – logs all queries that “ddl" would log, plus insert, update, delete, truncate and ‘copy from' statements
  • all – logs all queries

There also are some important notices:

  • if you call a function (select function()) and it internally does some modifications – it will not be logged with log_statement <> all
  • queries are logged before they are actually ran (which is pretty cool, but it's important to notice the difference from the way described below

Let's see how it looks (that's from csv logs, scroll to the right to see important parts):

2010-07-21 21:54:16.135 CEST,"depesz","depesz",18160,"[local]",4c475062.46f0,3,"idle",2010-07-21 21:54:10 CEST,2/4,0,LOG,00000,"statement: create table x (i int4);",,,,,,,,,"psql"
2010-07-21 21:54:21.344 CEST,"depesz","depesz",18160,"[local]",4c475062.46f0,4,"idle",2010-07-21 21:54:10 CEST,2/5,0,LOG,00000,"statement: insert into x (i) values (123);",,,,,,,,,"psql"
2010-07-21 21:54:24.094 CEST,"depesz","depesz",18160,"[local]",4c475062.46f0,5,"idle",2010-07-21 21:54:10 CEST,2/6,0,LOG,00000,"statement: select * from x;",,,,,,,,,"psql"

And finally, log_min_duration_statement. This is (in my opinion) the most useful of them all. If I turn it on (i.e. set to >= 0 value), and run the same queries, in logs I can see:

2010-07-21 22:03:27.690 CEST,"depesz","depesz",18723,"[local]",4c475288.4923,4,"CREATE TABLE",2010-07-21 22:03:20 CEST,2/0,0,LOG,00000,"duration: 45.924 ms  statement: create table x (i int4);",,,,,,,,,"psql"
2010-07-21 22:03:31.236 CEST,"depesz","depesz",18723,"[local]",4c475288.4923,5,"INSERT",2010-07-21 22:03:20 CEST,2/0,0,LOG,00000,"duration: 12.251 ms  statement: insert into x (i) values (123);",,,,,,,,,"psql"
2010-07-21 22:03:33.724 CEST,"depesz","depesz",18723,"[local]",4c475288.4923,6,"SELECT",2010-07-21 22:03:20 CEST,2/0,0,LOG,00000,"duration: 0.307 ms  statement: select * from x;",,,,,,,,,"psql"

Important addition: duration: … elements, which show how long given statement took.

This means that statements (and durations) logged by log_min_duration_statement can be logged only after execution. On the flip side – log_min_duration gives you ability to log queries that took over some defined time – for example 5ms, or 15 seconds, so you can tune it to log whatever is too long for your application.

One note though. If you'll enable both log_statement (to some value) and log_min_duration_statement, and some query will match both cases, you'll see:

2010-07-21 22:12:58.423 CEST,"depesz","depesz",19197,"[local]",4c4754c6.4afd,5,"idle",2010-07-21 22:12:54 CEST,2/3,0,LOG,00000,"statement: create table x (i int4);",,,,,,,,,"psql"
2010-07-21 22:12:58.437 CEST,"depesz","depesz",19197,"[local]",4c4754c6.4afd,6,"CREATE TABLE",2010-07-21 22:12:54 CEST,2/0,0,LOG,00000,"duration: 14.017 ms",,,,,,,,,"psql"
2010-07-21 22:13:00.253 CEST,"depesz","depesz",19197,"[local]",4c4754c6.4afd,7,"idle",2010-07-21 22:12:54 CEST,2/4,0,LOG,00000,"statement: insert into x (i) values (123);",,,,,,,,,"psql"
2010-07-21 22:13:00.255 CEST,"depesz","depesz",19197,"[local]",4c4754c6.4afd,8,"INSERT",2010-07-21 22:12:54 CEST,2/0,0,LOG,00000,"duration: 2.140 ms",,,,,,,,,"psql"
2010-07-21 22:13:04.594 CEST,"depesz","depesz",19197,"[local]",4c4754c6.4afd,9,"SELECT",2010-07-21 22:12:54 CEST,2/0,0,LOG,00000,"duration: 0.285 ms  statement: select * from x;",,,,,,,,,"psql"

That is – if the query will not be logged by ‘log_statement', and only by log_min_duration_statement – it will be logged as previously “duration: … ms statement: …".

But, if the query got logged by log_statement mechanisms (like create table or insert above) – the line logged by log_min_duration_statement will contain only duration, and not statement itself. Reason is very simple – not to repeat the same information.

This unfortunately means that if you use both log_statement and log_min_duration_statement – you need some way to join these lines – with PID or session id. When you use normal stderr logging – you need to add this matching information to log_line_prefix, but when using csvlog – it's already there.

All in all – mastering these settings is the way to understand what's happening with your server. pg_stat_activity for quick checks of state “right here, right now", and log_* for long time analysis.

One thought on “Logging queries – how?”

Comments are closed.