One database that I am monitoring uses a lot of stored procedures. Some of them are fast, some of them not so fast. I wondered – is there a sensible way to diagnose which parts of a stored procedure take the most time?

I mean – I could just move the logic into the application, and then every query would have its own timing in the Pg logs, but this is not practical. Besides, I believe that using stored procedures/functions is much better than using plain SQL, for a number of reasons.

So, I'm back to the question – how to check which part of a function takes most of the time?

Let's assume we have a very simplistic function:

CREATE OR REPLACE FUNCTION testit() RETURNS void AS $$
DECLARE
    x INT4;
    y INT4;
BEGIN
    x := 3;
    y := x * 2;
    RETURN;
END;
$$ LANGUAGE plpgsql;

This function doesn't do anything, but that's my point – I want to measure how much overhead the profiling itself adds.

Generally, the idea is to RAISE "something" at a level that will be written to the logs, without polluting the client application with useless information.

Before I get to that, let's establish a baseline – let's run this function, say, 10000 times, and check the time:

# SELECT count(*) FROM (
SELECT testit() from generate_series(1,10000)
) as x;
count
-------
10000
(1 row)
 
Time: 24.486 ms

Oops. A bit too fast to compare sensibly. Let's add some more iterations:

# SELECT count(*) FROM (
SELECT testit() from generate_series(1,1000000)
) as x;
count
---------
1000000
(1 row)
 
Time: 2801.626 ms

Better.

I also ran it two more times, and got these results:

  • 2766.251 ms
  • 3153.510 ms

OK. So now, let's add profiling. First I need to find a log level that goes to the logs, but not to the screen:

# show client_min_messages ;
client_min_messages
---------------------
notice
(1 row)
 
Time: 0.093 ms
# show log_min_messages ;
log_min_messages
------------------
warning
(1 row)

Now, thanks to the fine documentation, we can see that there is a 'LOG' level that gets logged but is not shown on screen.
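To sanity-check this, here's a quick sketch (it uses a DO block, so it assumes PostgreSQL 9.0+; on older versions wrap the body in a throwaway function):

```sql
-- With client_min_messages = notice and log_min_messages = warning (the
-- defaults shown above): NOTICE is sent to the client but not logged,
-- while LOG is written to the server log but not sent to the client.
DO $$
BEGIN
    RAISE NOTICE 'this shows up in psql';
    RAISE LOG 'this shows up only in the server log';
END;
$$;
```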

So, knowing this, let's add some profiling. It should be as simple as possible, yet contain all the necessary information:

CREATE OR REPLACE FUNCTION testit() RETURNS void AS $$
DECLARE
    x INT4;
    y INT4;
BEGIN
    RAISE LOG 'PROFILING: testit() Step 1 : %', clock_timestamp();
    x := 3;
    RAISE LOG 'PROFILING: testit() Step 2 : %', clock_timestamp();
    y := x * 2;
    RAISE LOG 'PROFILING: testit() Step 3 : %', clock_timestamp();
    RETURN;
END;
$$ LANGUAGE plpgsql;

As you can see, I log 'PROFILING:' to be able to quickly grep the logs, the name of the function (because I might be profiling more than one function at a time), and which step it was on. Plus a very precise timestamp.

After a single run, I have this in the logs:

2010-03-18 11:37:00.003 CET,"depesz","depesz",7949,"[local]",4ba1fdd7.1f0d,7721795,"SELECT",2010-03-18 11:17:59 CET,2/38,0,LOG,00000,"PROFILING: testit() Step 1 : 2010-03-18 11:37:00.003352+01",,,,,,"SELECT count(*) FROM (SELECT testit() from generate_series(1,1000000)) as
x;",,,"psql"
2010-03-18 11:37:00.003 CET,"depesz","depesz",7949,"[local]",4ba1fdd7.1f0d,7721796,"SELECT",2010-03-18 11:17:59 CET,2/38,0,LOG,00000,"PROFILING: testit() Step 2 : 2010-03-18 11:37:00.003367+01",,,,,,"SELECT count(*) FROM (SELECT testit() from generate_series(1,1000000)) as
x;",,,"psql"
2010-03-18 11:37:00.003 CET,"depesz","depesz",7949,"[local]",4ba1fdd7.1f0d,7721797,"SELECT",2010-03-18 11:17:59 CET,2/38,0,LOG,00000,"PROFILING: testit() Step 3 : 2010-03-18 11:37:00.003383+01",,,,,,"SELECT count(*) FROM (SELECT testit() from generate_series(1,1000000)) as
x;",,,"psql"

Which can be simplified to only relevant information:

7949,"PROFILING: testit() Step 1 : 2010-03-18 11:37:00.003352+01"
7949,"PROFILING: testit() Step 2 : 2010-03-18 11:37:00.003367+01"
7949,"PROFILING: testit() Step 3 : 2010-03-18 11:37:00.003383+01"

The first column is important – it is the PID of the backend process, and we need it so that we will not mix profiling messages from various backends.

So. How much slower is it?

SELECT count(*) FROM (
SELECT testit() from generate_series(1,1000000)
) as x;
count
---------
1000000
(1 row)
 
Time: 50819.932 ms

That seems like a lot compared with the 3s we had previously. But is it? Let's think: within those 51 seconds, about 3 seconds were spent doing "normal stuff, not profiling related". That leaves ~48 seconds of overhead. 48 seconds for 1 million calls, each call executing 3 RAISE LOG commands – this means a single RAISE LOG took about 0.016 ms!

Now, in our degenerate example that's a lot – but our function doesn't do anything. Let's try it on something *a bit* more realistic. We'll test 2 functions:

CREATE OR REPLACE FUNCTION insert_word( in_word TEXT ) RETURNS void AS $$
DECLARE
    use_word_id INT4;
BEGIN
    use_word_id := nextval('words_id_seq');
    INSERT INTO words (id, word) VALUES (use_word_id, in_word);
    LOOP
        UPDATE word_counts SET count = count + 1 WHERE word = in_word;
        IF found THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO word_counts (word, count) VALUES (in_word, 1);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION insert_word2( in_word TEXT ) RETURNS void AS $$
DECLARE
    use_word_id INT4;
BEGIN
    INSERT INTO words (word) VALUES (in_word) RETURNING id INTO use_word_id;
    LOOP
        UPDATE word_counts SET count = count + 1 WHERE word = in_word;
        IF found THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO word_counts (word, count) VALUES (in_word, 1);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

These functions use 2 tables, which need to be created as well:

CREATE TABLE words (
    id serial PRIMARY KEY,
    word TEXT NOT NULL
);
CREATE TABLE word_counts (
    word TEXT PRIMARY KEY,
    count INT4 NOT NULL DEFAULT 0
);
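Just to make the behaviour concrete, here is a quick usage sketch (assuming the tables and functions above are in place):

```sql
SELECT insert_word('hello');
SELECT insert_word('hello');
SELECT insert_word2('world');

SELECT * FROM word_counts ORDER BY word;
-- should show: hello | 2, world | 1
```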

Now, the functionality of these functions is not really relevant. Let's just see how they behave without profiling, and then with profiling.

So, I created a test table with 72900 words, and ran this:

# SELECT count(*) FROM (
SELECT insert_word( x.y) FROM x
) as q;

Results?

  • Time: 4638.691 ms
  • Time: 4665.952 ms
  • Time: 4667.546 ms
  • Time: 4662.410 ms
  • Time: 4660.251 ms

OK. Now the second function:

# SELECT count(*) FROM (
SELECT insert_word2( x.y) FROM x
) as q;

And its results:

  • Time: 5023.776 ms
  • Time: 4819.380 ms
  • Time: 4656.849 ms
  • Time: 4631.946 ms
  • Time: 4613.677 ms

OK. Now, let's change the functions:

CREATE OR REPLACE FUNCTION insert_word( in_word TEXT ) RETURNS void AS $$
DECLARE
    use_word_id INT4;
BEGIN
    RAISE LOG 'PROFILING: insert_word() Step 1 : %', clock_timestamp();
    use_word_id := nextval('words_id_seq');
    RAISE LOG 'PROFILING: insert_word() Step 2 : %', clock_timestamp();
    INSERT INTO words (id, word) VALUES (use_word_id, in_word);
    RAISE LOG 'PROFILING: insert_word() Step 3 : %', clock_timestamp();
    LOOP
        UPDATE word_counts SET count = count + 1 WHERE word = in_word;
        IF found THEN
            RAISE LOG 'PROFILING: insert_word() Step 4 : %', clock_timestamp();
            RETURN;
        END IF;
        RAISE LOG 'PROFILING: insert_word() Step 5 : %', clock_timestamp();
        BEGIN
            INSERT INTO word_counts (word, count) VALUES (in_word, 1);
            RAISE LOG 'PROFILING: insert_word() Step 6 : %', clock_timestamp();
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            RAISE LOG 'PROFILING: insert_word() Step 7 : %', clock_timestamp();
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION insert_word2( in_word TEXT ) RETURNS void AS $$
DECLARE
    use_word_id INT4;
BEGIN
    RAISE LOG 'PROFILING: insert_word2() Step 1 : %', clock_timestamp();
    INSERT INTO words (word) VALUES (in_word) RETURNING id INTO use_word_id;
    RAISE LOG 'PROFILING: insert_word2() Step 2 : %', clock_timestamp();
    LOOP
        UPDATE word_counts SET count = count + 1 WHERE word = in_word;
        IF found THEN
            RAISE LOG 'PROFILING: insert_word2() Step 3 : %', clock_timestamp();
            RETURN;
        END IF;
        RAISE LOG 'PROFILING: insert_word2() Step 4 : %', clock_timestamp();
        BEGIN
            INSERT INTO word_counts (word, count) VALUES (in_word, 1);
            RAISE LOG 'PROFILING: insert_word2() Step 5 : %', clock_timestamp();
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            RAISE LOG 'PROFILING: insert_word2() Step 6 : %', clock_timestamp();
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

Now, I added 7 (insert_word) and 6 (insert_word2) profiling calls per function. Since each function is called 72900 times, at 0.016 ms per RAISE LOG the runs should become at most ~8 (insert_word) or ~7 (insert_word2) seconds slower.

How does it look in reality?

insert_word() * 72900:

  • Time: 14767.158 ms
  • Time: 12476.824 ms
  • Time: 12517.576 ms
  • Time: 12237.033 ms
  • Time: 12181.778 ms

and insert_word2() * 72900:

  • Time: 11001.922 ms
  • Time: 11689.947 ms
  • Time: 11148.601 ms
  • Time: 15607.282 ms
  • Time: 17824.019 ms

Quite good – practically spot-on with the previous calculation. There were 2 problematic runs that took much longer – 15 and 17 seconds instead of 11 – but I think that was due to some other load on my machine.

If this seems like too much to you, think about it: let's assume you have a 150ms function, and you add 20 profiling lines to it. The total cost of profiling is 0.32ms! So of course it is not usable for fast functions that run in less than 0.1ms – but those don't need profiling anyway.

Generally, I think that 10 profiling steps should be enough for any case. And let's assume that profiling may use up to 3% of function time. This means we can sensibly profile functions as fast as ~5.3ms per call (10 × 0.016ms / 0.03)! Which means that for the vast majority of uses this kind of profiling does not put any measurable load on the server.

Now, we have made some profiled calls to our functions. What can we do with the data?

First of all – assuming you're logging to csvlog (which I believe is the best option) – we can just load the log into PostgreSQL.
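For reference, the relevant postgresql.conf settings for csvlog look like this (logging_collector requires a server restart; log_min_messages can stay at the default 'warning', since LOG ranks high enough in log_min_messages severity to be written anyway):

```ini
logging_collector = on       # required for csvlog
log_destination = 'csvlog'
log_directory = 'pg_log'     # default, relative to the data directory
```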

I created a simple program that connected to the database, and ran all of these 72900 inserts, in parallel (10 workers), in random order, choosing randomly which function to use.

This program generated ~750MB of logs. But hey – it ran over 700,000 queries, each logging at the very least 3 messages (the number of messages logged depends on which code path was taken).

So, let's load the logs into Pg. According to the docs we just need a simple table, and then we can COPY the data into it. Let's try.

Some CREATE TABLE, a COPY, and 10 minutes later…
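In case you want to reproduce it, the table definition comes straight from the csvlog documentation (the exact column list depends on your PostgreSQL version – this variant matches logs that end with application_name, like the samples above), and the path in COPY is of course just a placeholder:

```sql
CREATE TABLE postgres_log (
    log_time               timestamptz(3),
    user_name              text,
    database_name          text,
    process_id             integer,
    connection_from        text,
    session_id             text,
    session_line_num       bigint,
    command_tag            text,
    session_start_time     timestamptz,
    virtual_transaction_id text,
    transaction_id         bigint,
    error_severity         text,
    sql_state_code         text,
    message                text,
    detail                 text,
    hint                   text,
    internal_query         text,
    internal_query_pos     integer,
    context                text,
    query                  text,
    query_pos              integer,
    location               text,
    application_name       text,
    PRIMARY KEY (session_id, session_line_num)
);

COPY postgres_log FROM '/path/to/logfile.csv' WITH csv;
```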

# select count(*) from postgres_log ;
count
---------
3354488
(1 row)

Nice.

Now. Let's get some sensible data out of it.

The final query will be a bit complicated, so let's build it step by step. Also – since I have 3.3 million lines in the log, when writing the intermediary queries I usually add a "LIMIT 100" to get results faster, but I'm showing the queries without it so they are easier to follow.

First we need to get only the lines that relate to our profiling:

SELECT
    l.*
FROM
    postgres_log l
WHERE
    l.error_severity = 'LOG'
    AND l.message LIKE 'PROFILING: %'

Great. Now let's extract just the relevant information and split it into several columns, so that it will be easier to group later:

WITH simplified_splitted AS (
    SELECT
        l.process_id,
        regexp_split_to_array( l.message, E'\\s+' ) AS message
    FROM
        postgres_log l
    WHERE
        l.error_severity = 'LOG'
        AND l.message LIKE 'PROFILING: %'
)
SELECT
    process_id,
    message[2] AS function_name,
    message[4] AS step,
    cast( message[6] || ' ' || message[7] AS timestamptz ) AS logged_when
FROM
    simplified_splitted

A word of explanation – I used WITH to do the regexp_split_to_array only once, and then, in the main query, used the already-split version of the message.
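To see what the split actually produces, here is a single message run through regexp_split_to_array – note that message[2] is the function name, message[4] the step number, and message[6] and message[7] together form the timestamp:

```sql
SELECT regexp_split_to_array(
    'PROFILING: testit() Step 1 : 2010-03-18 11:37:00.003352+01',
    E'\\s+'
);
-- {PROFILING:,testit(),Step,1,:,2010-03-18,11:37:00.003352+01}
```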

Results of this query look like this:

process_id | function_name | step | logged_when
------------+----------------+------+-------------------------------
13219 | insert_word() | 1 | 2010-03-18 12:50:20.985081+01
13219 | insert_word() | 2 | 2010-03-18 12:50:20.985369+01
13219 | insert_word() | 3 | 2010-03-18 12:50:20.985939+01
13219 | insert_word() | 5 | 2010-03-18 12:50:20.986539+01
13219 | insert_word() | 6 | 2010-03-18 12:50:20.98677+01
13219 | insert_word2() | 1 | 2010-03-18 12:50:20.987871+01
13219 | insert_word2() | 2 | 2010-03-18 12:50:20.988092+01
13226 | insert_word() | 1 | 2010-03-18 12:50:20.988143+01
13219 | insert_word2() | 4 | 2010-03-18 12:50:20.988335+01
13226 | insert_word() | 2 | 2010-03-18 12:50:20.988383+01

Brilliant. Now, we can add some window functions to calculate times.

Generally, we need to group the records (not as in SQL "GROUP BY", but as in "treat them as related"), order them within the group by logged_when, and then calculate the difference between the previous row and the current one. Seems simple?

Let's do it :) First, try to run it on first 20 rows:

WITH simplified_splitted AS (
    SELECT
        l.process_id,
        regexp_split_to_array( l.message, E'\\s+' ) AS message
    FROM
        postgres_log l
    WHERE
        l.error_severity = 'LOG'
        AND l.message LIKE 'PROFILING: %'
),
full_splitted AS (
    SELECT
        process_id,
        message[2] AS function_name,
        message[4] AS step,
        cast( message[6] || ' ' || message[7] AS timestamptz ) AS logged_when
    FROM
        simplified_splitted
)
SELECT
    f.*,
    lag( step ) OVER ( profiling ) AS previous_step,
    lag( logged_when ) OVER ( profiling ) AS previous_logged_when
FROM
    full_splitted f
WINDOW profiling AS (
    PARTITION BY f.process_id, f.function_name ORDER BY logged_when ASC
)
ORDER BY
    f.process_id,
    f.function_name,
    f.logged_when

Results look like this:

process_id | function_name | step | logged_when | previous_step | previous_logged_when
------------+----------------+------+-------------------------------+---------------+-------------------------------
13219 | insert_word() | 1 | 2010-03-18 12:50:20.985081+01 | [null] | [null]
13219 | insert_word() | 2 | 2010-03-18 12:50:20.985369+01 | 1 | 2010-03-18 12:50:20.985081+01
13219 | insert_word() | 3 | 2010-03-18 12:50:20.985939+01 | 2 | 2010-03-18 12:50:20.985369+01
13219 | insert_word() | 5 | 2010-03-18 12:50:20.986539+01 | 3 | 2010-03-18 12:50:20.985939+01
13219 | insert_word() | 6 | 2010-03-18 12:50:20.98677+01 | 5 | 2010-03-18 12:50:20.986539+01
13219 | insert_word2() | 1 | 2010-03-18 12:50:20.987871+01 | [null] | [null]
13219 | insert_word2() | 2 | 2010-03-18 12:50:20.988092+01 | 1 | 2010-03-18 12:50:20.987871+01
13219 | insert_word2() | 4 | 2010-03-18 12:50:20.988335+01 | 2 | 2010-03-18 12:50:20.988092+01
13219 | insert_word2() | 5 | 2010-03-18 12:50:20.988464+01 | 4 | 2010-03-18 12:50:20.988335+01
13219 | insert_word2() | 1 | 2010-03-18 12:50:20.988905+01 | 5 | 2010-03-18 12:50:20.988464+01
13219 | insert_word2() | 2 | 2010-03-18 12:50:20.988981+01 | 1 | 2010-03-18 12:50:20.988905+01
13219 | insert_word2() | 4 | 2010-03-18 12:50:20.989045+01 | 2 | 2010-03-18 12:50:20.988981+01
13219 | insert_word2() | 5 | 2010-03-18 12:50:20.989108+01 | 4 | 2010-03-18 12:50:20.989045+01
13219 | insert_word2() | 1 | 2010-03-18 12:50:20.989436+01 | 5 | 2010-03-18 12:50:20.989108+01
13219 | insert_word2() | 2 | 2010-03-18 12:50:20.98951+01 | 1 | 2010-03-18 12:50:20.989436+01
13226 | insert_word() | 1 | 2010-03-18 12:50:20.988143+01 | [null] | [null]
13226 | insert_word() | 2 | 2010-03-18 12:50:20.988383+01 | 1 | 2010-03-18 12:50:20.988143+01
13226 | insert_word() | 3 | 2010-03-18 12:50:20.988747+01 | 2 | 2010-03-18 12:50:20.988383+01
13226 | insert_word() | 5 | 2010-03-18 12:50:20.989298+01 | 3 | 2010-03-18 12:50:20.988747+01
13226 | insert_word() | 6 | 2010-03-18 12:50:20.989448+01 | 5 | 2010-03-18 12:50:20.989298+01

Nice. A couple of notes – if the current "step" is 1, there is no point in calculating the time difference, because the previous step in this backend, from the same function, will be from a different call to the function.

So, we will need to filter out rows with "step = 1" (but this has to be done in the final query, not in the "WITH" parts we have now).

Aside from that, let's just calculate the time differences, and label which step transition each one was:

WITH simplified_splitted AS (
    SELECT
        l.process_id,
        regexp_split_to_array( l.message, E'\\s+' ) AS message
    FROM
        postgres_log l
    WHERE
        l.error_severity = 'LOG'
        AND l.message LIKE 'PROFILING: %'
),
full_splitted AS (
    SELECT
        process_id,
        message[2] AS function_name,
        message[4] AS step,
        cast( message[6] || ' ' || message[7] AS timestamptz ) AS logged_when
    FROM
        simplified_splitted
    LIMIT 20
),
lagged_info AS (
    SELECT
        f.*,
        lag( step ) OVER ( profiling ) AS previous_step,
        lag( logged_when ) OVER ( profiling ) AS previous_logged_when
    FROM
        full_splitted f
    WINDOW profiling AS (
        PARTITION BY f.process_id, f.function_name ORDER BY logged_when ASC
    )
)
SELECT
    i.function_name,
    i.previous_step || ' -> ' || i.step,
    i.logged_when - i.previous_logged_when
FROM
    lagged_info i
WHERE
    i.step <> '1'

Results from some random 20 profiling rows look like this:

function_name | ?column? | ?column?
----------------+----------+-----------------
insert_word() | [null] | [null]
insert_word() | 2 -> 3 | 00:00:00.000071
insert_word() | 3 -> 4 | 00:00:00.000087
insert_word() | 1 -> 2 | 00:00:00.000042
insert_word() | 2 -> 3 | 00:00:00.000061
insert_word() | 3 -> 4 | 00:00:00.000067
insert_word2() | 1 -> 2 | 00:00:00.000073
insert_word2() | 2 -> 3 | 00:00:00.000108
insert_word2() | 1 -> 2 | 00:00:00.000075
insert_word2() | 2 -> 3 | 00:00:00.000204
insert_word() | 1 -> 2 | 00:00:00.000045
insert_word() | 2 -> 3 | 00:00:00.000056
insert_word() | 3 -> 4 | 00:00:00.000086
insert_word2() | 1 -> 2 | 00:00:00.000079
insert_word2() | 2 -> 3 | 00:00:00.000104
(15 rows)

So now, we can add full grouping, and present full statistics:

WITH simplified_splitted AS (
    SELECT
        l.process_id,
        regexp_split_to_array( l.message, E'\\s+' ) AS message
    FROM
        postgres_log l
    WHERE
        l.error_severity = 'LOG'
        AND l.message LIKE 'PROFILING: %'
),
full_splitted AS (
    SELECT
        process_id,
        message[2] AS function_name,
        message[4] AS step,
        cast( message[6] || ' ' || message[7] AS timestamptz ) AS logged_when
    FROM
        simplified_splitted
),
lagged_info AS (
    SELECT
        f.*,
        lag( step ) OVER ( profiling ) AS previous_step,
        lag( logged_when ) OVER ( profiling ) AS previous_logged_when
    FROM
        full_splitted f
    WINDOW profiling AS (
        PARTITION BY f.process_id, f.function_name ORDER BY logged_when ASC
    )
)
SELECT
    i.function_name,
    i.previous_step || ' -> ' || i.step AS which_step,
    sum( i.logged_when - i.previous_logged_when ) AS total_time,
    count(*) AS total_count
FROM
    lagged_info i
WHERE
    i.step <> '1'
GROUP BY
    function_name, which_step
ORDER BY
    total_time DESC

Results (this time from all 3.3 million rows) look like this:

function_name | which_step | total_time | total_count
----------------+------------+-----------------+-------------
insert_word2() | 2 -> 3 | 00:00:34.921153 | 327246
insert_word() | 3 -> 4 | 00:00:34.18836 | 328851
insert_word2() | 1 -> 2 | 00:00:30.443333 | 363517
insert_word() | 2 -> 3 | 00:00:24.551921 | 365483
insert_word() | 1 -> 2 | 00:00:18.136481 | 365483
insert_word2() | 4 -> 5 | 00:00:02.722998 | 36270
insert_word2() | 2 -> 4 | 00:00:02.699597 | 36271
insert_word() | 5 -> 6 | 00:00:02.697111 | 36630
insert_word() | 3 -> 5 | 00:00:02.601562 | 36632
insert_word() | 5 -> 7 | 00:00:00.00527 | 2
insert_word2() | 4 -> 6 | 00:00:00.000399 | 1
insert_word() | 7 -> 4 | 00:00:00.000154 | 2
insert_word2() | 6 -> 3 | 00:00:00.000103 | 1
(13 rows)

Looks pretty sweet to me. Of course the results here are obvious, and there isn't really anything I can do to make these particular functions faster, but it shows that the method really works.

11 comments

  2. # Johan
    Mar 18, 2010

    Thank you for this (and all your articles)! It allowed me to check and optimize a slow function (and revealed some really stupid thinking on my part) from 12 seconds to less than one!

  3. # Brett
    Mar 18, 2010

    The edb-debugger module on pgfoundry also contains an excellent plpgsql profiling tool, which allows you to gather similar data (plus some). It’s less invasive (no code changes required) but you need to explicitly turn it on/off in each session to be traced.
    I’m using it a bit to discover potential improvements in my plpgsql. By day I’m an Oracle developer, but I’ve found the postgres offering to be easier to use and more useful than Oracle’s dbms_profiler.

  4. # ash
    Mar 20, 2010

    ever tried track_functions in postgresql 8.4?

  5. Mar 20, 2010

    @ASH:
    sure. not sure though how’s that relevant. track_functions tracks how many times function was called. This blogpost is about checking how much time parts of function take.

  6. Apr 1, 2010

    Depesz WTF, this blog layout is awful. Bring back the old one.

  7. Apr 1, 2010

    @TOM:
    no way. 2 people complained about new layout (you and metys), and all other opinions i got were: “YAY, it’s so much better than previous”.

  8. # Matthew
    Apr 18, 2010

    @depesz make that 3. At least people could choose the layout before.

  9. Apr 18, 2010

    @Matthew:
    heh, perhaps I should add a poll – “Do you like new layout” :)

    But since, you made it to 3 – theme chooser is back :)

  10. # Matthew
    Apr 18, 2010

    Yay! “AS 400″ selected here.

  11. Sep 24, 2011

    That windowing stuff is a mind bender. How about this instead?

    mark := clock_timestamp();
    -- stuff
    RAISE LOG 'Step 1 : %', clock_timestamp() - mark;

  12. Sep 25, 2011

    @Brian:
    Sure, it’s possible, but then you’d have to add another mark := clock_timestamp(); *after* each RAISE LOG.
    Plus – you have to do more work at runtime (more calls to clock_timestamp, subtraction), which I wanted to avoid to keep the overhead as small as possible.
