Some new tools for PostgreSQL or around PostgreSQL

Over the last few months I wrote some tools to help me with my daily duties.

I'd like to let you know about them, as you might find them useful.

So, here we go:

pg.logs.tail

Available from OmniTI SVN.

It's a smarter "tail -f" for PostgreSQL logs. Smarter in the sense that it knows PostgreSQL rotates its log file, so when it sees that a rotation has happened, it switches to the next file.

To run the program you just do:

pg.logs.tail -d /var/log/postgresql

Or wherever you keep your logs.

If you are in the logs directory, just run pg.logs.tail without options, and it should work.

Aside from the -d option, there are only two more (both shown in the example after this list):

  • -v – prints a notice (to STDERR, so as not to mess with the data when grepping) whenever it switches to a new file
  • -f – regexp that is used to find log files – it defaults to the sane postgresql-.*\.log, so generally you don't need to change it
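
For example, to watch logs in /var/log/pg (a made-up directory), match a non-default filename pattern, and see notices on file switches:

pg.logs.tail -d /var/log/pg -f 'pg-.*\.log' -v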

pg.grep

Available from OmniTI SVN.

This is grep. But (also) smarter.

For starters – it understands that a given "message" in the log can span multiple lines.

Also – it understands that some log messages cause other messages to be shown – for example, when you have an exception/error, there is usually a "STATEMENT:" log line, which is connected to the error before it.

To do its "parsing", pg.grep needs to know the log_line_prefix that you're using in Pg – otherwise it wouldn't be able to tell whether a given line in the log is a continuation of the message from the previous line, or a new message altogether.

So, let's see how it works. My log_line_prefix is "%m %u@%d %p %r ", and I just ran this query:

$ SELECT 1
>> +
>> 2
>> -
>> 123;
 ?column?
──────────
     -120
(1 row)

In log, I have:

2012-01-22 17:27:54.929 CET [unknown]@[unknown] 17743  LOG:  connection received: host=[local]
2012-01-22 17:27:54.930 CET depesz@depesz 17743 [local] LOG:  connection authorized: user=depesz database=depesz
2012-01-22 17:27:57.447 CET depesz@depesz 17743 [local] LOG:  duration: 0.603 ms  statement: SELECT 1
        +
        2
        -
        123;
2012-01-22 17:28:04.924 CET depesz@depesz 17743 [local] LOG:  disconnection: session time: 0:00:09.995 user=depesz database=depesz host=[local]
2012-01-22 17:28:05.225 CET [unknown]@[unknown] 17778  LOG:  connection received: host=[local]
2012-01-22 17:28:05.226 CET depesz@depesz 17778 [local] LOG:  connection authorized: user=depesz database=depesz

If I just grepped it for "123", I would get:

=$ tail postgresql-2012-01-22_164637.log | grep 123
        123;

not so helpful. But with pg.grep:

=$ tail postgresql-2012-01-22_164637.log | ~/pg.grep -p "%m %u@%d %p %r " -m 123
2012-01-22 17:27:57.447 CET depesz@depesz 17743 [local] LOG:  duration: 0.603 ms  statement: SELECT 1
        +
        2
        -
        123;

Now, let's consider another example:

$ SELECT 1 +
>> version();
ERROR:  operator does not exist: integer + text
LINE 1: SELECT 1 +
                 ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.

normal grep for ERROR would show:

=$ cat postgresql-2012-01-22_173247.log | grep ERROR
2012-01-22 17:32:53.188 CET depesz@depesz 17778 [local] ERROR:  operator does not exist: integer + text at character 10

but pg.grep for errors shows:

=$ cat postgresql-2012-01-22_173247.log | ~/pg.grep -p "%m %u@%d %p %r " -m ERROR
2012-01-22 17:32:53.188 CET depesz@depesz 17778 [local] ERROR:  operator does not exist: integer + text at character 10
2012-01-22 17:32:53.188 CET depesz@depesz 17778 [local] HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
2012-01-22 17:32:53.188 CET depesz@depesz 17778 [local] STATEMENT:  SELECT 1 +
        version();

In case you really need just the ERROR line, you can get it by using the -l (--line) option:

=$ cat postgresql-2012-01-22_173247.log | ~/pg.grep -p "%m %u@%d %p %r " -m ERROR -l
2012-01-22 17:32:53.188 CET depesz@depesz 17778 [local] ERROR:  operator does not exist: integer + text at character 10

Aside from -p (prefix), -l (line) and -m (match), pg.grep also supports:

  • -s – skip – reverse match. -s "operator does not exist" would skip messages containing this error
  • -M – case sensitive -m
  • -S – case sensitive -s

Of course you can use multiple -m/-s/-M/-S options, for example like this:

=$ cat postgresql-2012-01-22_1* | ~/pg.grep -p "%m %u@%d %p %r " -m ERROR:.*syntax -l -s integer
2012-01-22 17:32:37.787 CET depesz@depesz 17778 [local] ERROR:  syntax error at or near ";" at character 11

This shows all ERRORs related to syntax, except the ones related to integers (like: invalid input syntax for integer: "asbc" at character 12).

fast dump and restore

Available from OmniTI SVN.

These are tools that can be used to make a fast, but somewhat unsafe, dump of a database, and to restore data from that dump.

I presented the general idea some time ago, and used it, with success, for one of our clients.

The scripts that were written then got generalized, rewritten, and are now available for you.

The "unsafe" part is very simple – to make the dump fast, the script does the work in parallel. By working in parallel it has multiple transactions open, and as such it gets no guarantees (from PostgreSQL) about consistency of the data between the transactions.

So – you can safely use it as long as there are no connections to the database other than fast.dump's – that is, no writing connections. As long as other connections are only reading – you're fine. But any write to the database can, and most likely will, bite you when you work with the restore.
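
To illustrate the idea – this is a minimal sketch of the approach, not the actual fast.dump tool, and the database name "mydb" is made up – you could copy out every table over a separate connection, several at a time:

# Sketch of the idea only - NOT fast.dump itself. Copies out every table
# in schema "public" of database "mydb", up to 4 COPYs in parallel; each
# COPY runs in its own transaction, which is exactly why there is no
# cross-table consistency guarantee.
psql -qAtc "select tablename from pg_tables where schemaname = 'public'" mydb \
    | xargs -P 4 -I{} psql -c '\copy public.{} to {}.dump' mydb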

To use the tools, just run them with --help. And always try them first on some test/devel machine.

system_monitoring

Available from OmniTI SVN.

This tool is not strictly for PostgreSQL, but I use it on virtually any Pg server I have to look after, and I've found it very useful.

You most likely all know tools like Cacti, Ganglia, Munin and so on.

system_monitoring is similar in that it gets data from the system. But it's totally different in that it doesn't make any graphs. It just stores the data.

But adding new data to be stored is trivial.

Most generally, system_monitoring.pl is simply a job scheduler – like cron. But unlike cron, it is not limited to 1 minute as the shortest interval. It can easily run jobs every second. Or run jobs that never end.

It will run every command you give it, and log the output. That's all. But since it's small, and the log format is normal plain text, it has become extremely useful for me.

Let's take a look at a sample config.

At the beginning I set some variables – most importantly, where the logs will go.

Then I set environment variables that will be used for all "checks" (these can be overridden per check). And then I set some additional variables which make running checks a bit nicer.
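
Simplified, the top of the config looks more or less like this (the values below are made-up illustrations – check the sample config for the exact key names and syntax):

# where the log tree goes:
logdir=/var/log/system_monitoring
# environment for all checks (each check can override these):
env.PGPORT=5432
env.PATH=/usr/local/pgsql/bin:/usr/bin:/bin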

Then, I have the actual checks.

There are three kinds of them: persistent (command), periodic command, and periodic file.

A persistent check example looks like this:

check.iostat.type=persistent
check.iostat.exec=iostat -kx 5

On start, system_monitoring.pl will run iostat -kx 5, and will then log everything it outputs, as it outputs it, to the iostat logfile.

A periodic check can look like this:

check.ps.type=periodic
check.ps.exec=ps axo user,pid,ppid,pgrp,%cpu,%mem,rss,lstart,nice,nlwp,sgi_p,cputime,tty,wchan:25,args
check.ps.interval=30

This will run the given ps command every 30 seconds, and log its output.

If the first character of a check's exec is "<", the rest of the exec line is assumed to be a file that should be opened (for reading), and its content copied to the log for the given check. Hence:

check.mem.type=periodic
check.mem.exec=< /proc/meminfo
check.mem.interval=10

This will log meminfo to the logs every 10 seconds.

Logs are stored in a tree structure based on when the event happened. The exact filename for a check is:

LOGDIR/YYYY/MM/DD/CHECK_NAME-YYYY-MM-DD-HH.log

So, with the default logdir, the current meminfo log is:

/var/log/system_monitoring/2012/01/22/mem-2012-01-22-16.log

It looks like this:

2012-01-22 16:00:02 UTC BSGMA   MemTotal:     528655576 kB
2012-01-22 16:00:02 UTC BSGMA   MemFree:      14354712 kB
2012-01-22 16:00:02 UTC BSGMA   Buffers:       1099748 kB
2012-01-22 16:00:02 UTC BSGMA   Cached:       465002160 kB
2012-01-22 16:00:02 UTC BSGMA   SwapCached:       5304 kB
2012-01-22 16:00:02 UTC BSGMA   Active:       426886732 kB
2012-01-22 16:00:02 UTC BSGMA   Inactive:     55076184 kB
2012-01-22 16:00:02 UTC BSGMA   HighTotal:           0 kB
2012-01-22 16:00:02 UTC BSGMA   HighFree:            0 kB
2012-01-22 16:00:02 UTC BSGMA   LowTotal:     528655576 kB
2012-01-22 16:00:02 UTC BSGMA   LowFree:      14354712 kB
2012-01-22 16:00:02 UTC BSGMA   SwapTotal:     2048276 kB
2012-01-22 16:00:02 UTC BSGMA   SwapFree:      1902552 kB
2012-01-22 16:00:02 UTC BSGMA   Dirty:            5776 kB
2012-01-22 16:00:02 UTC BSGMA   Writeback:         236 kB
2012-01-22 16:00:02 UTC BSGMA   AnonPages:    15869780 kB
2012-01-22 16:00:02 UTC BSGMA   Mapped:       107046492 kB
2012-01-22 16:00:02 UTC BSGMA   Slab:         10872504 kB
2012-01-22 16:00:02 UTC BSGMA   PageTables:   20694796 kB
2012-01-22 16:00:02 UTC BSGMA   NFS_Unstable:        0 kB
2012-01-22 16:00:02 UTC BSGMA   Bounce:              0 kB
2012-01-22 16:00:02 UTC BSGMA   CommitLimit:  266376064 kB
2012-01-22 16:00:02 UTC BSGMA   Committed_AS: 158592164 kB
2012-01-22 16:00:02 UTC BSGMA   VmallocTotal: 34359738367 kB
2012-01-22 16:00:02 UTC BSGMA   VmallocUsed:    271152 kB
2012-01-22 16:00:02 UTC BSGMA   VmallocChunk: 34359466935 kB
2012-01-22 16:00:02 UTC BSGMA   HugePages_Total:     0
2012-01-22 16:00:02 UTC BSGMA   HugePages_Free:      0
2012-01-22 16:00:02 UTC BSGMA   HugePages_Rsvd:      0
2012-01-22 16:00:02 UTC BSGMA   Hugepagesize:     2048 kB
2012-01-22 16:00:12 UTC BSGPA   MemTotal:     528655576 kB

And so on.

As you can see, nothing really complicated.

There are two more features. One of them is checks configured so that their output is ignored and not logged – this is for cleanup jobs; see the last three checks in the example config, and the sketch below.
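
Such a check could look more or less like this (a simplified illustration – the find command and interval are made up, and the example config shows the exact syntax for discarding output):

check.cleanup.type=periodic
check.cleanup.exec=find /var/log/system_monitoring -type f -mtime +30 -delete
check.cleanup.interval=86400
# plus the option (see the example config) that tells system_monitoring.pl
# to discard this check's output instead of logging it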

The other feature is headers.

For example, let's look at the pg_stat_database check. The way it's written, it logs data like this:

$ tail pg_stat_database-2012-01-22-15.log
2012-01-22 15:59:06 UTC BSF9w   template1       10      6       t       t       -1      11510   15545653        1663    \N      {=c/postgres,postgres=CTc/postgres}     4689516
2012-01-22 15:59:06 UTC BSF9w   postgres        10      6       f       t       -1      11510   15253018        1663    \N      \N      42800983660
2012-01-22 15:59:06 UTC BSF9w   pg_audit        10      6       f       t       -1      11510   15252690        1663    \N      \N      4779628
2012-01-22 15:59:06 UTC BSF9w   template0       10      6       t       f       -1      11510   3886088170      1663    \N      {=c/postgres,postgres=CTc/postgres}     4583020
2012-01-22 15:59:06 UTC BSF9w   magicdb 10      6       f       t       -1      11510   3922373574      1663    {"search_path=public, check_postgres, ltree, pgcrypto"} \N      716819345004
2012-01-22 15:59:37 UTC BSGFA   template1       10      6       t       t       -1      11510   15545653        1663    \N      {=c/postgres,postgres=CTc/postgres}     4689516
2012-01-22 15:59:37 UTC BSGFA   postgres        10      6       f       t       -1      11510   15253018        1663    \N      \N      42800983660
2012-01-22 15:59:37 UTC BSGFA   pg_audit        10      6       f       t       -1      11510   15252690        1663    \N      \N      4779628
2012-01-22 15:59:37 UTC BSGFA   template0       10      6       t       f       -1      11510   3886088170      1663    \N      {=c/postgres,postgres=CTc/postgres}     4583020
2012-01-22 15:59:37 UTC BSGFA   magicdb 10      6       f       t       -1      11510   3922373574      1663    {"search_path=public, check_postgres, ltree, pgcrypto"} \N      716820033132

which is nice, but it would be cool to know what the columns mean.

So we have the "check.pg_stat_database.header" line, which makes system_monitoring log a header first whenever it rotates the logfile (i.e. every hour):

$ head -n 1 pg_stat_database-2012-01-22-15.log
2012-01-22 15:00:01 UTC :h   datname datdba  encoding        datistemplate   datallowconn    datconnlimit    datlastsysoid   datfrozenxid    dattablespace   datconfig       datacl  size

And that's about it. One additional feature is that you can run system_monitoring.pl and ask it to show you data for a given check at a given time (or the closest possible time after it):

=$ ./system_monitoring.pl -s loadavg -t '2012-01-22 18:04:00' config-sample.cfg
2012-01-22 18:04:00 CET Kg      0.01 0.10 0.12 1/575 20320

And that's all. But thanks to its simplicity, it's trivial to feed the data it logs into some graphing tool, or just read it.
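
For example, pulling the 1-minute load average (with timestamps) out of a loadavg log, ready for gnuplot or a similar tool – the filename here just follows the layout described above:

# columns: date, time, TZ, marker, then the three load averages -
# so $5 is the 1-minute value, per the loadavg sample output above
awk '{print $1, $2, $5}' /var/log/system_monitoring/2012/01/22/loadavg-2012-01-22-18.log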

And that would be all for now. If you use any of these, have fun – they are free to use and free to modify. Sending me back patches is a nice touch, but I don't require it.

15 thoughts on “Some new tools for PostgreSQL or around PostgreSQL”

  1. If you’re on a system with the GNU version of tail, try:

    tail -F /var/log/whatever

    No need to write a new version of tail.

  2. @yDNA:
    Sure, I know about tail -F. But that only helps when the file gets truncated and/or renamed.
    Pg, however, changes the name of the logfile. I.e. it can switch from postgresql-2012-01-20.log to postgresql-2012-01-21.log, or anything else – based on the "log_filename" GUC.

  3. I actually use runit to start PostgreSQL and have everything log to stdout. This eliminates the need for PostgreSQL log rotation. This is another way to do it and keep PostgreSQL’s environment clean.

    #postgresql/run
    #!/bin/sh --
    exec 2>&1
    exec chpst -u postgres /usr/local/bin/postgres -D /pgdata/data

    #postgresql/log/run
    #!/bin/sh --
    exec 2>&1
    exec chpst -ulog svlogd ./main

    tail -F postgresql/log/main/current

  4. @Sean:
    I wouldn't really use unrotated logs – having hundreds of gigabytes in a single file gets tedious pretty fast.

  5. @depesz:

    svlogd continuously reads log data from its standard input, optionally filters log messages, and writes the data to one or more automatically rotated logs.

    http://smarden.org/runit/svlogd.8.html

    The current log file is always named current, so tail -F works perfectly as @Sean described.

  6. Have the tools been moved or is there a problem with the web server?

  7. I found a minor problem with pg.grep. It was showing me everything in my log files. After further investigation, I realized it was due to pg.grep not knowing what to do with %a in our log_line_prefix.

    I added the following to the end of the %re hash in the prefix_to_re subroutine (YMMV):

    'a' => '(?:[a-z0-9_]*|\[unknown\])',

  8. Don’t try copy and paste… The comment form converted all of my single ticks into MS Word style single quotes.

  9. @Matthew:
    Committed a new version, which has support for (almost) all % sequences that might appear in log_line_prefix. I did %a differently, but it should work fine – please test.

  10. Does pg.grep work on CSV-format database logs? I tried, but it seems it cannot:

    cat postgresql.log | pg.grep -p "%m,%u,%d,%p,%r,%c,%l,%i,%s,%v,%x"

  11. @Pius:
    No, it doesn't. You can grep these easily with normal grep. The "cool" thing about pg.grep is that it allows you to grep log messages even if they span multiple "real" lines. CSV logs don't have that problem, AFAIR.

  12. Hi Depesz,

    Too bad :(, multi-line entries still exist in CSV logs:

    2014-06-09 16:24:24.878 PDT,"postgres","postgres",9119,"[local]",53964228.239f,2,"authentication",2014-06-09 16:24:24 PDT,2/71947,0,LOG,00000,"connection authorized: user=postgres database=postgres",,,,,,,,,""
    2014-06-09 16:24:25.004 PDT,"postgres","postgres",9119,"[local]",53964228.239f,3,"idle",2014-06-09 16:24:24 PDT,2/71948,0,LOG,00000,"statement: SELECT d.datname as ""Name"",
    pg_catalog.pg_get_userbyid(d.datdba) as ""Owner"",
    pg_catalog.pg_encoding_to_char(d.encoding) as ""Encoding"",
    d.datcollate as ""Collate"",
    d.datctype as ""Ctype"",
    pg_catalog.array_to_string(d.datacl, E'\n') AS ""Access privileges""
    FROM pg_catalog.pg_database d
    ORDER BY 1;",,,,,,,,,"psql.bin"
    2014-06-09 16:24:25.284 PDT,"postgres","postgres",9119,"[local]",53964228.239f,4,"SELECT",2014-06-09 16:24:24 PDT,2/0,0,LOG,00000,"duration: 314.140 ms",,,,,,,,,"psql.bin"
    2014-06-09 16:24:25.285 PDT,"postgres","postgres",9119,"[local]",53964228.239f,5,"idle",2014-06-09 16:24:24 PDT,,0,LOG,00000,"disconnection: session time: 0:00:00.462 user=postgres database=postgres host=[local]",,,,,,,,,"psql.bin"
