Waiting for PostgreSQL 15 – Introduce log_destination=jsonlog

On 17th of January 2022, Michael Paquier committed patch:

Introduce log_destination=jsonlog
 
"jsonlog" is a new value that can be added to log_destination to provide
logs in the JSON format, with its output written to a file, making it
the third type of destination of this kind, after "stderr" and
"csvlog".  The format is convenient to feed logs to other applications.
There is also a plugin external to core that provided this feature using
the hook in elog.c, but this had to overwrite the output of "stderr" to
work, so being able to do both at the same time was not possible.  The
files generated by this log format are suffixed with ".json", and use
the same rotation policies as the other two formats depending on the
backend configuration.
 
This takes advantage of the refactoring work done previously in ac7c807,
bed6ed3, 8b76f89 and 2d77d83 for the backend parts, and 72b76f7 for the
TAP tests, making the addition of any new file-based format rather
straight-forward.
 
The documentation is updated to list all the keys and the values that
can exist in this new format.  pg_current_logfile() also required a
refresh for the new option.
 
Author: Sehrope Sarkuni, Michael Paquier
Reviewed-by: Nathan Bossart, Justin Pryzby
Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com

This is VERY, VERY cool.

I don't like usage of JSON in database. In my experience in IRC and Slack, ~ 95% of cases are abuses of JSON for things that should be rewritten.

BUT – outside of DB, for things like integration data, it is actually pretty good.

We already have csvlog, but apparently not all csv parsing libraries understand the same things as CSV (multi-line values!).

So, I'm very enthusiastic about *NOT* having to find specialized tool that handles csvs in the same way that PostgreSQL makes them (which is, obviously, the correct way :).

Let's see how that works.

I changed (in config file) just two parameters:

Other log-related settings are:

$ SELECT name, setting FROM pg_settings  WHERE name  ~ '^log' ORDER BY 1;
               name                │            setting             
───────────────────────────────────┼────────────────────────────────
 log_autovacuum_min_duration       │ 600000
 log_checkpoints                   │ ON
 log_connections                   │ ON
 log_destination                   │ jsonlog
 log_directory                     │ log
 log_disconnections                │ ON
 log_duration                      │ off
 log_error_verbosity               │ DEFAULT
 log_executor_stats                │ off
 log_file_mode                     │ 0600
 log_filename                      │ postgresql-%Y-%m-%d_%H%M%S.log
 logging_collector                 │ ON
 log_hostname                      │ off
 logical_decoding_work_mem         │ 65536
 log_line_prefix                   │ %m %u@%d %p %r 
 log_lock_waits                    │ ON
 log_min_duration_sample           │ -1
 log_min_duration_statement        │ 0
 log_min_error_statement           │ error
 log_min_messages                  │ warning
 log_parameter_max_length          │ -1
 log_parameter_max_length_on_error │ 0
 log_parser_stats                  │ off
 log_planner_stats                 │ off
 log_recovery_conflict_waits       │ off
 log_replication_commands          │ off
 log_rotation_age                  │ 1440
 log_rotation_size                 │ 102400
 log_startup_progress_interval     │ 10000
 log_statement                     │ NONE
 log_statement_sample_rate         │ 1
 log_statement_stats               │ off
 log_temp_files                    │ 0
 log_timezone                      │ Europe/Warsaw
 log_transaction_sample_rate       │ 0
 log_truncate_on_rotation          │ off
(36 ROWS)

After restart I noticed in log/ directory that I have new file named *.json:

=$ ls -l log/
total 192
-rw------- 1 pgdba pgdba 178194 Jan 17 12:23 postgresql-2022-01-17_121022.log
-rw------- 1 pgdba pgdba  11240 Jan 17 12:26 postgresql-2022-01-17_122308.json
-rw------- 1 pgdba pgdba    175 Jan 17 12:23 postgresql-2022-01-17_122308.log

It's content it not really easy to read directly:

=$ head -n 5 log/postgresql-2022-01-17_122308.json 
{"timestamp":"2022-01-17 12:23:08.437 CET","pid":1560406,"session_id":"61e5519c.17cf56","line_num":1,"session_start":"2022-01-17 12:23:08 CET","txid":0,"error_severity":"LOG","message":"ending log output to stderr","hint":"Future log output will go to log destination \"jsonlog\".","backend_type":"postmaster","query_id":0}
{"timestamp":"2022-01-17 12:23:08.437 CET","pid":1560406,"session_id":"61e5519c.17cf56","line_num":2,"session_start":"2022-01-17 12:23:08 CET","txid":0,"error_severity":"LOG","message":"starting PostgreSQL 15devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, 64-bit","backend_type":"postmaster","query_id":0}
{"timestamp":"2022-01-17 12:23:08.437 CET","pid":1560406,"session_id":"61e5519c.17cf56","line_num":3,"session_start":"2022-01-17 12:23:08 CET","txid":0,"error_severity":"LOG","message":"listening on IPv4 address \"0.0.0.0\", port 5430","backend_type":"postmaster","query_id":0}
{"timestamp":"2022-01-17 12:23:08.437 CET","pid":1560406,"session_id":"61e5519c.17cf56","line_num":4,"session_start":"2022-01-17 12:23:08 CET","txid":0,"error_severity":"LOG","message":"listening on IPv6 address \"::\", port 5430","backend_type":"postmaster","query_id":0}
{"timestamp":"2022-01-17 12:23:08.444 CET","pid":1560406,"session_id":"61e5519c.17cf56","line_num":5,"session_start":"2022-01-17 12:23:08 CET","txid":0,"error_severity":"LOG","message":"listening on Unix socket \"/tmp/.s.PGSQL.5430\"","backend_type":"postmaster","query_id":0}

But luckily we have tools like jq:

=$ tail -n2 log/postgresql-2022-01-17_122308.json | jq .
{
  "timestamp": "2022-01-17 12:28:08.549 CET",
  "pid": 1560408,
  "session_id": "61e5519c.17cf58",
  "line_num": 1,
  "session_start": "2022-01-17 12:23:08 CET",
  "txid": 0,
  "error_severity": "LOG",
  "message": "checkpoint starting: time",
  "backend_type": "checkpointer",
  "query_id": 0
}
{
  "timestamp": "2022-01-17 12:28:08.574 CET",
  "pid": 1560408,
  "session_id": "61e5519c.17cf58",
  "line_num": 2,
  "session_start": "2022-01-17 12:23:08 CET",
  "txid": 0,
  "error_severity": "LOG",
  "message": "checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.007 s, sync=0.002 s, total=0.025 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB",
  "backend_type": "checkpointer",
  "query_id": 0
}

My immediate question is: how does it look in case of multi-line queries?

=$ printf 'select\n    relname,\n    relkind\nfrom\n    pg_class\nlimit 1;\n' | psql -aX
SELECT
    relname,
    relkind
FROM
    pg_class
LIMIT 1;
   relname    | relkind 
--------------+---------
 pg_statistic | r
(1 ROW)

And it's data in log looks like:

=$ grep relkind log/postgresql-2022-01-17_122308.json | tail -n1
{"timestamp":"2022-01-17 12:30:27.324 CET","user":"pgdba","dbname":"pgdba","pid":1563548,"remote_host":"[local]","session_id":"61e55353.17db9c","line_num":3,"ps":"SELECT","session_start":"2022-01-17 12:30:27 CET","vxid":"4/0","txid":0,"error_severity":"LOG","message":"duration: 0.693 ms  statement: select\n    relname,\n    relkind\nfrom\n    pg_class\nlimit 1;","application_name":"psql","backend_type":"client backend","query_id":6991982717157019333}
 
=$ grep relkind log/postgresql-2022-01-17_122308.json | tail -n1 | jq .
{
  "timestamp": "2022-01-17 12:30:27.324 CET",
  "user": "pgdba",
  "dbname": "pgdba",
  "pid": 1563548,
  "remote_host": "[local]",
  "session_id": "61e55353.17db9c",
  "line_num": 3,
  "ps": "SELECT",
  "session_start": "2022-01-17 12:30:27 CET",
  "vxid": "4/0",
  "txid": 0,
  "error_severity": "LOG",
  "message": "duration: 0.693 ms  statement: select\n    relname,\n    relkind\nfrom\n    pg_class\nlimit 1;",
  "application_name": "psql",
  "backend_type": "client backend",
  "query_id": 6991982717157020000
}
 
=$ grep relkind log/postgresql-2022-01-17_122308.json | tail -n1 | jq -r .message
duration: 0.693 ms  statement: select
    relname,
    relkind
from
    pg_class
limit 1;

Sweet. Looks like each query will be in single line, regardless of how many lines it was, and that the standard tool (jq) handles it perfectly.

We can also use jq to find stuff in logs:

=$ jq 'select(.error_severity != "LOG" )' log/postgresql-2022-01-17_122308.json
{
  "timestamp": "2022-01-17 12:41:48.112 CET",
  "user": "depesz",
  "dbname": "depesz",
  "pid": 1561647,
  "remote_host": "[local]",
  "session_id": "61e55238.17d42f",
  "line_num": 10,
  "ps": "SELECT",
  "session_start": "2022-01-17 12:25:44 CET",
  "vxid": "3/29",
  "txid": 0,
  "error_severity": "ERROR",
  "state_code": "22012",
  "message": "division by zero",
  "statement": "select 1/0;",
  "application_name": "psql",
  "backend_type": "client backend",
  "query_id": -6975083880902025000
}

This is awesome. Huge thanks to all involved.

2 thoughts on “Waiting for PostgreSQL 15 – Introduce log_destination=jsonlog”

  1. As for integration with other tools, it looks great. But when it comes to troubleshooting, it seems to hinder people from reading directly.

  2. CSV format is defined in RFC, so csv parsing libraries without multiline values support are just broken by design

Comments are closed.