How to install and configure PostgreSQL Debian/Ubuntu – for developer use – part 2

In previous post I showed how I'd install PostgreSQL for developer.

But that's not all. Now we need to add some configuration. What, how, where?

First let's talk about where can we put configuration.

Initially the place for PostgreSQL config is postgresql.conf, which is in /etc/postgresql/13/main/postgresql.conf.

But, I don't actually like to put my changes there. The reason is very simple – with many editions to this file, and potential upgrades of PostgreSQL later on, it will be not trivial to know which config change came from PostgreSQL package, and which from me.

This leads to two options:

  • conf.d directory, which can contain many files
  • configuration using ALTER SYSTEM sql queries, which in turn stores new configs in /var/lib/postgresql/13/main/postgresql.auto.conf file.

Both of these approaches have their benefits. Conf.d – being in /etc/postgresql – clearly states that the files are configuration (/etc is supposed to be for configuration). On the other hand – postgresql.auto.conf is more closely tied to PostgreSQL data, as it resides within data directory.

Which think you'll choose – I'll let it for you to decide, but to show how to do it, I'll first do some changes using both approaches.

As you perhaps remember, Pg logs go to /var/log/postgresql/postgresql-13-main.log file. We can look into it, and we'll see things like:

2020-12-27 10:37:22.423 CET [1086] LOG:  starting PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2020-12-27 10:37:22.424 CET [1086] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2020-12-27 10:37:22.428 CET [1086] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-12-27 10:37:22.439 CET [1096] LOG:  database system was shut down at 2020-12-24 10:11:51 CET
2020-12-27 10:37:22.449 CET [1086] LOG:  database system is ready to accept connections
2020-12-27 14:20:49.480 CET [77596] postgres@depesz ERROR:  division by zero
2020-12-27 14:20:49.480 CET [77596] postgres@depesz STATEMENT:  select 1/0;

That's not much. Let's improve logging.

Since this is development machine, I think that we should log all queries, with their times, and as much other information as we can.

Based on information from one of my previous blogposts, I'd like to use these settings:

to make this happen I wrote file /etc/postgresql/13/main/conf.d/logging.conf with this content:

log_autovacuum_min_duration = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_min_duration_statement = 0
log_temp_files = 0

And then reloaded configuration using SQL query:

SELECT pg_reload_conf();

In logs I saw:

2020-12-27 14:35:45.266 CET [1086] LOG:  received SIGHUP, reloading configuration files
2020-12-27 14:35:45.268 CET [1086] LOG:  parameter "log_autovacuum_min_duration" changed to "0"
2020-12-27 14:35:45.268 CET [1086] LOG:  parameter "log_checkpoints" changed to "on"
2020-12-27 14:35:45.268 CET [1086] LOG:  parameter "log_connections" changed to "on"
2020-12-27 14:35:45.269 CET [1086] LOG:  parameter "log_disconnections" changed to "on"
2020-12-27 14:35:45.269 CET [1086] LOG:  parameter "log_lock_waits" changed to "on"
2020-12-27 14:35:45.269 CET [1086] LOG:  parameter "log_min_duration_statement" changed to "0"
2020-12-27 14:35:45.269 CET [1086] LOG:  parameter "log_temp_files" changed to "0"

And from this moment on, I will have much better grasp on what's going on. For example, If I'd run:

psql -c "select * from pg_stat_activity"

logs will contain:

2020-12-27 14:37:10.059 CET [84151] [unknown]@[unknown] LOG:  connection received: host=[local]
2020-12-27 14:37:10.060 CET [84151] depesz@depesz LOG:  connection authorized: user=depesz database=depesz application_name=psql
2020-12-27 14:37:10.069 CET [84151] depesz@depesz LOG:  duration: 8.247 ms  statement: select * from pg_stat_activity
2020-12-27 14:37:10.071 CET [84151] depesz@depesz LOG:  disconnection: session time: 0:00:00.011 user=depesz database=depesz host=[local]

Which tells me quite a lot about what has happened.

While it might not be common in development, it is possible that I'll want to use replication. It could happen for many reasons – taking backups, working with logical replication or decoding.

To make it work, and have other options available, for future, without more restarts, I'll need to set three parameters:

  • wal_level = logical : so wal will have all the information that is needed to setup any kind of replication, including logical
  • archive_mode = on : so I can archive wal files without any more restarts (this parameter can be only set at server start)
  • archive_command = /bin/true : since I don't need archiving happening now, let's make archive command don't archive anything, but always succeed.

To set it, I'll use SQL this time:

=$ psql -U postgres
...
$ ALTER system SET wal_level = logical;
$ ALTER system SET archive_mode = ON;
$ ALTER system SET archive_command = '/bin/true';

Afterwards, I can reload config:

$ SELECT pg_reload_conf();
 pg_reload_conf 
────────────────
 t
(1 ROW)

This looks promising, but quick look in Pg logs shows:

2020-12-29 14:03:17.566 CET [83396] postgres@postgres LOG:  duration: 0.335 ms  statement: select pg_reload_conf();
2020-12-29 14:03:17.566 CET [46224] LOG:  received SIGHUP, reloading configuration files
2020-12-29 14:03:17.567 CET [46224] LOG:  parameter "wal_level" cannot be changed without restarting the server
2020-12-29 14:03:17.567 CET [46224] LOG:  parameter "archive_mode" cannot be changed without restarting the server
2020-12-29 14:03:17.567 CET [46224] LOG:  parameter "archive_command" changed to "/bin/true"
2020-12-29 14:03:17.567 CET [46224] LOG:  configuration file "/var/lib/postgresql/13/main/postgresql.auto.conf" contains errors; unaffected changes were applied

Two of the parameters require Pg restart.

We can also see this using SQL:

$ SELECT * FROM pg_settings WHERE pending_restart \gx
-[ RECORD 1 ]---+-----------------------------------------------------
name            | archive_mode
setting         | off
unit            | 
category        | Write-Ahead Log / Archiving
short_desc      | Allows archiving OF WAL files USING archive_command.
extra_desc      | 
context         | postmaster
vartype         | enum
SOURCE          | DEFAULT
min_val         | 
max_val         | 
enumvals        | {always,ON,off}
boot_val        | off
reset_val       | off
sourcefile      | 
sourceline      | 
pending_restart | t
-[ RECORD 2 ]---+-----------------------------------------------------
name            | wal_level
setting         | replica
unit            | 
category        | Write-Ahead Log / Settings
short_desc      | SET the level OF information written TO the WAL.
extra_desc      | 
context         | postmaster
vartype         | enum
SOURCE          | DEFAULT
min_val         | 
max_val         | 
enumvals        | {minimal,replica,logical}
boot_val        | replica
reset_val       | replica
sourcefile      | 
sourceline      | 
pending_restart | t

in here we see that there are two parameters that have value change that will happen on restart.

So, let's restart Pg:

=$ sudo systemctl restart postgresql@13-main.service

Next thing on the table is configuring certain useful extensions.

I generally keep two of them enabled in all databases:

Of course, given that I log every query with its runtime, I don't technically need pg_stat_statements, but it's good to get acquainted with it, as in production logging every query is unlikely to be possible.

So, let's enable it all:

$ ALTER system SET shared_preload_libraries = pg_stat_statements, auto_explain;

This will require another restart, and after restart some more configuration:

=$ psql -U postgres
$ ALTER system SET auto_explain.log_min_duration = 250;
$ CREATE extension pg_stat_statements ;
$ SELECT pg_reload_conf();

config reload is so that auto_explain.log_min_duration will get set.

And the extension, created in postgres database will make pg_stat_statements views available for querying from within postgres db. But they will contain data about queries in all databases.

Let's see it now:

=$ psql -U depesz
$ SELECT COUNT(*) FROM pg_class;
$ SELECT pg_sleep(1);

Afterwards, let's see pg_stat_statements:

=$ psql -U postgres
$ SELECT s.* FROM pg_stat_statements s JOIN pg_database d ON s.dbid = d.oid WHERE d.datname = 'depesz' \gx
─[ RECORD 1 ]───────┬──────────────────────────────
userid              │ 16384
dbid                │ 16385
queryid             │ 6729783856403017864
query               │ SELECT pg_sleep($1)
plans               │ 0
total_plan_time     │ 0
min_plan_time       │ 0
max_plan_time       │ 0
mean_plan_time      │ 0
stddev_plan_time    │ 0
calls               │ 1
total_exec_time     │ 1001.193488
min_exec_time       │ 1001.193488
max_exec_time       │ 1001.193488
mean_exec_time      │ 1001.193488
stddev_exec_time    │ 0
ROWS1
shared_blks_hit     │ 0
shared_blks_read    │ 0
shared_blks_dirtied │ 0
shared_blks_written │ 0
local_blks_hit      │ 0
local_blks_read     │ 0
local_blks_dirtied  │ 0
local_blks_written  │ 0
temp_blks_read      │ 0
temp_blks_written   │ 0
blk_read_time       │ 0
blk_write_time      │ 0
wal_records         │ 0
wal_fpi             │ 0
wal_bytes           │ 0[ RECORD 2 ]───────┼──────────────────────────────
userid              │ 16384
dbid                │ 16385
queryid             │ -7484655548452190292
query               │ SELECT COUNT(*) FROM pg_class
plans               │ 0
total_plan_time     │ 0
min_plan_time       │ 0
max_plan_time       │ 0
mean_plan_time      │ 0
stddev_plan_time    │ 0
calls               │ 1
total_exec_time     │ 0.084508
min_exec_time       │ 0.084508
max_exec_time       │ 0.084508
mean_exec_time      │ 0.084508
stddev_exec_time    │ 0
ROWS1
shared_blks_hit     │ 13
shared_blks_read    │ 0
shared_blks_dirtied │ 0
shared_blks_written │ 0
local_blks_hit      │ 0
local_blks_read     │ 0
local_blks_dirtied  │ 0
local_blks_written  │ 0
temp_blks_read      │ 0
temp_blks_written   │ 0
blk_read_time       │ 0
blk_write_time      │ 0
wal_records         │ 0
wal_fpi             │ 0
wal_bytes           │ 0

Additionally, in logs I got:

2020-12-29 14:20:39.738 CET [95206] depesz@depesz LOG:  connection authorized: user=depesz database=depesz application_name=psql
2020-12-29 14:20:43.728 CET [95206] depesz@depesz LOG:  duration: 3.019 ms  statement: select count(*) from pg_class;
2020-12-29 14:20:58.189 CET [95206] depesz@depesz LOG:  duration: 1001.193 ms  plan:
        Query Text: select pg_sleep(1);
        Result  (cost=0.00..0.01 rows=1 width=4)
2020-12-29 14:20:58.189 CET [95206] depesz@depesz LOG:  duration: 1001.827 ms  statement: select pg_sleep(1);
2020-12-29 14:20:59.619 CET [95206] depesz@depesz LOG:  disconnection: session time: 0:00:19.881 user=depesz database=depesz host=[local]

Please note that for query that took one second, I got usual line “duration: 1001.827 ms statement: select pg_sleep(1)", but also, before it, I got query plan (which is very simple as it's trivial query).

At this moment the Pg is running, logging what I need, doing auto explains of queries that are slow(ish), and gathering data for analysis of query runtimes. All ready for me to start app development.

2 thoughts on “How to install and configure PostgreSQL Debian/Ubuntu – for developer use – part 2”

  1. Using log_min_duration_statement = 0 can be problem on servers with higher load. Too frequent write to log file can be reason of high IO utilization, and related performance problems. More safe but still very valuable value is 50ms. Using zero can be interesting too, because some badly used ORM generates billions pretty fast queries, and it should be logged some times, but It should be specific configuration for some testing time only.

  2. @Pavel:

    while you are right, please note that this is: for developer use – not for production server!

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.