Waiting for PostgreSQL 15 – Allow archiving via loadable modules.

On 3rd of February 2022, Robert Haas committed patch:

Allow archiving via loadable modules.
 
Running a shell command for each file to be archived has a lot of
overhead and may not offer as much error checking as you want, or the
exact semantics that you want. So, offer the option to call a loadable
module for each file to be archived, rather than running a shell command.
 
Also, add a 'basic_archive' contrib module as an example implementation
that archives to a local directory.
 
Nathan Bossart, with a little bit of kibitzing by me.
 
Discussion: http://postgr.es/m/20220202224433.GA1036711@nathanxps13


Update from 2022-02-08

I've been contacted by Nathan Bossart (author of the patch) that pointed that I misinformed you. Specifically – you don't need to restart pg to change archive_library, it should work with just SIGHUP (and potentially pg_ctl reload, or select pg_reload_conf()) – I can't test it fully now (mostly because there is only one archive module), but he is The Author, so I guess he knows what he is talking about 🙂

Sorry for bad info.


This is HUGE.

Normally, one wants to have working WAL archive for purposes of backups, and as a fallback in case streaming replications falls behind too much.

So far, to have WAL archive, we were using archive_command setting, which, basically, run shell tool for every wal file to archive.

Which worked, but had certain drawbacks related to performance. If you're generating LOTS of wal, cost of forking off another process so often is not negligible.

Most of commonly used tools worked around the problem by handling multiple files in one go, and “marking" files as archived, for files that were not the ones that they should be called for. It worked, but was always hacky.

Now, the situation changes. One can write extension to Pg that will be called for every wal file, but it will be always loaded. All the costs related to forking – gone.

Let's see how that works.

For starters, we now have one more global GUC: archive_library. If this is set (and archive_mode is on), the library will be used, and not archive_command.

Let's try to set it:

$ SELECT name, setting FROM pg_settings  WHERE name  ~ '^archive' ORDER BY 1;
          name           │  setting  
─────────────────────────┼───────────
 archive_cleanup_command │ 
 archive_command         │ /bin/TRUE
 archive_library         │ 
 archive_mode            │ ON
 archive_timeout         │ 0
(5 ROWS)

Currently, I don't have archiving set. It's enabled, but archive_command is /bin/true, so effectively archiving is not happening.

Let's enable it with the sample, provided, basic_archive, library:

$ ALTER system SET archive_library = 'basic_archive';

This change requires restart (it loads external library to Pg process, so happens only of postmaster start).

Please note that I didn't, yet, dedicate any directory for archive.

Normally, with archive_command if I'd fail to set it up properly, I would see Pg process like:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
pgdba       1189  0.0  0.0 219572  6272 ?        Ss   14:49   0:00 postgres: 14/main: archiver process failed ON 0000000100000F72000000F0

but now, with archive_library:

=$ ps uw -U pgdba  | grep arch
pgdba      37562  0.0  0.0 209756  4964 ?        Ss   15:48   0:00 postgres: archiver

I don't see any error in there. It either means that the information is gone from ps output, or that it worked?

Luckily I can simply check:

$ SELECT * FROM pg_stat_archiver;
 archived_count │    last_archived_wal     │      last_archived_time       │ failed_count │ last_failed_wal │ last_failed_time │          stats_reset          
────────────────┼──────────────────────────┼───────────────────────────────┼──────────────┼─────────────────┼──────────────────┼───────────────────────────────
            233 │ 0000000100000000000000E9 │ 2022-02-06 15:48:06.802869+01 │            0[NULL][NULL]2022-02-06 14:57:31.437907+01
(1 ROW)

That's surprising. It worked? Let's check logs:

=$ grep archive postgresql.log | grep -v pg_stat_archiver | tail -n 5
2022-02-06 15:50:46.500 CET @ 37562  WARNING:  archive_mode enabled, yet archiving IS NOT configured
2022-02-06 15:51:46.561 CET @ 37562  WARNING:  archive_mode enabled, yet archiving IS NOT configured
2022-02-06 15:52:46.581 CET @ 37562  WARNING:  archive_mode enabled, yet archiving IS NOT configured
2022-02-06 15:53:46.641 CET @ 37562  WARNING:  archive_mode enabled, yet archiving IS NOT configured
2022-02-06 15:54:46.648 CET @ 37562  WARNING:  archive_mode enabled, yet archiving IS NOT configured

OK. So it fails. Kinda. Let's finish configuration, but first, let's see current wal location:

$ SELECT pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn());
 pg_current_wal_lsn │     pg_walfile_name      
────────────────────┼──────────────────────────
 1/1206FCD0         │ 000000010000000100000012
(1 ROW)

OK. This is current WAL location. Please note that in pg_stat_archiver we saw that last_archived_wal was 0000000100000000000000E9…

Let's see how this will work out. I made /tmp/archive, and changed basic_archive.archive_directory to /tmp/archive (by editing conf file, I couldn't use alter system for this).

After reload and some time:

$ SELECT * FROM pg_stat_archiver ;
 archived_count │    last_archived_wal     │      last_archived_time       │ failed_count │ last_failed_wal │ last_failed_time │          stats_reset          
────────────────┼──────────────────────────┼───────────────────────────────┼──────────────┼─────────────────┼──────────────────┼───────────────────────────────
            273 │ 000000010000000100000011 │ 2022-02-06 16:00:25.272431+01 │            0[NULL][NULL]2022-02-06 14:57:31.437907+01
(1 ROW)

and in /tmp/archive I saw 40 files, from 0000000100000000000000EA to 000000010000000100000011.

So, it works.

Code for basic_archive is really simple, and there is not much to it, but it's just a proof of concept that it works, and what it can achieve. I can imagine that developers of Barman, pgBackRest, and other archiving/backup tools, will soonish release new versions that can be used as a library, and not forking command.

Also, once I got it to actually work, information in ps output came back:

pgdba      37562  0.0  0.0 209756  6916 ?        Ss   15:48   0:00  \_ postgres: archiver LAST was 000000010000000100000011

or

pgdba      37562  0.0  0.0 209756  6916 ?        Ss   15:48   0:00  \_ postgres: archiver failed ON 000000010000000100000012

Amazing. Thanks a lot to all involved, it's very appreciated.