This problem happened recently to a couple of people on various Pg support channels, so I figured I'd write a bit more about it, so that in the future I have a place where I can refer people to 🙂
Before we go any further, please consider reading my earlier explanation of what WAL is, and how to configure certain aspects of it.
Since then (over 10 years) some things have changed, like configuration parameters, but the idea of what WAL is and what it's used for is basically the same.
As a quick reminder, WAL (Write-Ahead Log) resides in the pg_wal directory inside your data directory. You can find the path by running, in your db client:
```
$ SELECT current_setting('data_directory') || '/pg_wal';
        ?column?
─────────────────────────
 /home/pgdba/DATA/pg_wal
(1 row)
```
It will contain a bunch of files with names like:
- 0000000B00000C19000000E6 – the majority of the files; each should be exactly 16MB in size (16777216 bytes)
- optionally some small files named like 00000003.history – they show up when you promote a replica to primary
- an archive_status directory, which can contain potentially LOTS of files, but each should be 0 bytes in size, as these are just markers about what's going on with archiving
Additionally, Pg checks the value of wal_keep_size. This can be set to an arbitrarily large value, but whatever it is set to, it is static, so while it can explain why the wal dir is large, it can't explain why it keeps growing.
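If you want to see all the size-related WAL settings in one place, a simple query against pg_settings works (this is just a convenience sketch, nothing magical):

```sql
-- Show the WAL size/retention related settings together.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('wal_keep_size', 'min_wal_size', 'max_wal_size');
```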
But sometimes, this directory grows. Generally there are three cases that cause it:
- archiving is failing
- archiving is slow
- replication slots
The first two things you can check by looking at the system view pg_stat_archiver.
To get the data in a more helpful way, use this query:
```
=> SELECT pg_walfile_name( pg_current_wal_lsn() ), * FROM pg_stat_archiver \gx
─[ RECORD 1 ]──────┬──────────────────────────────
pg_walfile_name    │ 0000000100000A9C0000007D
archived_count     │ 695430
last_archived_wal  │ 0000000100000A9C0000007C
last_archived_time │ 2023-06-18 08:11:28.661166+00
failed_count       │ 764
last_failed_wal    │ 0000000100000A9400000074
last_failed_time   │ 2023-06-17 22:52:30.378964+00
stats_reset        │ 2023-06-01 07:05:30.043918+00
```
Here we see some statistics. The two most important bits are pg_walfile_name, which is the name of the current wal file, and last_archived_wal.
In my case there is one wal file of difference, which is perfectly OK. But on some other system I got these values:
- pg_walfile_name 00000001000123D100000076
- last_archived_wal 00000001000123D10000006E
In this case we have 8 wal segments of difference, and since you can't archive the current wal file (because it's still being written to), it looks like we have an archiving lag of 7.
Of course 7 is not a big number, so it's not a problem. Each file is 16MB, so my archiving lag on this system is/was 112MB.
One note, though. You can see cases like:
- pg_walfile_name 000000020000C6AE00000000
- last_archived_wal 000000020000C6AD000000FF
A naive hex-based difference would show a difference of 4294967041 wal files. But this is simply a mistake. With the default 16MB segment size, the last 8 characters of a wal file name only ever range from 00000000 to 000000FF, so if you want to process these numbers using normal hex math, you have to remove the 6 zeros that precede the last 2 characters.
So, to do hex math, you first change “000000020000C6AE00000000" to “000000020000C6AE00" and “000000020000C6AD000000FF" to “000000020000C6ADFF", and then subtract one from the other.
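If you'd rather not do this transformation by hand, here's a minimal Python sketch of the same math (wal_segno is just a name I made up; it assumes the default 16MB segment size):

```python
def wal_segno(wal_file_name):
    # A wal file name is 24 hex digits: 8 for the timeline, 8 for the
    # "log" number, and 8 for the segment number. With the default 16MB
    # segment size only the last 2 digits of the segment part are ever
    # used (00..FF), hence the 6 zeros we have to skip over.
    log = int(wal_file_name[8:16], 16)
    seg = int(wal_file_name[16:24], 16)
    return log * 0x100 + seg

# Difference between the two files from the example above:
print(wal_segno("000000020000C6AE00000000") - wal_segno("000000020000C6AD000000FF"))  # 1
```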
Anyway. If you have archive lag, and when you check pg_stat_archiver again after 5 minutes you still see the same last_archived_wal – then your archiving is clearly failing. Check the logs of PostgreSQL, or of the tool you use for archiving.
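One way to watch for this, sketched against the pg_stat_archiver columns shown above – if this interval keeps growing while new WAL is being produced, archiving is stuck, not just slow:

```sql
-- How long ago the last successful archive happened, plus failure info.
SELECT now() - last_archived_time AS time_since_last_archive,
       failed_count,
       last_failed_wal
FROM pg_stat_archiver;
```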
Another potential problem might be that you're archiving more slowly than you're generating new WAL. This happens too. In such a case you will see that the archiving lag is zero, then it sharply rises, and then slowly goes back to zero. This is related to some operation in the database that generated lots of WAL, which then has to be archived – and archiving usually includes some form of compression, and potentially encryption.
If this is your problem, you'll have to dig deeper to find out how to speed up your solution. There are many ready-made solutions, and most of them are pretty well optimized, so consider (if it applies) migrating to something battle-tested, like pgBackRest.
But what if archiving is working OK? No lag, no errors, all well?
In such a case, it's most likely a problem with a replication slot. A replication slot is a mechanism that makes PostgreSQL remember that there is some replication client that got data up to a certain location, that it will come back, and that WAL it will still need must NOT be deleted.
You can see all data about replication slots using this query:
```
=> SELECT pg_current_wal_lsn(), pg_current_wal_lsn() - restart_lsn, * FROM pg_replication_slots \gx
─[ RECORD 1 ]───────┬────────────────────────
pg_current_wal_lsn  │ A9C/97446FA8
?column?            │ 3887688
slot_name           │ magical_slot_for_depesz
plugin              │ pgoutput
slot_type           │ logical
datoid              │ 16658
database            │ canvas
temporary           │ f
active              │ t
active_pid          │ 2883499
xmin                │ [null]
catalog_xmin        │ 131472288
restart_lsn         │ A9C/97091D60
confirmed_flush_lsn │ A9C/97396C70
wal_status          │ reserved
safe_wal_size       │ [null]
two_phase           │ f
```
Here we can see a situation where wal is at location A9C/97446FA8 (which means it's in wal file XXXXXXXX00000A9C00000097), but magical_slot_for_depesz requires (column restart_lsn) wal since location A9C/97091D60 (which is still in the same WAL file as the current one).
In this situation there is no problem, but let's consider a case where pg_current_wal_lsn is 1A4A5/602C8718 and restart_lsn is 1A4A4/DB60E3B0. In this case the replication lag on this slot is 2227938152 bytes (a bit over 2GB). This means that even when archiving is working 100% OK, and min_wal_size, max_wal_size, and wal_keep_size are all below 200MB, we will still have 2GB of wal files.
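This subtraction can be done server-side with the pg_wal_lsn_diff() function, but the math is simple enough to sketch by hand in Python (lsn_to_bytes is just my own helper name):

```python
def lsn_to_bytes(lsn):
    # An LSN like "1A4A5/602C8718" is two hex numbers separated by "/":
    # the high and low 32 bits of a 64-bit byte position in the WAL stream.
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

print(lsn_to_bytes("1A4A5/602C8718") - lsn_to_bytes("1A4A4/DB60E3B0"))  # 2227938152
```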
If such a case happens to you, find out what the slot is being used for, and either fix the replication, so that WAL starts being consumed, or just drop the replication slot (using the pg_drop_replication_slot() function).
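Assuming the slot name from the example above, dropping it would look like this (the slot has to be inactive for this to succeed):

```sql
-- This permanently drops the slot; the client that was using it will no
-- longer be able to resume replication from it.
SELECT pg_drop_replication_slot('magical_slot_for_depesz');
```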
While I can theoretically imagine there being more cases of WAL accumulation, these two (failing/slow archiving, and replication slots) cover the majority of problems that I've encountered. If you find something else, please reach out to me on
IRC, Slack, or Discord, I'll gladly look into it 🙂