What the hell is transaction wraparound?

Recently someone asked on Slack about what is transaction wraparound. Full answer is a bit too much for slack reply, but I can try to explain it in here.

So, every row in PostgreSQL containst two hidden columns: xmin, and xmax. You can see them:

=$ CREATE TABLE wrapit (a int4);
CREATE TABLE
 
=$ INSERT INTO wrapit (a) VALUES (1);
INSERT 0 1
 
=$ SELECT * FROM wrapit;
 a 
---
 1
(1 ROW)
 
=$ SELECT xmin, xmax, * FROM wrapit;
 xmin | xmax | a 
------+------+---
 2615 |    0 | 1
(1 ROW)

These columns represent “xid" number, which we could, though it's not strictly true, understand as “transaction number". Each transaction has its number, and you can always see it:

=$ SELECT txid_current();
 txid_current
──────────────
 2616
(1 ROW)

There is an asterisk here – txid (transaction id) is, currently, 8 byte long, and xid (the value to store in tables) is just 4 bytes long. So, while we can have txid of, let's say 500 billion, maximal value for xid is much lower, at ~ 4 billion. Generally speaking, you can get txid_current() and get module of division of it with 2^32 (4294967296) and you get xid.

Every operation on data, insert, update, or delete, works it's magic by specifying xmin and xmax values to something.

For example, when I inserted row, xmin was set to 2615 (which was txid of the transaction that inserted the row), and xmax was set to 0.

This means that every transaction which has after 2615, will see it, but every that is before might not see it (depending on things like transaction isolation level).

Same goes with deletes – after delete there is specific xmax set, though showing it is a bit harder, because PostgreSQL will not show you deleted records. But we can cheat a bit with pageinspect extension.

So, after just insert we can see the row as:

=$ SELECT xmin, xmax, * FROM wrapit ;
 xmin │ xmax │ a
──────┼──────┼───
 263101
(1 ROW)

And we can peek with pageinspect:

=$ SELECT * FROM heap_page_item_attrs( get_raw_page( 'wrapit', 0 ), 'wrapit'::regclass );
 lp │ lp_off │ lp_flags │ lp_len │ t_xmin │ t_xmax │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_bits │ t_oid  │     t_attrs
────┼────────┼──────────┼────────┼────────┼────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼────────┼─────────────────
  18160128263100(0,1)1230424[NULL][NULL]{"\\x01000000"}
(1 ROW)

When I'd delete:

=$ DELETE FROM wrapit ;
DELETE 1
 
=$ SELECT xmin, xmax, * FROM wrapit ;
 xmin │ xmax │ a
──────┼──────┼───
(0 ROWS)
 
=$ SELECT * FROM heap_page_item_attrs( get_raw_page( 'wrapit', 0 ), 'wrapit'::regclass );
 lp │ lp_off │ lp_flags │ lp_len │ t_xmin │ t_xmax │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_bits │ t_oid  │     t_attrs
────┼────────┼──────────┼────────┼────────┼────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼────────┼─────────────────
  18160128263126320(0,1)8193128024[NULL][NULL]{"\\x01000000"}
(1 ROW)

Normal select from table doesn't show anything, but the row is still there, in data files, but it's t_xmax has been changed to non-zero.

The interesting thing happens when you update row:

=$ TRUNCATE wrapit ;
TRUNCATE TABLE
 
=$ INSERT INTO wrapit (a) VALUES (1);
INSERT 0 1
 
=$ SELECT * FROM heap_page_item_attrs( get_raw_page( 'wrapit', 0 ), 'wrapit'::regclass );
 lp │ lp_off │ lp_flags │ lp_len │ t_xmin │ t_xmax │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_bits │ t_oid  │     t_attrs
────┼────────┼──────────┼────────┼────────┼────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼────────┼─────────────────
  18160128263500(0,1)1204824[NULL][NULL]{"\\x01000000"}
(1 ROW)
 
=$ UPDATE wrapit SET a = 2;
UPDATE 1
 
=$ SELECT * FROM heap_page_item_attrs( get_raw_page( 'wrapit', 0 ), 'wrapit'::regclass );
 lp │ lp_off │ lp_flags │ lp_len │ t_xmin │ t_xmax │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_bits │ t_oid  │     t_attrs
────┼────────┼──────────┼────────┼────────┼────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼────────┼─────────────────
  18160128263526360(0,2)1638525624[NULL][NULL]{"\\x01000000"}
  28128128263600(0,2)327691024024[NULL][NULL]{"\\x02000000"}
(2 ROWS)

Please note that now, after update, we have actually two rows in the table. Well, kinds. Two versions of a row. One of them was inserted in xid 2635, and deleted in 2636, and another that was inserted in 2636. Each update is technically the same as delete and insert.

All these numbers are used to make sure that any transaction sees what it should see. For example. If I'm on transactionid 100, and row was inserted in transaction 90, and deleted in 110 – I should still see it (kinda, depending on commit/rollback, and transaction isolation, but the point is that I should have a way to see it).

Now you can ask: well, how come if I'm on xid 100, how could there be xid 110? Isn't it “in the future"? Yes, it is. The thing is that transaction can take long time. I can start transaction now, and keep it open for a week. In the mean time there will be many more transaction, and all of them will have xids that are larger than my own xid.

To make it all work, and given that xids are only from 0 to ~ 4 billion, PostgreSQL treats them as circular. That means that after 100, next one is 101, and after 4 billion, there is 4 billion and one. But after 4,294,967,295 next one is not 4,294,967,296, but is very low. Almost zero. It would be zero, if it wasn't for a fact that there are couple of very low xids that are reserved for internal use. Magical values are:

  • 0 – invalid xid – no such transaction could have ever happened.
  • 1 – bootstrap xid, used only when initdb does its thing
  • 2 – frozen – VERY important xid that I will explain in a moment

So, at the moment we know that PostgreSQL has a concept of current xid, and that there are xids in the past, and potentially in the future. And that all of this has to fit between 3 and ~ 4,294,967,295.

If I were to ask you – how does PostgreSQL know, which xid is in the past, and which is future, you probably could answer that that's trivial: larger xid is in the future, smaller in the past. Well, great. What about if current xid is 4,294,967,295 ? Next one, in the future, will be 3, or 4, generally much lower number.

So, what Pg does is splits the 4 billion xids into 2 parts of around 2 billion, where the split point is current xid.

Let's explain this bit on smaller numbers. Let's assume that we have the whole range of 0 to 10. Only 11 xids. If my current xid is 5, then xids 0..4 are in the past, and 6..10 are in the future. And if current xid is 3? Future is simple: 4..8. But past now contains two ranges: 0..2, and 9..10. Similarly, if current xid is 9, then past xids will be 4..8, and future would be 10, 0..3.

At any given time you have, in PostgreSQL, around 2 billion xids that PostgreSQL will assume that these are in the past, and around 2 billion that are assumed to be future (from the point of view of current transaction).

Now, let's assume that you inserted a row. It got inserted with xmin = 100. This means that every transaction that started after it should see it. Awesome. Now let's assume that you didn't let autovacuum nor vacuum run for a very long time. 2 billion transactions. And let's see at what would happen if current xid is 2 billion and 200. For this transaction all xids from 200 to two billion and 199, are in the past, but everything else, including transaction 100 are in the future. Suddently Pg doesn't see this data is old. It sees it as something that will happen somewhere in the future. And for now, it doesn't really exist.

This is the data loss (we lose visibility of data) related to transaction wraparound (we let our data sit there for 2 billion transaction, and “future" wrapped around the horizon).

That sounds scary, doesn't it? Well, not really. 2 billion transactions is a long time. And, unless you misconfigured your Pg, you are perfectly safe. This is because of the magical 2 xid that I mentioned earlier. A.K.A. frozen.

You see, whenever autovacuum runs, or normal vacuum, it checks some old rows (old in terms of how far in the past their xmin is in comparison to current transaction), whether they can be frozen.

Which particular xid it was that inserted the row matters only if there are still open transactions that were in the past from the point of view of this xmin.

Let's assume we inserted a row with xmin 500. If there is transaction with xid 400, it matters that this data is in the future. Same for xid 499. But if the oldest transaction currently in the database has xid of 1500, then it doesn't matter if the row was inserted with 400, or 100, or 101 – all of them are in the past.

And here is the magic of vacuum – if there are rows that have xmin that is older than any running transaction, the row gets updated to have xid = 2 – THE FROZEN ONE. This xid means: this row was inserted so long ago, that we don't know when exactly, but it was in the past.

Small asterisk: what I wrote above is, again, not strictly speaking true, xmin is not really changed, but a flag is set that makes pg see the row that way.

Anyway – normal vacuum, or just randomly running autovacuum will not scan the whole table to find old rows. They do it sometimes, not very often, and try hard not to put too much strain on your system.

Sometimes, it happens that oldest xmin in a table is high enough anyway. If it's really high (2 billion) then Pg will stop running to prevent transaction wraparound, and then you have to do black magic to fix it. But realistically, unless you really broke your pg settings, this is not going to happen.

What you can see, though, is autovacuum processes, that look like (in query column in pg_stat_activity):

autovacuum: VACUUM public.some_table (TO prevent wraparound)

This kinda looks scary, but is absolutely safe. This happens when a table crosses some threshold. On most Pg this is set to 200 million:

=$ SHOW autovacuum_freeze_max_age ;
 autovacuum_freeze_max_age
───────────────────────────
 200000000
(1 ROW)

This means that once table has rows that are 200 million transactions old (which is 10% of 2 billion where things would become problem), autovacuum will run in the same more as running VACUUM FREEZE – that is, scan the table for old rows, and freeze them.

There is also interesting side note: even if you disabled autovacuum (which is bad idea virtually always) – PostgreSQL will still run these autovacuum to prevent wraparound! These can't be disabled. Of course you can run a loop that kills autovacuum workers, if you really wanted to, but generally speaking – you would really work hard to reach the wraparound problem in modern Pg.

I hope that explains the problem, why it's not really a problem, and where it comes from.

I'd also like to apologize for simplifications, to the point of not always being true to reality, but I wanted to convey the concepts, and not implementation details, especially in case where I see them changing in not-too-distant future.

If you'd like to know more, couple of links:

5 thoughts on “What the hell is transaction wraparound?”

  1. > xmin is not really changed, but a flag is set that makes pg see the row that way.
    Do you refer to the allfrozen bit from the visibility map here?

  2. @Rajiv:

    aaah. It’s all-frozen, not allfrozen. That’s why I couldn’t find it with grepping.

    No, I don’t refer to this.

    all-frozen is a flag per *block/page* that all rows are frozen. The flag I mean is per row, but it’s not really all that important, because from the point of view of this problem, it’s the same as setting xmin to “2”.

  3. Thanks for the beautiful article. really explains a complex topic in simple terms.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.