January 3rd, 2011 by depesz | Tags: , , , , , , | 30 comments »
Did it help? If yes - maybe you can help me? Donate BTC to 19zPa5diT2LZqGtTi8f8bfApLn8rw9zBHx

On 29th of December, Robert Haas committed interesting patch, which does:

Support unlogged tables.
 
The contents of an unlogged table aren't WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery. Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.

(edited commit message, due to this mail.

As you perhaps know every change in table, is first written to WAL (files in pg_xlog/ directory), and then, on checkpoint time – to actual data files.

There is idea that if we'd skip write to WAL (and subsequently – fsync() on commit), writes to the tables would be faster.

Speedup is not free, of course – data in this table wouldn't survive db crash – for the very reason that there is no WAL info for them, and it's the WAL that is being used on recovery.

So. Let's see how it looks/works.

$ create unlogged table test (
id serial primary key,
some_text text
);
NOTICE: CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE

So far, it looks OK.

For comparison purposes, I'll also create second table – identical, but logged;

$ create table test2 (
id serial primary key,
some_text text
);
NOTICE: CREATE TABLE will create implicit sequence "test2_id_seq" for serial column "test2.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test2_pkey" for table "test2"
CREATE TABLE

First thing – how reliable is the data in those tables?

Simple test – I'll insert some rows, issue checkpoint, insert some more rows, and kill -9 all postgres processes. Then I'll restart Pg, and we'll see.

=$ cat run.sql
INSERT INTO test (some_text) VALUES ('a'), ('b'), ('c');
INSERT INTO test2 (some_text) VALUES ('a'), ('b'), ('c');
checkpoint;
INSERT INTO test (some_text) VALUES ('d'), ('e'), ('f');
INSERT INTO test2 (some_text) VALUES ('d'), ('e'), ('f');

=$ psql -f run.sql ; ssh pgdba@localhost killall -9 postgres postmaster
INSERT 0 3
INSERT 0 3
CHECKPOINT
INSERT 0 3
INSERT 0 3
postmaster: no process found

OK, and after restart:

$ select * from test;
id | some_text
----+-----------
(0 rows)
 
$ select * from test2;
id | some_text
----+-----------
1 | a
2 | b
3 | c
4 | d
5 | e
6 | f
(6 rows)

OK. so the difference seems to be trivially visible – data from unlogged table is *lost* if there needs to be recovery. And what about normal restart?

=$ psql -f run.sql ; ssh pgdba@localhost pg_ctl -m fast restart
INSERT 0 3
INSERT 0 3
CHECKPOINT
INSERT 0 3
INSERT 0 3
waiting for server to shut down.... done
server stopped
server starting

$ select * from test;
id | some_text
----+-----------
37 | a
38 | b
39 | c
40 | d
41 | e
42 | f
(6 rows)

Nice.

So, what about the speed?

Let's test simple thing – load 450k of rows, using 3 rows per insert, each insert in it's own transaction. So, wrote this simple one-liner:

=$ perl -le 'my $x = "aaaaaaaa"; for (1..450000) {print $x++}' | \
perl -pe 's/.*/(\047$&\047)/;s/\s*\z/,/ if $.%3' | \
sed 's/.*/insert into test (some_text) values &;/'

Which generates output like this:

insert into test (some_text) values ('aaaaaaaa'),('aaaaaaab'),('aaaaaaac');
insert into test (some_text) values ('aaaaaaad'),('aaaaaaae'),('aaaaaaaf');
insert into test (some_text) values ('aaaaaaag'),('aaaaaaah'),('aaaaaaai');
insert into test (some_text) values ('aaaaaaaj'),('aaaaaaak'),('aaaaaaal');
insert into test (some_text) values ('aaaaaaam'),('aaaaaaan'),('aaaaaaao');
insert into test (some_text) values ('aaaaaaap'),('aaaaaaaq'),('aaaaaaar');
insert into test (some_text) values ('aaaaaaas'),('aaaaaaat'),('aaaaaaau');
insert into test (some_text) values ('aaaaaaav'),('aaaaaaaw'),('aaaaaaax');
insert into test (some_text) values ('aaaaaaay'),('aaaaaaaz'),('aaaaaaba');
insert into test (some_text) values ('aaaaaabb'),('aaaaaabc'),('aaaaaabd');

Then, I prepended the output with single “CHECKPOINT;" command (to get as similar situation as possible in both tests), and then ran the test.

I ran the inserts for test table 3 times, and then for test2 (normal, logged) also 3 times. (I did truncate of the tables before each run). Run times (just the inserts):

  • unlogged table: 1m29.108s, 1m29.460s, 1m28.713s.
  • logged table: 15m40.459s, 15m41.560s, 15m44.181s.

That's fast.

Of course I chose the best-case scenario – slow inserts, each normally requiring sync() to discs. What would happen if I'd use “copy"?

  • unlogged table: 0m1.564s, 0m1.567s, 0m1.562s.
  • logged table: 0m3.116s, 0m2.992s, 0m3.183s.

That's really fast.

So, unlogged tables are good for cases when:

  • you don't need it to stay alive over Pg crash
  • you don't need it replicated to slaves (well – at least using warm/hot standby mechanisms)
  • you do need as fast writes as possible

There are not many use-cases for it, but when you have the right use case – the unlogged tables are simply amazing!

  1. 30 comments

  2. Jan 3, 2011

    Well, think HTTP sessions for example. Having to deal with them above the dynamic stack is gruesome (load balancers, cookies, etc). If you put them in the database, you can load balance as much as you want to and it just works when one node fail because the session is to be found easily from any other one.

    That and maybe also all log tables (append only tables) where you’d checkpoint every minute if business rules say that losing 1 minute worth of logs is acceptable.

    Well, some other nice cases will certainly show up now that the feature exists :)

  3. Jan 3, 2011

    @Dim:
    checkpointing doesn’t help with those tables. please check first test. i did inserts, checkpoint, more inserts, and db crash. effect – *all* rows were removed. even pre-checkpoint.

    so – session data – yeah. logs – well, not really.

  4. Jan 3, 2011

    ah yes. confused. sorry.

  5. Jan 3, 2011

    concerning logging, what’s stoping you to

    insert into logged_table (select from nonlogged_table where…)

    in a set interval (being the maximum amount of time you are willing to lose data for)? That can be done asynchronously helping you to achieve the maximum speed at logging time while still giving you the needed safety.

    And of course HTTP sessions (which I recently had to move out of Postgres out of performance concerns).

    This is a really, really cool feature!

  6. Jan 3, 2011

    @Philip:
    nothing stops me. It’s possible. And it might be sensible. In case the costs (more complicated code/infrastructure) will be less important than benefits (speedup).

  7. # arthurr
    Jan 3, 2011

    Hi,

    Is it possible to use an unlogged table for insert/create and then alter it to logged table ?

  8. Jan 3, 2011

    @Arthurr:
    afaik – no.

  9. # arthurr
    Jan 3, 2011

    So … I’ll waiting for 9.2 :)

  10. Jan 3, 2011

    @Arthurr: well, I’m not sure such functionality will appear. Remember that on the “alter table” you’d basically have to copy whole table to WAL – so that it will be on standby server(s). seems complicated.

  11. Jan 3, 2011

    After some thinking and discussions, I really think this is one of the best features I’ve ever seen appear in postgresql. I’m having to deal with so much data I could easily live with losing a certain amount of.

    With this and minimal changes to my application, I could, for each table_with_not_so_important_data an unlogged clone unlogged_table_with_not_so_important data and a view all_table_with_not_so_important_data that just unions them.

    Then I could write a script that in set intervals does a serialized transaction in which it just does insert into table_with_not_so_important_data (select * from unlogged_table_with_not_so_important_data); delete from unlogged_table_with_not_so_important_data. Using the pg_tables view, I could even find all the tables I’m concerned about, so it’s just one script running.

    This is basically turning your first benchmark case (450’000 INSERTs, 15 minutes) into the second (1 copy, 3 seconds) at the cost of, say 10 minutes of possible data loss, which in my case is more than acceptable. Especially because it helps keeping the I/O load down, so every other query can benefit too.

    I see myself using this in about four cases in the application in question (currently sometimes using just inserts, sometimes writing into a file and then processing those entries in bulk, which is considerably more work than my two-table-clones-approach).

    This is amazing (for my usecase).

    Philip

  12. # Marko Tiikkaja
    Jan 3, 2011

    @Arthurr:

    I think someone mentioned this on the development mailing list too, when trying to do fast bulk-loads into a table. But there’s already a way to do that: if you create or TRUNCATE the table in the same transaction before COPYing or INSERTing, no WAL is written and you get the same speedup.

  13. Jan 3, 2011

    The acceleration Marko is referring to isn’t always available. If you have WAL archiving turned on, the capability to do fast bulk loading by doing TRUNCATE first goes away. That makes it essentially unavailable for some environments. Unlogged tables let you create faster semi-permanent tables on the master regardless of the archiving setup.

  14. # Jesper
    Jan 4, 2011

    Could the same infrastructure be utillized by “restores from dump” to speed up that usecase?

    Jesper

  15. # Fernando Hartmann
    Jan 4, 2011

    Very interesting feature !
    But @arthurr reveled a very good point, this feature will be much more helpful if we have the option to change the “unlogged” attribute, so we can use a table normally with all PG features, and WEN, only wen, we need to insert a lot of data, we can change the attribute and live with the risk of data loss.

  16. # Tira
    Jan 4, 2011

    Hi

    The most obvious use for such tables are temporary (intermediate) tables within functions.

    For example a function that needs to calculate a temporary table which is then used to derive the final output. That temporary table is going to be dropped (possibly with “on commit drop”) anyhow at the end of the function, might as well make it unlogged and gain some performance.
    If the function fails in the middle (hardware failure, for example) then everything is rolled back and I don’t care that the temporary table got truncated (since it is bound to disappear anyhow).

    Tira.

  17. # Marko Tiikkaja
    Jan 5, 2011

    @Tira:

    Regular temporary tables already behave like that. There is no reason to use this feature for the use case you mentioned.

  18. # darkos
    Jan 5, 2011

    this is an excellent capability though for any batch process where, if it were to die part way through, the entire batch process would be re-started, rather than trying to pick up from where it left off.

  19. Jan 5, 2011

    This is a great feature. I think it will make a lot of MySQLers looking for MyISAM type behavior happy.

    2 cases for me where this is incredably useful.
    1) Stats tables – I have stats tables that are basically just rollups of existing data. So if I lose the table in a crash, its not a big deal since I can reconstitute with a scheduled stored proc. However since I’m rolling up a lot of data, the fast loading is really important.

    2) Stupid web logs. Heck if it crashed it wouldn’t really be too big of a deal – but again insert speed is really important. Its not mission critical — just nice to have data (kind of like Twitter data that’s months old).

  20. # demo
    Jan 5, 2011

    Will an unlogged table be writable on a synchronous replication slave?

  21. Jan 5, 2011

    @Demo:
    We’ll see, but I doubt that. As far as I understand, sync replication that will be (hopefully) in 9.1 will be based on wal shipping. since unlogged tables are not in wal – it will not be on slave.

  22. Jan 5, 2011

    Another way to mitigate between unlogged performance and reliable data after a while would be using PGQ.

    If you store the PGQ tables as unlogged and have a consumer that basically replays the traffic into the static table, you have that.

    The performance gain will come from the fact that the consumer does the writes in batch and without concurrency, which is a huge performance drop as soon as WAL is concerned.

    Also, we’re talking about 3s lag by default, which should make that approach sensible in lots of cases. I guess.

  23. Jan 5, 2011

    Immediate use which comes to mind: any kind of calculated/cached data. Whether it improves performance by caching denormalized/formatted data or by caching remotely loaded something, unlogged table can be sweet.

    Is the data lost also when the table was not touched at all (= does recovery simply throw away all unlogged tables, or only those which would need an update)? If not, then partitioning by time could be used to limit the range of data loss…

  24. Jan 5, 2011

    @Mekk:
    all unlogged tables are truncated on recovery. touched or untouched – all of them.

    reason is very simple – since we don’t know what happened to them – we can’t assume anything about their content.

  25. # Mathew Frank
    Jan 6, 2011

    This is REALLY useful for speeding up ETL (ELT for purists). I usually turn off logs on Oracle for this reason. Brilliant to see it in Postgres at last.

    Why? Because your etl code keeps track of each table you have loaded, and in the event of a crash, you just either start again from scratch (having lost no actual data) or from the table you were loading when it crashed.

    The caveat for ETL is you still want logging on tables for which you store logging or change data that cannot be lost – ie no longer exists in source tables.

  26. # demo
    Jan 11, 2011

    @depesz

    So even the DML statements to create/delete the unlogged table are unlogged?

    The use-case for the original question is that if an unlogged table was empty and writable on a synchronous slave, then it could be used as a cache for ca

  27. # Billy
    Jun 17, 2011

    I’m hoping to be able to reproduce your benchmark on my own so I can re-run as 9.1 becomes more stable over time. Any chance you could provide the specs for you environment and/or the actual code you used, or anything that might help?

  28. Jun 17, 2011

    @Billy: i’m not sure which machine I ran the tests on, but as for data – all scripts to create data are in the body of blogpost.

  29. # Billy
    Jun 17, 2011

    I only ask because I believe I have replicated your 450,000 insert test but with less impressive results:

    unlogged table: 29.549s, 22.196s, 22.486s.
    logged table: 49.497s, 59.519s, 51.108s.

    As you can see, unlogged is still faster but by a factor of 2x, not 10x like yours.

    I’m using “PostgreSQL 9.1beta2 on i686-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704
    (Red Hat 4.1.2-46), 32-bit”

    I’m at a loss for how to explain the different results. Any thoughts?

  30. Jun 19, 2011

    @Billy:
    no idea, sorry.

  31. # joao cosme
    Dec 1, 2014

    very nice explanation.

Leave a comment