October 4th, 2008 by depesz | Tags: , , , | 5 comments »
Did it help? If yes - maybe you can help me? Donate BTC to 19zPa5diT2LZqGtTi8f8bfApLn8rw9zBHx

On 30th of September, Heikki Linnakangas committed his patch that changes FSM:

Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the
free space information is stored in a dedicated FSM relation fork, with each
relation (except for hash indexes; they don't use FSM).
 
This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any
trace of them from the backend, initdb, and documentation.
 
Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
introduce a new variant of the get_raw_page(regclass, int4, int4) function in
contrib/pageinspect that let's you to return pages from any relation fork, and
a new fsm_page_contents() function to inspect the new FSM pages.

What does it mean for average DBA?

Well, for starters, you just lost 2 options in postgresql.conf: max_fsm_pages
and max_fsm_relations.

These variables (when set incorrectly) could make vacuum less effective (which happened a lot). So basically it's good that they are gone.

What's more? Well. To do it, Haikki had to implement so called “Relation forks". And these are important, because (as I understand, if I understand it wrong, please correct me) they can (and possibly will be) used to store “visibility maps" – which will make vacuums much faster (and possibly have influence on index scans, but that's just my guess).

What are there relation forks? It's simple. As you know tables are stored in files named:

$PGDATA/base//

Optionally there is .1, .2 and so on suffix – in case table (or index) gets over 1gb.

Relation forks create second set of files – named
_.

For fsm, the code is "1" (but there are talks to use textual codes).

For example:

# create table x (id int4);
CREATE TABLE
 
# select oid from pg_database where datname = 'depesz';
oid
-------
16385
(1 row)
 
# select relfilenode from pg_class where relname = 'x' and relkind = 'r';
relfilenode
-------------
16387
(1 row)
 
=> ls -l $PGDATA/base/16385/16387*
-rw------- 1 pgdba pgdba 0 2008-10-04 12:50 /home/pgdba/data/base/16385/16387
-rw------- 1 pgdba pgdba 0 2008-10-04 12:50 /home/pgdba/data/base/16385/16387_1

Of course - since fsm now stores full set of information - it is not limited to
any size. So - it might be interesting how much space does it take. Let's test:

# insert into x (id) select * from generate_series(1,100000);
INSERT 0 100000
 
# \! ls -l $PGDATA/base/16385/16387*
-rw------- 1 pgdba pgdba 3219456 2008-10-04 12:53 /home/pgdba/data/base/16385/16387
-rw------- 1 pgdba pgdba 24576 2008-10-04 12:53 /home/pgdba/data/base/16385/16387_1
# insert into x (id) select * from generate_series(1,100000);
INSERT 0 100000
 
# \! ls -l $PGDATA/base/16385/16387*
-rw------- 1 pgdba pgdba 6430720 2008-10-04 12:53 /home/pgdba/data/base/16385/16387
-rw------- 1 pgdba pgdba 24576 2008-10-04 12:53 /home/pgdba/data/base/16385/16387_1
# insert into x (id) select * from generate_series(1,100000);
INSERT 0 100000
 
# \! ls -l $PGDATA/base/16385/16387*
-rw------- 1 pgdba pgdba 9641984 2008-10-04 12:53 /home/pgdba/data/base/16385/16387
-rw------- 1 pgdba pgdba 24576 2008-10-04 12:53 /home/pgdba/data/base/16385/16387_1

Ok, so we can see that it doesn't grow when I insert data to main table.

But, maybe it is so because my relation is pretty small - only 1177 pages. Let's check on something bigger:

# drop table x;
DROP TABLE
 
# create table x (id int4, dummy_text text);
CREATE TABLE
 
# alter table x alter column dummy_text set storage plain;
ALTER TABLE
 
# select relfilenode from pg_class where relname = 'x' and relkind = 'r';
relfilenode
-------------
16408
(1 row)

Side note: I did alter column set storage plain to keep all data from dummy_test in main table, without compression - effectively disabling TOAST.

# insert into x select i, repeat('depesz', 500) from generate_series(1,100000) as i;
INSERT 0 100000
 
# \! ls -l $PGDATA/base/16385/16408*
-rw------- 1 pgdba pgdba 409600000 2008-10-04 13:05 /home/pgdba/data/base/16385/16408
-rw------- 1 pgdba pgdba 122880 2008-10-04 13:05 /home/pgdba/data/base/16385/16408_1

And what will happen if I'll update 50% of the records?

# update x set dummy_text = repeat('_test_', 500) where id <= 50000;
UPDATE 50000
 
# \! ls -l $PGDATA/base/16385/16408*
-rw------- 1 pgdba pgdba 614400000 2008-10-04 13:09 /home/pgdba/data/base/16385/16408
-rw------- 1 pgdba pgdba 172032 2008-10-04 13:08 /home/pgdba/data/base/16385/16408_1

This tells us that the overhead of data on disk is about 0.03% - which means it is negligible.

The benefits? We have one less thing to worry about (setting too low values for fsm), and foundation for future code that will make vacuum run faster. Much faster.

  1. 5 comments

  2. Oct 4, 2008

    As always I love the blog.

    But I have one question regarding its formatting. Maybe its just me but I am seeing nested scrollable viewing boxes in addition to the ones that come “stock” with my browser. And since some of your blog doesn’t line-wrap, I have to repeatedly adjust two-three vertical and horizontal scroll bars just to get to the completion of one of the sentences.

    Is this a problem with my browser configuration?

  3. Oct 4, 2008

    @Richard Broersma:
    most probably you see them for boxed with sample code – it happens when lines in these boxes are too wide.

    i can’t do much about it as the boxes should show exact examples of commands/commands output.

  4. # Kim Bisgaard
    Oct 4, 2008

    The and are not visible please convert to <table-filenode> or that ever to make them visible.
    Thanks for yet another good blog!

  5. # Heikki Linnakangas
    Oct 6, 2008

    For the record, the relation fork naming was changed afterwards so that the suffix is “_fsm” instead of “_1″ for the FSM files.

  6. # Robert Treat
    Oct 8, 2008

    Also wanted to give a pointer for those running < 8.4 (which is probably all of us), I’ve implemented a nagios check for the fsm parameters using the pg_freespacemap contrib module, available as part of the check_postgres script. http://bucardo.org/check_postgres/check_postgres.pl.html#fsm_pages.

Leave a comment