Waiting for 8.4 – ordered data loading in pg_dump

Great (and admittedly long overdue) patch by Tom Lane:

Make pg_dump --data-only try to order the table dumps so that foreign keys'
referenced tables are dumped before the referencing tables.  This avoids
failures when the data is loaded with the FK constraints already active.
If no such ordering is possible because of circular or self-referential
constraints, print a NOTICE to warn the user about it.

What it exactly means?

Let's start with basic assumption – this patch touches only –data-only dumping. So, if you never use it, it does absolutely nothing for you. Sorry.

But, if you use –data-only, and you have pre-patch version of PostgreSQL, here is what happens:

First, let's create some tables:

# CREATE TABLE b (id serial PRIMARY KEY);
NOTICE:  CREATE TABLE will CREATE implicit SEQUENCE "b_id_seq" FOR serial COLUMN "b.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will CREATE implicit INDEX "b_pkey" FOR TABLE "b"
CREATE TABLE
 
# CREATE TABLE a (id serial PRIMARY KEY, b_id int4 REFERENCES b (id));
NOTICE:  CREATE TABLE will CREATE implicit SEQUENCE "a_id_seq" FOR serial COLUMN "a.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will CREATE implicit INDEX "a_pkey" FOR TABLE "a"
CREATE TABLE

And now, let's insert some data:

# INSERT INTO b (id) VALUES (DEFAULT), (DEFAULT), (DEFAULT);
INSERT 0 3
# INSERT INTO a (b_id) VALUES (1), (2), (3);
INSERT 0 3

Pretty simple, here are the data:

# SELECT * FROM b;
 id
----
  1
  2
  3
(3 ROWS)
 
# SELECT * FROM a;
 id | b_id
----+------
  1 |    1
  2 |    2
  3 |    3
(3 ROWS)

What will happen if I'll dump it now? With –data-only of course:

=> pg_dump --data-only
...
COPY a (id, b_id) FROM stdin;
1       1
2       2
3       3
\.
...
COPY b (id) FROM stdin;
1
2
3
\.
...

Whoa, that's bad – it will try to load data to a (with fkey to b) – before b is populated.

Well, this data still can be loaded – by disabling (ALTER TABLE … DISABLE TRIGGER) foreign key triggers, but it is complicated, and definitely “not cool" approach.

But now, with this new patch, pg_dump looks differently:

...
COPY b (id) FROM stdin;
1
2
3
\.
...
COPY a (id, b_id) FROM stdin;
1       1
2       2
3       3
\.
...

Order is now correct. Of course it will not work in cases of circular dependency, like:

# CREATE TABLE a (id serial PRIMARY KEY);
# CREATE TABLE b (id serial PRIMARY KEY, a_id int4 REFERENCES a(id));
# ALTER TABLE a ADD COLUMN b_id int4 REFERENCES b (id);
# INSERT INTO a (id) VALUES (DEFAULT);
# INSERT INTO b (id, a_id) VALUES (1, 1);
# INSERT INTO a (b_id) VALUES (1);

It will show nice warning:

pg_dump: NOTICE: there are circular foreign-key constraints among these table(s):
pg_dump:   a
pg_dump:   b
pg_dump: You may not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
pg_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.

Great addition. And kudos to Tom.

2 thoughts on “Waiting for 8.4 – ordered data loading in pg_dump”

  1. Interesting… I have generally thought this patch was very minor. It’s not the above is not a good idea, but in most use cases, doing pg_dump -a –disable-triggers will accomplish the same goal (and in some cases you may have to revert to that anyway). So yes, this is better, but didn’t terribly excite me.

  2. Off-topic:

    I think that is an issue in pg_restore:

    If you do: pg_dump -s -t table mydb, its returns the table and your indexes.
    If you do: pg_restore -s -t table mydumpfile.dump, its returns only the table.

    If I need to drop one table and recreate them from a backup, I need to pg_restore -s mydumpfile.sql > allstru.sql and search the indexes for the table manually.

    Ohh, yeahhh… before I drop the table I can search in pg_class ou pg_indexes for the indexes of the table, but it’s not so smart, I think.

Comments are closed.