Waiting for 9.2 – excluding data of table from dump

On 14h of December, Andrew Dunstan committed patch:

Add --exclude-table-data option to pg_dump.
 
Andrew Dunstan, reviewed by Josh Berkus, Robert Haas and Peter Geoghegan.
 
This allows dumping of a table definition but not its data, on a per table basis.
Table name patterns are supported just as for --exclude-table.

This patch gives me mixed feelings. On one hand – it's really great. On the other – I can't shake the feeling that it's misplaced effort.

First – let's see what it can do for us.

I have database, that is relatively small, with the exception of single table – which logs all messages sent.

With this new switch I can make dump of whole database, including the large table, but without data for it (for example to load it on dev server).

=$ pg_dump --exclude-table-data="messages" -Fc messenger_jabberd > smaller.dump
=$ pg_dump -Fc messenger_jabberd > larger.dump
=$ diff -u <( pg_restore -l smaller.dump ) <( pg_restore -l larger.dump )
--- /dev/fd/63  2011-12-15 14:14:55.288221126 +0100
+++ /dev/fd/62  2011-12-15 14:14:55.288221126 +0100
@@ -1,7 +1,7 @@
 ;
-; Archive created at Thu Dec 15 14:13:41 2011
+; Archive created at Thu Dec 15 14:13:51 2011
 ;     dbname: messenger_jabberd
-;     TOC Entries: 48
+;     TOC Entries: 49
 ;     Compression: -1
 ;     Dump Version: 1.12-0
 ;     Format: CUSTOM
@@ -39,6 +39,7 @@
 2034; 0 16406 TABLE DATA public browse depesz
 2035; 0 16412 TABLE DATA public last depesz
 2036; 0 16418 TABLE DATA public mailaddresses depesz
+2037; 0 16424 TABLE DATA public messages depesz
 2038; 0 16432 TABLE DATA public presence depesz
 2039; 0 16443 TABLE DATA public presence_history depesz
 2040; 0 16454 TABLE DATA public privacy depesz

As you can see it dumped everything except for the data.

This is actually great, because without this patch you'd have to make two pg_dumps – one for all tables except the one that you don't want the data for, and another with just schema of the large table.

For this – I greatly thank Andrew.

And now for the feeling about misplaced effort.

First – don't get me wrong – I appreciate the work, and will gladly use it (as soon as my clients will switch to 9.2).

But – currently we have following include/exclude switches in pg_dump:

  • -a, –data-only dump only the data, not the schema
  • –exclude-table-data=TABLE do NOT dump data for the named table(s)
  • -N, –exclude-schema=SCHEMA do NOT dump the named schema(s)
  • –no-security-labels do not dump security label assignments
  • –no-unlogged-table-data do not dump unlogged table data
  • -n, –schema=SCHEMA dump the named schema(s) only
  • -s, –schema-only dump only the schema, no data
  • -T, –exclude-table=TABLE do NOT dump the named table(s)
  • -t, –table=TABLE dump the named table(s) only
  • -x, –no-privileges do not dump privileges (grant/revoke)

Doesn't it seem excessive?

What's worse – it still doesn't let me (for example) dump just specific function or view. Or trigger. Or just ACLs.

In my opinion the effort would be better spent by adding something alike pg_restore's -l/-L options.

For example:

pg_dump -l ... > list

would list all objects that would get dumped.

pg_dump -L list ...

Will dump just objects that are listed in the list file.

And that's it. No more options. All excluding/including can be done on the list file.

Not sure why we don't have this options, but given lack of it – I definitely appreciate and thank Andrew for the –exclude-table-data.

4 thoughts on “Waiting for 9.2 – excluding data of table from dump”

  1. @GJ:
    you *can* do it on restore.
    Just use -Fc on pg_dump, and you can do anything you want on restore time.

  2. Just thought I’d mention that a new tool I’ve been working on (with some help from Depesz as well!) can do some of the more advanced filtering that regular pg_dump lacks.

    pg_extractor – https://github.com/omniti-labs/pg_extractor

    You can feed it a file listing tables, functions, views, etc to either include or exclude. Can even do regex to match objects with a naming pattern (like partitions).
    I’ll be working on getting together some example documentation soon (POD docs are included in the script file).

    I agree that the L/l options you mention would be much better than this in the long run, though.

Comments are closed.