Waiting for PostgreSQL 19 – Add non-text output formats to pg_dumpall

On 26th of February 2026, Andrew Dunstan committed patch:

Add non-text output formats to pg_dumpall
 
pg_dumpall can now produce output in custom, directory, or tar formats
in addition to plain text SQL scripts. When using non-text formats,
pg_dumpall creates a directory containing:
- toc.glo: global data (roles and tablespaces) in custom format
- map.dat: mapping between database OIDs and names
- databases/: subdirectory with per-database archives named by OID
 
pg_restore is extended to handle these pg_dumpall archives, restoring
globals and then each database. The --globals-only option can be used
to restore only the global objects.
 
This enables parallel restore of pg_dumpall output and selective
restoration of individual databases from a cluster-wide backup.
 
Author: Mahendra Singh Thalor <mahi6run@gmail.com>
Co-Author: Andrew Dunstan <andrew@dunslane.net>
Reviewed-By: Tushar Ahuja <tushar.ahuja@enterprisedb.com>
Reviewed-By: Jian He <jian.universality@gmail.com>
Reviewed-By: Vaibhav Dalvi <vaibhav.dalvi@enterprisedb.com>
Reviewed-By: Srinath Reddy <srinath2133@gmail.com>
Discussion: https://postgr.es/m/cb103623-8ee6-4ba5-a2c9-f32e3a4933fa@dunslane.net

This is huge.

So, for very long time we had, obviously, pg_dump – which was great at dumping database, potentially in parallel, with compression, and then possibly restore partially. And we had pg_dumpall – interesting tool, that was used by a lot of people that just started with PostgreSQL. One that I personally never even considered. I mean, I did use it for dumping globals (roles mostly), but I wouldn't use it for dumping actual databases.

The reason was simple – it was slow, and generated text based outputs where I couldn't pick what will get loaded. At least not without some “parsing" of dump file. Which was suboptimal.

Now, though – all my problems with it are, supposedly, gone.

Now, let's see the baseline. Running:

=$ =$ vtime pg_dumpall -f dump.all.plain
real 104.565s
 
=$ ls -ldh dump.all.plain
-rw-rw-r-- 1 depesz depesz 70G Mar 17 10:53 dump.all.plain

Ok. 104s, and 70GB in single file. Of course I could compress it, but that's the default output.

Apparently pg_dumpall, even when dumping -Fd, can't use multiple parallel workers. So, let's instead dump to custom format, and we'll see how that will look like:

=$ vtime pg_dumpall -Fc -f dumpall-fc
real 759.551s
 
=$ ls -ldh dumpall-fc
drwx------ 3 depesz depesz 4.0K Mar 17 11:01 dumpall-fc/
 
=$ fd . dumpall-fc/ -X ls -lhd
drwx------ 2 depesz depesz 4.0K Mar 17 11:14 dumpall-fc/databases
-rw-rw-r-- 1 depesz depesz 4.0G Mar 17 11:08 dumpall-fc/databases/16386.dmp
-rw-rw-r-- 1 depesz depesz 1.6K Mar 17 11:01 dumpall-fc/databases/1.dmp
-rw-rw-r-- 1 depesz depesz 1.1K Mar 17 11:14 dumpall-fc/databases/5.dmp
-rw-rw-r-- 1 depesz depesz  11K Mar 17 11:08 dumpall-fc/databases/740752.dmp
-rw-rw-r-- 1 depesz depesz 1.2K Mar 17 11:01 dumpall-fc/databases/740820.dmp
-rw-rw-r-- 1 depesz depesz 4.0G Mar 17 11:14 dumpall-fc/databases/740829.dmp
-rw-rw-r-- 1 depesz depesz  469 Mar 17 11:14 dumpall-fc/map.dat
-rw-rw-r-- 1 depesz depesz 2.5K Mar 17 11:14 dumpall-fc/toc.glo

So, this took significantly longer. And it produced set of files. What can we do with it?

First, let's look at what is inside. There are two files, and directory with six .dmp files. Let's investigate…

Dmp files in databases are simple pg_dump -Fc dumps of every db separately. Which you can then use to list their content, or, perhaps, load just partially:

=$ pg_restore -l dumpall-fc/databases/740829.dmp  | grep -v TABLE.DATA &gt; list.just.table
 
=$ pg_restore -L list.just.table -f - dumpall-fc/databases/740829.dmp
--
-- PostgreSQL database dump
--
 
\restrict D1GT0oDrsj9xgHrP27Km4hzhdMQLRWqngWnGgYgLKVkJ6NuOoNresbSwqDJuL4r
 
-- Dumped from database version 19devel
-- Dumped by pg_dump version 19devel
 
SET statement_timeout = 0;
…
SET default_table_access_method = heap;
 
--
-- Name: plans; Type: TABLE; Schema: public; Owner: depesz
--
 
CREATE TABLE public.plans (
    id text NOT NULL,
…
    entered_from inet
);
 
 
ALTER TABLE public.plans OWNER TO depesz;
 
--
-- PostgreSQL database dump complete
--
 
\unrestrict D1GT0oDrsj9xgHrP27Km4hzhdMQLRWqngWnGgYgLKVkJ6NuOoNresbSwqDJuL4r

Sweet.

So how about the other two files?

map.dat is very simple file that tells us which database had which oid at the time of dump, and this, in turn tells us which file will contain which dump:

=$ cat dumpall-fc/map.dat
#################################################################
# map.dat
#
# This file maps oids to database names
#
# pg_restore will restore all the databases listed here, unless
# otherwise excluded. You can also inhibit restoration of a
# database by removing the line or commenting out the line with
# a # mark.
#################################################################
1 template1
740820 depesz
16386 depesz_explain
740752 test
740829 pgdba
5 postgres
 
=$ ls -l dumpall-fc/databases/740829.dmp
-rw-rw-r-- 1 depesz depesz 4198979861 Mar 17 11:14 dumpall-fc/databases/740829.dmp

In here we see that 740829.dmp contains dump of database pgdba. Sweet.

toc.log, finally, is another pg_dump -Fc type dump, which contains information on all global things:

=$ pg_restore -l dumpall-fc/toc.glo
;
; Archive created at 2026-03-17 11:01:42 CET
;     dbname: postgres
;     TOC Entries: 13
;     Compression: none
;     Dump Version: 1.16-0
;     Format: CUSTOM
;     Integer: 4 bytes
;     Offset: 8 bytes
;     Dumped by pg_dump version: 19devel
;
;
; Selected TOC Entries:
;
1; 0 0 default_transaction_read_only - default_transaction_read_only
2; 0 0 client_encoding - client_encoding
3; 0 0 standard_conforming_strings - standard_conforming_strings
4; 0 0 DROP_GLOBAL - DATABASE depesz
5; 0 0 DROP_GLOBAL - DATABASE depesz_explain
6; 0 0 DROP_GLOBAL - DATABASE test
7; 0 0 DROP_GLOBAL - DATABASE pgdba
8; 0 0 DROP_GLOBAL - ROLE depesz
9; 0 0 DROP_GLOBAL - ROLE depesz_explain
10; 0 0 DROP_GLOBAL - ROLE pgdba
11; 0 0 ROLE - ROLE depesz
12; 0 0 ROLE - ROLE depesz_explain
13; 0 0 ROLE - ROLE pgdba

And that's all about what is inside.

I would actually LOVE to get –jobs=X support for pg_dumpall, but I guess it will come in time. For now, it's great addition to my toolbox, and something that actually makes pg_dumpall a serious tool to consider when making simple backups.

Thanks to everyone involved, great work 🙂

=$
|

Waiting for PostgreSQL 19 – Add non-text output formats to pg_dumpall

Leave a Reply