Waiting for 9.3 – Add support for piping COPY to/from an external program.

On 27th of February, Heikki Linnakangas committed patch:

Add support for piping COPY to/from an external program.
 
This includes backend "COPY TO/FROM PROGRAM '...'" syntax, and corresponding
psql \copy syntax. Like with reading/writing files, the backend version is
superuser-only, and in the psql version, the program is run in the client.
 
In the passing, the psql \copy STDIN/STDOUT syntax is subtly changed: if you
the stdin/stdout is quoted, it's now interpreted as a filename. For example,
"\copy foo from 'stdin'" now reads from a file called 'stdin', not from
standard input. Before this, there was no way to specify a filename called
stdin, stdout, pstdin or pstdout.
 
This creates a new function in pgport, wait_result_to_str(), which can
be used to convert the exit status of a process, as returned by wait(3),
to a human-readable string.
 
Etsuro Fujita, reviewed by Amit Kapila.

The commit message seems to be self explanatory, but let's see how it works, and what it can be used for.

First example, copying data to a compressed file:

$ copy pg_class TO program 'gzip -9c - > /tmp/copy.data.gz';
COPY 296

And, it works:

=$ ls -l /tmp/copy.data.gz
-rw------- 1 pgdba pgdba 4293 Feb 28 14:15 /tmp/copy.data.gz
 
=$ zcat /tmp/copy.data.gz | head
pg_statistic    11      10817   0       10      0       11797   0       15      386     15      2840    0       t       f       p       r       26      0       f       f       f       f       f       712     1       {pgdba=arwdDxt/pgdba}   \N
pg_type 11      71      0       10      0       0       0       8       332     8       0       0       t       f       p       r       30      0       t       f       f       f       f       712     1       {=r/pgdba}      \N
sys_variables   2200    16409   0       16385   0       16407   0       1       2       0       16410   0       t       f       p       r       4       0       f       f       f       f       f       765     1       \N      \N
pg_toast_2619   99      11049   0       10      0       11799   0       2       10      2       0       2841    t       f       p       t       3       0       f       f       f       f       f       712     1       \N      \N
pg_toast_2619_index     99      0       0       10      403     11801   0       2       10      0       0       0       f       f       p       i       2       0       f       f       f       f       f       0       0       \N      \N
pg_authid_rolname_index 11      0       0       10      403     0       1664    2       1       0       0       0       f       t       p       i       1       0       f       f       f       f       f       0       0       \N      \N
pg_authid_oid_index     11      0       0       10      403     0       1664    2       1       0       0       0       f       t       p       i       1       0       f       f       f       f       f       0       0       \N      \N
pg_attribute_relid_attnam_index 11      0       0       10      403     0       0       16      2265    0       0       0       f       f       p       i       2       0       f       f       f       f       f       0       0       \N      \N
pg_attribute_relid_attnum_index 11      0       0       10      403     0       0       11      2265    0       0       0       f       f       p       i       2       0       f       f       f       f       f       0       0       \N      \N
pg_toast_1255   99      11046   0       10      0       0       0       0       0       0       0       2837    t       f       p       t       3       0       f       f       f       f       f       712     1       \N      \N

Nice. Of course, the way I ran it – i.e. COPY sql command, it is being executed on database server, using the shell account that Postgres daemon is running from. Thanks to \copy in psql, we can use it for local program access, like this:

$ CREATE TABLE z (i int4);
CREATE TABLE
 
$ \copy z FROM program 'perl -le "print int(1000 * rand()) for 1..10"'
 
$ SELECT * FROM z;
  i
-----
  21
 817
 526
 892
 349
 332
  21
 248
 167
 275
(10 ROWS)

Of course, using it to generate random data is pretty much pointless, but I can imagine using it for loading data that is stored on remote server, and thus bypassing need to download it and store locally. Something along the lines of:

$ CREATE TABLE words (id int4 PRIMARY KEY, word text);
CREATE TABLE
 
$ \copy words FROM program 'wget -q -O - https://www.depesz.com/various/words.csv'
 
$ SELECT COUNT(*) FROM words;
 COUNT
-------
 72188
(1 ROW)

With larger datasets it will be better to run it using “COPY" and not “\copy" – to decrease number of times data has to be transferred.

It's actually pretty cool thing, thanks Etsuro.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.