Waiting for PostgreSQL 17 – Add new COPY option SAVE_ERROR_TO / Rename COPY option from SAVE_ERROR_TO to ON_ERROR

On 16th of January 2024, Alexander Korotkov committed patch:

Add new COPY option SAVE_ERROR_TO
 
Currently, when source data contains unexpected data regarding data type or
range, the entire COPY fails. However, in some cases, such data can be ignored
and just copying normal data is preferable.
 
This commit adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying.
 
Currently, SAVE_ERROR_TO only supports "none". This indicates error information
is not saved and COPY just skips the unexpected data and continues running.
 
Later works are expected to add more choices, such as 'log' and 'table'.
 
Author: Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He
Discussion: https://postgr.es/m/87k31ftoe0.fsf_-_%40commandprompt.com
Reviewed-by: Pavel Stehule, Andres Freund, Tom Lane, Daniel Gustafsson,
Reviewed-by: Alena Rybakina, Andy Fan, Andrei Lepikhov, Masahiko Sawada
Reviewed-by: Vignesh C, Atsushi Torikoshi

and then, three days later, he changed the syntax in next patch:

Rename COPY option from SAVE_ERROR_TO to ON_ERROR
 
The option names now are "stop" (default) and "ignore".  The future options
could be "file 'filename.log'" and "table 'tablename'".
 
Discussion: https://postgr.es/m/20240117.164859.2242646601795501168.horikyota.ntt%40gmail.com
Author: Jian He
Reviewed-by: Atsushi Torikoshi

The ability to make COPY ignore errors is, in my memory, in top five most commonly requested missing features.

Let's see how it behaved, and how it behaves now.

For test I will need faulty input file, which I can produce using:

$ printf 'a\t1\nb\nc\t\nd\t4\n' > copy.data
 
$ cat copy.data
a   1
b
c
d   4

Now, simple table:

=$ CREATE TABLE copytest (FIRST text, SECOND int4);
CREATE TABLE

Loading in previous versions was failing like this:

=$ copy copytest FROM '/tmp/copy.data';
ERROR:  missing DATA FOR COLUMN "second"
CONTEXT:  COPY copytest, line 2: "b"
 
=$ SELECT * FROM copytest ;
 FIRSTSECOND 
───────┼────────
(0 ROWS)

And while testing in Pg17, I got:

=$ copy copytest FROM '/tmp/copy.data' ( on_error IGNORE );
ERROR:  missing DATA FOR COLUMN "second"
CONTEXT:  COPY copytest, line 2: "b"

WHAT ?!

After reading some more, and asking people smarter than me, I understood. In docs it says: “error converting a column's input value into its data type". Converting. The value has to be there, but can be invalid. Let's retry:

$ printf 'a\t1\nb\tz\nc\t\nd\t4\n' | tee /tmp/copy.data
a       1
b       z
c
d       4

and, now, let's retry:

=$ copy copytest FROM '/tmp/copy.data' ( on_error IGNORE );
NOTICE:  2 ROWS were skipped due TO DATA TYPE incompatibility
COPY 2
 
=$ SELECT * FROM copytest ;
 FIRSTSECOND
───────┼────────
 a     │      1
 d     │      4
(2 ROWS)

Nice. It would be even nicer if copy could also ignore malformed lines, but I guess it's first step.

Anyway – it's a great thing to add. Thanks to everyone involved.

3 thoughts on “Waiting for PostgreSQL 17 – Add new COPY option SAVE_ERROR_TO / Rename COPY option from SAVE_ERROR_TO to ON_ERROR”

  1. @jian:

    that’s great. it hasn’t been submitted yet, though, as far as I can see? or did I miss something?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.