Waiting for PostgreSQL 17 – Add new COPY option LOG_VERBOSITY.

On 1st of April 2024, Masahiko Sawada committed patch:

Add new COPY option LOG_VERBOSITY.
 
This commit adds a new COPY option LOG_VERBOSITY, which controls the
amount of messages emitted during processing. Valid values are
'default' and 'verbose'.
 
This is currently used in COPY FROM when ON_ERROR option is set to
ignore. If 'verbose' is specified, a NOTICE message is emitted for
each discarded row, providing additional information such as line
number, column name, and the malformed value. This helps users to
identify problematic rows that failed to load.
 
Author: Bharath Rupireddy
Reviewed-by: Michael Paquier, Atsushi Torikoshi, Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/CALj2ACUk700cYhx1ATRQyRw-fBM%2BaRo6auRAitKGff7XNmYfqQ%40mail.gmail.com

Continue reading Waiting for PostgreSQL 17 – Add new COPY option LOG_VERBOSITY.

Waiting for PostgreSQL 17 – Add new COPY option SAVE_ERROR_TO / Rename COPY option from SAVE_ERROR_TO to ON_ERROR

On 16th of January 2024, Alexander Korotkov committed patch:

Add new COPY option SAVE_ERROR_TO
 
Currently, when source data contains unexpected data regarding data type or
range, the entire COPY fails. However, in some cases, such data can be ignored
and just copying normal data is preferable.
 
This commit adds a new option SAVE_ERROR_TO, which specifies where to save the
error information. When this option is specified, COPY skips soft errors and
continues copying.
 
Currently, SAVE_ERROR_TO only supports "none". This indicates error information
is not saved and COPY just skips the unexpected data and continues running.
 
Later works are expected to add more choices, such as 'log' and 'table'.
 
Author: Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He
Discussion: https://postgr.es/m/87k31ftoe0.fsf_-_%40commandprompt.com
Reviewed-by: Pavel Stehule, Andres Freund, Tom Lane, Daniel Gustafsson,
Reviewed-by: Alena Rybakina, Andy Fan, Andrei Lepikhov, Masahiko Sawada
Reviewed-by: Vignesh C, Atsushi Torikoshi

and then, three days later, he changed the syntax in next patch:

Rename COPY option from SAVE_ERROR_TO to ON_ERROR
 
The option names now are "stop" (default) and "ignore".  The future options
could be "file 'filename.log'" and "table 'tablename'".
 
Discussion: https://postgr.es/m/20240117.164859.2242646601795501168.horikyota.ntt%40gmail.com
Author: Jian He
Reviewed-by: Atsushi Torikoshi

Continue reading Waiting for PostgreSQL 17 – Add new COPY option SAVE_ERROR_TO / Rename COPY option from SAVE_ERROR_TO to ON_ERROR

Waiting for PostgreSQL 15 – Add HEADER support to COPY text format

On 28th of January 2022, Peter Eisentraut committed patch:

Add HEADER support to COPY text format 
 
The COPY CSV format supports the HEADER option to output a header
line.  This patch adds the same option to the default text format.  On
input, the HEADER option causes the first line to be skipped, same as
with CSV.
 
Author: Rémi Lapeyre <remi.lapeyre@lenstra.fr>
Discussion: https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com

Continue reading Waiting for PostgreSQL 15 – Add HEADER support to COPY text format

Waiting for PostgreSQL 14 – Report progress of COPY commands

On 6th of January 2021, Tomas Vondra committed patch:

Report progress of COPY commands
 
This commit introduces a view pg_stat_progress_copy, reporting progress
of COPY commands.  This allows rough estimates how far a running COPY
progressed, with the caveat that the total number of bytes may not be
available in some cases (e.g. when the input comes from the client).
 
Author: Josef Šimánek
Reviewed-by: Fujii Masao, Bharath Rupireddy, Vignesh C, Matthias van de Meent
Discussion: https://postgr.es/m/CAFp7QwqMGEi4OyyaLEK9DR0+E+oK3UtA4bEjDVCa4bNkwUY2PQ@mail.gmail.com
Discussion: https://postgr.es/m/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg+FEKet4XaTGSW=LitKQ@mail.gmail.com

Continue reading Waiting for PostgreSQL 14 – Report progress of COPY commands

Waiting for PostgreSQL 12 – Allow COPY FROM to filter data using WHERE conditions

On 19th of January 2019, Tomas Vondra committed patch:

Allow COPY FROM to filter data using WHERE conditions
 
Extends the COPY FROM command with a WHERE condition, which allows doing
various types of filtering while importing the data (random sampling,
condition on a data column, etc.).  Until now such filtering required
either preprocessing of the input data, or importing all data and then
filtering in the database. COPY FROM ... WHERE is an easy-to-use and
low-overhead alternative for most simple cases.
 
Author: Surafel Temesgen
 
Discussion: https://www.postgresql.org/message-id/flat/CALAY4q_DdpWDuB5-Zyi-oTtO2uSk8pmy+dupiRe3AvAc++1imA@mail.gmail.com

Continue reading Waiting for PostgreSQL 12 – Allow COPY FROM to filter data using WHERE conditions

Waiting for PostgreSQL 12 – Allow multi-inserts during COPY into a partitioned table

On 1st of August 2018, Peter Eisentraut committed patch:

Allow multi-inserts during COPY into a partitioned table
 
 
CopyFrom allows multi-inserts to be used for non-partitioned tables, but
this was disabled for partitioned tables.  The reason for this appeared
to be that the tuple may not belong to the same partition as the
previous tuple did.  Not allowing multi-inserts here greatly slowed down
imports into partitioned tables.  These could take twice as long as a
copy to an equivalent non-partitioned table.  It seems wise to do
something about this, so this change allows the multi-inserts by
flushing the so-far inserted tuples to the partition when the next tuple
does not belong to the same partition, or when the buffer fills.  This
improves performance when the next tuple in the stream commonly belongs
to the same partition as the previous tuple.
 
In cases where the target partition changes on every tuple, using
multi-inserts slightly slows the performance.  To get around this we
track the average size of the batches that have been inserted and
adaptively enable or disable multi-inserts based on the size of the
batch.  Some testing was done and the regression only seems to exist
when the average size of the insert batch is close to 1, so let's just
enable multi-inserts when the average size is at least 1.3.  More
performance testing might reveal a better number for, this, but since
the slowdown was only 1-2% it does not seem critical enough to spend too
much time calculating it.  In any case it may depend on other factors
rather than just the size of the batch.
 
Allowing multi-inserts for partitions required a bit of work around the
per-tuple memory contexts as we must flush the tuples when the next
tuple does not belong the same partition.  In which case there is no
good time to reset the per-tuple context, as we've already built the new
tuple by this time.  In order to work around this we maintain two
per-tuple contexts and just switch between them every time the partition
changes and reset the old one.  This does mean that the first of each
batch of tuples is not allocated in the same memory context as the
others, but that does not matter since we only reset the context once
the previous batch has been inserted.
 
Author: David Rowley <david.rowley@2ndquadrant.com>

Continue reading Waiting for PostgreSQL 12 – Allow multi-inserts during COPY into a partitioned table

Waiting for PostgreSQL 10 – Logical replication support for initial data copy

On 23rd of March 2017, Peter Eisentraut committed patch:

Logical replication support for initial data copy
 
Add functionality for a new subscription to copy the initial data in the
tables and then sync with the ongoing apply process.
 
For the copying, add a new internal COPY option to have the COPY source
data provided by a callback function.  The initial data copy works on
the subscriber by receiving COPY data from the publisher and then
providing it locally into a COPY that writes to the destination table.
 
A WAL receiver can now execute full SQL commands.  This is used here to
obtain information about tables and publications.
 
Several new options were added to CREATE and ALTER SUBSCRIPTION to
control whether and when initial table syncing happens.
 
Change pg_dump option --no-create-subscription-slots to
--no-subscription-connect and use the new CREATE SUBSCRIPTION
... NOCONNECT option for that.
 
Author: Petr Jelinek

Continue reading Waiting for PostgreSQL 10 – Logical replication support for initial data copy

Waiting for 9.3 – Add support for piping COPY to/from an external program.

On 27th of February, Heikki Linnakangas committed patch:

Add support for piping COPY to/from an external program.
 
This includes backend "COPY TO/FROM PROGRAM '...'" syntax, and corresponding
psql \copy syntax. Like with reading/writing files, the backend version is
superuser-only, and in the psql version, the program is run in the client.
 
In the passing, the psql \copy STDIN/STDOUT syntax is subtly changed: if you
the stdin/stdout is quoted, it's now interpreted as a filename. For example,
"\copy foo from 'stdin'" now reads from a file called 'stdin', not from
standard input. Before this, there was no way to specify a filename called
stdin, stdout, pstdin or pstdout.
 
This creates a new function in pgport, wait_result_to_str(), which can
be used to convert the exit status of a process, as returned by wait(3),
to a human-readable string.
 
Etsuro Fujita, reviewed by Amit Kapila.

Continue reading Waiting for 9.3 – Add support for piping COPY to/from an external program.