On 6th of January 2021, Tomas Vondra committed patch:
Report progress of COPY commands
This commit introduces a view pg_stat_progress_copy, reporting progress
of COPY commands. This allows rough estimates how far a running COPY
progressed, with the caveat that the total number of bytes may not be
available in some cases (e.g. when the input comes from the client).
Author: Josef Šimánek
Reviewed-by: Fujii Masao, Bharath Rupireddy, Vignesh C, Matthias van de Meent
Continue reading Waiting for PostgreSQL 14 – Report progress of COPY commands
On 19th of January 2019, Tomas Vondra committed patch:
Allow COPY FROM to filter data using WHERE conditions
Extends the COPY FROM command with a WHERE condition, which allows doing
various types of filtering while importing the data (random sampling,
condition on a data column, etc.). Until now such filtering required
either preprocessing of the input data, or importing all data and then
filtering in the database. COPY FROM ... WHERE is an easy-to-use and
low-overhead alternative for most simple cases.
Author: Surafel Temesgen
Continue reading Waiting for PostgreSQL 12 – Allow COPY FROM to filter data using WHERE conditions
On 1st of August 2018, Peter Eisentraut committed patch:
Allow multi-inserts during COPY into a partitioned table
CopyFrom allows multi-inserts to be used for non-partitioned tables, but
this was disabled for partitioned tables. The reason for this appeared
to be that the tuple may not belong to the same partition as the
previous tuple did. Not allowing multi-inserts here greatly slowed down
imports into partitioned tables. These could take twice as long as a
copy to an equivalent non-partitioned table. It seems wise to do
something about this, so this change allows the multi-inserts by
flushing the so-far inserted tuples to the partition when the next tuple
does not belong to the same partition, or when the buffer fills. This
improves performance when the next tuple in the stream commonly belongs
to the same partition as the previous tuple.
In cases where the target partition changes on every tuple, using
multi-inserts slightly slows the performance. To get around this we
track the average size of the batches that have been inserted and
adaptively enable or disable multi-inserts based on the size of the
batch. Some testing was done and the regression only seems to exist
when the average size of the insert batch is close to 1, so let's just
enable multi-inserts when the average size is at least 1.3. More
performance testing might reveal a better number for, this, but since
the slowdown was only 1-2% it does not seem critical enough to spend too
much time calculating it. In any case it may depend on other factors
rather than just the size of the batch.
Allowing multi-inserts for partitions required a bit of work around the
per-tuple memory contexts as we must flush the tuples when the next
tuple does not belong the same partition. In which case there is no
good time to reset the per-tuple context, as we've already built the new
tuple by this time. In order to work around this we maintain two
per-tuple contexts and just switch between them every time the partition
changes and reset the old one. This does mean that the first of each
batch of tuples is not allocated in the same memory context as the
others, but that does not matter since we only reset the context once
the previous batch has been inserted.
Author: David Rowley <email@example.com>
Continue reading Waiting for PostgreSQL 12 – Allow multi-inserts during COPY into a partitioned table
On 23rd of March 2017, Peter Eisentraut committed patch:
Logical replication support for initial data copy
Add functionality for a new subscription to copy the initial data in the
tables and then sync with the ongoing apply process.
For the copying, add a new internal COPY option to have the COPY source
data provided by a callback function. The initial data copy works on
the subscriber by receiving COPY data from the publisher and then
providing it locally into a COPY that writes to the destination table.
A WAL receiver can now execute full SQL commands. This is used here to
obtain information about tables and publications.
Several new options were added to CREATE and ALTER SUBSCRIPTION to
control whether and when initial table syncing happens.
Change pg_dump option --no-create-subscription-slots to
--no-subscription-connect and use the new CREATE SUBSCRIPTION
... NOCONNECT option for that.
Author: Petr Jelinek
Continue reading Waiting for PostgreSQL 10 – Logical replication support for initial data copy
On 27th of February, Heikki Linnakangas committed patch:
Add support for piping COPY to/from an external program.
This includes backend "COPY TO/FROM PROGRAM '...'" syntax, and corresponding
psql \copy syntax. Like with reading/writing files, the backend version is
superuser-only, and in the psql version, the program is run in the client.
In the passing, the psql \copy STDIN/STDOUT syntax is subtly changed: if you
the stdin/stdout is quoted, it's now interpreted as a filename. For example,
"\copy foo from 'stdin'" now reads from a file called 'stdin', not from
standard input. Before this, there was no way to specify a filename called
stdin, stdout, pstdin or pstdout.
This creates a new function in pgport, wait_result_to_str(), which can
be used to convert the exit status of a process, as returned by wait(3),
to a human-readable string.
Etsuro Fujita, reviewed by Amit Kapila.
Continue reading Waiting for 9.3 – Add support for piping COPY to/from an external program.