Waiting for PostgreSQL 12 – Allow COPY FROM to filter data using WHERE conditions

On 19th of January 2019, Tomas Vondra committed patch:

Allow COPY FROM to filter data using WHERE conditions
 
Extends the COPY FROM command with a WHERE condition, which allows doing
various types of filtering while importing the data (random sampling,
condition on a data column, etc.).  Until now such filtering required
either preprocessing of the input data, or importing all data and then
filtering in the database. COPY FROM ... WHERE is an easy-to-use and
low-overhead alternative for most simple cases.
 
Author: Surafel Temesgen
 
Discussion: https://www.postgresql.org/message-id/flat/CALAY4q_DdpWDuB5-Zyi-oTtO2uSk8pmy+dupiRe3AvAc++1imA@mail.gmail.com

I assume most of you are aware of COPY command that can be used to quickly load or dump data to/from files.

Since 9.3 we also have ability to copy to/from pipes.

And now, we got one more thing – filtering of input.

When outputting data from database, you could have always use syntax like:

=$ copy ( SELECT a, b, c FROM d WHERE e = f ) TO '/tmp/whatever';

but when loading data into pg, it wasn't so simple. Sure, you could have filter the data before, or use grep as your input program, but now, we can do something much more nicer – apply where clause on loaded data!

Let's see how that would work. First, obviously, I need some table:

=$ CREATE TABLE sysusers (
    username text PRIMARY KEY,
    password text,
    user_id int4 NOT NULL UNIQUE,
    group_id int4 NOT NULL,
    gecos text,
    homedir text,
    shell text
);

now, I can copy data in there:

=$ copy sysusers FROM '/etc/passwd' WITH delimiter ':';
COPY 42

I loaded 42 users. But, let's assume I just want real system users, and my definition is: with user_id less than 1000:

=$ TRUNCATE sysusers;
TRUNCATE TABLE
 
=$ copy sysusers FROM '/etc/passwd' WITH ( delimiter ':' ) WHERE user_id < 1000;
COPY 40

The difference is not big, but it's clearly there. And where condition is applied to already parsed data using normal semantics for their datatypes.

This is pretty cool. Thanks to all involved 🙂

5 thoughts on “Waiting for PostgreSQL 12 – Allow COPY FROM to filter data using WHERE conditions”

  1. Does this fully support arbitrary expressions, including things like subselects so you use conditions that refer to other tables? If so, does that even include the same table? (Which could be particularly useful… `COPY foo FROM STDIN WHERE id NOT IN (SELECT id FROM foo)`)

  2. @Aristotle:

    No, you can’t have subqueries:

    =$ copy sysusers FROM '/etc/passwd' WITH ( delimiter ':' ) WHERE user_id NOT IN (SELECT user_id FROM sysusers);
    ERROR:  cannot USE subquery IN COPY FROM WHERE condition
    LINE 1: .../etc/passwd' WITH ( delimiter ':' ) WHERE user_id not in (se...
                                                                 ^
  3. Aww. So it doesn’t give you any more power than preprocessing the input. It’s of course nicer syntax than using a pipe from grep, and it’s more portable across host OSes compared to depending on (a specific version of) grep – but it’s no more powerful. That’s a pity.

    Not that I’m complaining! Maybe this patch just lays the groundwork, and the extra power will come in some future patch. Or maybe not – even just this limited version is a nice feature.

  4. Yes, I don’t have the heart to do ask for this feature.
    We daily import data from telephone switchboard and delete the data we don’t want. With a WHERE-Clause we don’t must import the unwanted data …

  5. I wonder if implementing statement-level instead-of triggers (not currently available in v11) would give you the flexibility to write what you want, albeit wrapped in a trigger.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.