November 18th, 2009 by depesz | Tags: , , , | 3 comments »
Did it help? If yes - maybe you can help me? Donate BTC to 19zPa5diT2LZqGtTi8f8bfApLn8rw9zBHx

On 14th of October Tome Lane committed new patch from Itagaki Tahahiro:

Log Message:
-----------
Support SQL-compliant triggers on columns, ie fire only if certain columns
are named in the UPDATE's SET list.
 
Note: the schema of pg_trigger has not actually changed; we've just started
to use a column that was there all along. catversion bumped anyway so that
this commit is included in the history of potentially interesting changes
to system catalog contents.
 
Itagaki Takahiro

Up to this moment, when you write a trigger on update (on INSERT and DELETE the COLUMN based triggers are obviously useless), it is ran whenever you issue update to any row (or no row at all!) in table.

But sometimes – you might want to run some trigger only when given field changes – for example: change tsvector column for TSearch2 when title or body changes, but not when validity status changes, or number of views – in this situation we'd want to skip running tsvector changes when title and body are the same because recalculation of tsvector is quite expensive. In other case it might be that trigger on update creates new update query to another table, and we want not to call it when it's not needed.

Let's see how it looks. Test table:

# \d test
Table "public.test"
+-----------+----------+---------------------------------------------------+
| Column | Type | Modifiers |
+-----------+----------+---------------------------------------------------+
| id | integer | not null default nextval('test_id_seq'::regclass) |
| views | integer | not null default 0 |
| title | text | |
| body | text | |
| is_active | boolean | not null default false |
| fts_data | tsvector | |
+-----------+----------+---------------------------------------------------+
Indexes:
"test_pkey" PRIMARY KEY, btree (id)

So, in 8.4, trigger to update ftp_data would look like this:

CREATE OR REPLACE FUNCTION trg_test_i() RETURNS TRIGGER AS $$
BEGIN
NEW.fts_data := to_tsvector( coalesce( NEW.title, '' ) || ' ' || coalesce( NEW.body || '' ) );
RETURN NEW;
END;
$$ language plpgsql;
CREATE TRIGGER trg_test_i BEFORE INSERT ON test FOR EACH ROW EXECUTE PROCEDURE trg_test_i();

This is obligatory trigger on INSERT, which will be the same. And trigger on UPDATE would look like this:

CREATE OR REPLACE FUNCTION trg_test_u() RETURNS TRIGGER AS $$
BEGIN
IF NEW.title IS DISTINCT FROM OLD.title OR NEW.body IS DISTINCT FROM OLD.BODY THEN
NEW.fts_data := to_tsvector( coalesce( NEW.title, '' ) || ' ' || coalesce( NEW.body || '' ) );
END IF;
RETURN NEW;
END;
$$ language plpgsql;
CREATE TRIGGER trg_test_u BEFORE UPDATE ON test FOR EACH ROW EXECUTE PROCEDURE trg_test_u();

This of course work nicely, but has several issues:

  • it's easy to forget to add IF
  • it becomes more and more complex in case we have several such relations on single table
  • while easy to read for seasoned dba, the IF will definitely raise questions from new DBAs

Now, luckily it is no longer needed. Instead of the trigger above I can just:

CREATE OR REPLACE FUNCTION trg_test_u() RETURNS TRIGGER AS $$
BEGIN
NEW.fts_data := to_tsvector( coalesce( NEW.title, '' ) || ' ' || coalesce( NEW.body || '' ) );
RETURN NEW;
END;
$$ language plpgsql;
CREATE TRIGGER trg_test_u BEFORE UPDATE OF title, body ON test FOR EACH ROW EXECUTE PROCEDURE trg_test_u();

Which is the same code as in trg_test_i() function.

Also – at least theoretically – new approach should be faster. Let's see if it really is that way.

So, let's test:

CREATE TABLE test (
id serial PRIMARY KEY,
irrelevant INT4,
relevant INT4,
updates INT4 DEFAULT 0
);

Now, let's insert some dummy data:

# INSERT INTO test (id, irrelevant, relevant, updates) SELECT i, i, i, 0 FROM generate_series(1,100000) i;
INSERT 0 100000

And let's write test sql script:

perl -e 'printf("UPDATE test SET %s = %s + 1 WHERE id = %i;\n",$_%10?"irrelevant":"relevant",$_%10?"irrelevant":"relevant", $_) for (1..100000)' > testit.sql

File looks like this:

=$ head -n 21 testit.sql
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 1;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 2;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 3;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 4;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 5;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 6;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 7;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 8;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 9;
UPDATE test SET relevant = relevant + 1 WHERE id = 10;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 11;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 12;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 13;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 14;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 15;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 16;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 17;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 18;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 19;
UPDATE test SET relevant = relevant + 1 WHERE id = 20;
UPDATE test SET irrelevant = irrelevant + 1 WHERE id = 21;

OK. So, first let's create “old-fashioned" trigger:

CREATE OR REPLACE FUNCTION test_trigger() RETURNS TRIGGER AS
$BODY$
BEGIN
IF NEW.relevant IS distinct FROM OLD.relevant THEN
NEW.updates := NEW.updates + 1;
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE 'plpgsql';
CREATE TRIGGER test_trigger BEFORE UPDATE ON test FOR EACH ROW EXECUTE PROCEDURE test_trigger();

And now let's test the testit.sql (100k updates, 10% of which are further processed with very simple plpgsql code). I ran the test 5 times, each time cleaning first the database. Times:

real 0m39.923s
real 0m39.122s
real 0m38.569s
real 0m38.701s
real 0m38.225s

And how it is with new, “ON COLUMN" trigger:

real 0m38.842s
real 0m38.639s
real 0m36.495s
real 0m36.478s
real 0m37.522s

Pretty similar new way is a bit faster ( 38.908 vs. 37.595 seconds on average ), so performance will not be important factor when choosing this new approach – but readability of code definitely will.

Cool stuff.

  1. 3 comments

  2. so maybe I’ve read this too fast and it was answered… (I think your code probably answers but…)

    Support SQL-compliant triggers on columns, ie fire only if certain columns
    are named in the UPDATE’s SET list.

    so it won’t fire if I don’t name the column? or just generally do an insert on that column even if I name it?

  3. Nov 18, 2009

    @Caleb: according to PostgreSQL 8.5devel Docs, no. it won’t fire if you don’t name the column.

    in particular, this means that a trigger defined as
    CREATE TRIGGER trigger1
    BEFORE UPDATE OF column1 ON table1
    FOR EACH ROW EXECUTE PROCEDURE trigger1();

    will *not* fire when column1 is updated by another trigger.

    for me it’s a bit weird but probably it’s standards-compliant, if pg-gods decided this way.

  4. Feb 1, 2010

    Good fill someone in on and this post helped me alot in my college assignement. Thank you as your information.

Leave a comment