November 14th, 2012 by depesz

Some people are afraid of triggers.

I don't really understand the reasons for this, but I guess it stems from the fact that these are usually application developers, and not database admins. Or they encountered some kind of problem with triggers, and now tend to think that triggers are inherently evil.

But they are not.

Like virtually anything, triggers have some benefits, and some drawbacks. With a bit of thinking you can use them to do really cool things. But first you have to understand what exactly a trigger is, how it works, and when to use which kind.

A trigger is a function that gets called automatically in response to a database event. What kind of event? Currently these are:

  • INSERT INTO
  • UPDATE
  • DELETE FROM
  • TRUNCATE (this one can be used for triggers only since PostgreSQL 8.4)

In upcoming 9.3 release there will be more types of triggers, but their usage is a bit more complicated, so I'll not cover them for now.

These triggers can be defined on every user table (you can't create triggers on system tables), and (since 9.1) also on views.

For now, I will focus on basic triggers on tables, as view triggers are a bit different. I will also skip conditional triggers, and triggers set on a subset of columns.

Psql's "\h CREATE TRIGGER" shows:

Command:     CREATE TRIGGER
Description: define a new trigger
Syntax:
CREATE [ CONSTRAINT ] TRIGGER name { BEFORE | AFTER | INSTEAD OF } { event [ OR ... ] }
    ON table_name
    [ FROM referenced_table_name ]
    { NOT DEFERRABLE | [ DEFERRABLE ] { INITIALLY IMMEDIATE | INITIALLY DEFERRED } }
    [ FOR [ EACH ] { ROW | STATEMENT } ]
    [ WHEN ( condition ) ]
    EXECUTE PROCEDURE function_name ( arguments )
 
where event can be one of:
 
    INSERT
    UPDATE [ OF column_name [, ... ] ]
    DELETE
    TRUNCATE

That's quite a lot. But don't worry. There are, for now, just a couple of trigger classifications:

  • BEFORE or AFTER
  • FOR EACH ROW or STATEMENT

( side note: triggers on TRUNCATE cannot be made "FOR EACH ROW" – they can only be per STATEMENT ).
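For illustration, a statement-level TRUNCATE trigger might look like this (a sketch; the table and function names here are made up, not from the post):

```sql
-- Hypothetical sketch: log every TRUNCATE of some table t.
CREATE FUNCTION log_truncate() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE 'table % was truncated', TG_TABLE_NAME;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_truncate_trg
    AFTER TRUNCATE
    ON t
    FOR EACH STATEMENT
    EXECUTE PROCEDURE log_truncate();
```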

I'll start, conveniently, from the end. What's the difference between "for each row" and "for each statement"?

That's pretty simple and straightforward.

When you run a multi-row operation, for example delete from table; (without a where clause), a "for each row" trigger will be called once for every deleted row, while a "for each statement" trigger will be run only once.

That's not the only difference though.

When running a "FOR EACH ROW" trigger, you have (in the function) special variables which contain the exact content of the row being inserted/updated/deleted, so in the function you can do something with it. On the other hand – since FOR EACH STATEMENT triggers can work on millions of rows at once – they do not get information about which rows were modified.

You might think that this is stupid – after all, what's the point of a trigger if you don't know what got modified? But it actually has some serious uses, and the limitation that "a statement trigger doesn't know what was modified" can be worked around. Truth be told, though, in real life I see many more row-level triggers than statement-level ones.
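The row/statement difference can be seen directly with a pair of trivial triggers (a sketch; table and trigger names are made up):

```sql
-- Hypothetical demonstration: one notice per affected row
-- vs. one notice per statement.
CREATE TABLE t (x int4);

CREATE FUNCTION say_hi() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE 'trigger % fired', TG_NAME;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER per_row_trg AFTER INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE say_hi();

CREATE TRIGGER per_statement_trg AFTER INSERT ON t
    FOR EACH STATEMENT EXECUTE PROCEDURE say_hi();

-- Inserting three rows fires per_row_trg three times,
-- and per_statement_trg just once.
INSERT INTO t SELECT i FROM generate_series(1, 3) i;
```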

The other important classification of triggers – AFTER/BEFORE.

Let's consider the case of UPDATE. You're running an update. PostgreSQL has to check that all the values you provided are sensible and match all constraints (not null, checks, unique), and then it actually updates the row (well, technically it doesn't, because PostgreSQL has MVCC, but for now let's pretend that it does).

A BEFORE trigger is called just before the internal row representation is updated. An AFTER trigger is called afterwards.

What does it tell us?

For starters – if you want to modify values being inserted into or updated in the table – the correct place to do it is a BEFORE trigger.

I have seen code where people added an AFTER INSERT trigger which issued an UPDATE of the newly inserted record. That just doesn't make sense – it bloats the table and is slower. In a BEFORE trigger, you can modify the row data before it gets stored in the table.

To have a place for experiments, I'll make a simple users table:

create table users (
    id serial primary key,
    username text not null unique,
    fullname text,
    created_on timestamptz not null default now(),
    last_modified timestamptz
);

And now we can experiment on it. First, let's assume that you want fullnames to be normalized to lowercase, with an uppercased first letter of every word (just in case: this is a bad idea, don't do it in real-life code).

For this, the simple solution is to create a trigger that will modify the fullname of all records – on INSERT, and on UPDATE.

Before I can write it, we need a couple of pieces of information:

  • a trigger calls a function – i.e. the logic is not in the trigger itself, but in a function (which can be shared between multiple triggers, on multiple tables)
  • such a function should be defined to take no arguments and to return the pseudotype TRIGGER
  • in the case of FOR EACH ROW triggers, whole rows are available as the NEW and OLD record variables (for INSERT – NEW, for DELETE – OLD, for UPDATE – both; OLD is the state of the row before the update, NEW is the state after it)
  • there are some more special variables, which are listed in the fine manual

If you're reading carefully, you might be confused that I wrote that such a function doesn't have arguments, while the psql \h output above shows:

EXECUTE PROCEDURE function_name ( arguments )

which suggests that arguments are possible.

That's right – a trigger can have arguments, but they are not passed as arguments to the function. Instead they are passed using the special variable TG_ARGV, and its companion TG_NARGS.
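A minimal sketch of how that works (function, trigger, and table names here are made up for illustration):

```sql
-- Hypothetical sketch: reading trigger arguments via TG_ARGV/TG_NARGS.
CREATE FUNCTION show_trigger_args() RETURNS trigger AS $$
BEGIN
    -- TG_NARGS is the argument count; TG_ARGV is a 0-based text array.
    RAISE NOTICE 'got % argument(s), first one: %', TG_NARGS, TG_ARGV[0];
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- The arguments appear in CREATE TRIGGER, not in the function signature:
CREATE TRIGGER show_trigger_args_trg
    BEFORE INSERT ON t
    FOR EACH ROW
    EXECUTE PROCEDURE show_trigger_args('first', 'second');
```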

Knowing all of this we can write our fullname-normalizing function:

CREATE FUNCTION normalize_fullname() RETURNS trigger AS $$
BEGIN
    NEW.fullname := initcap( NEW.fullname );
    RETURN NEW;
END;
$$ language plpgsql;

In case you're not familiar with functions in PostgreSQL:

  • line 1 – begins the "CREATE FUNCTION" statement, stating that the function will be named normalize_fullname, will take no arguments "()", and will return a value of the pseudotype trigger.
  • line 2 – every function in plpgsql (check line 6) has to have a BEGIN … END; block, which is its body
  • line 3 – NEW is the record containing the data of the new row; NEW.fullname is the value of the fullname column in this record. The initcap function takes a string and returns a modified version, so the result has to be assigned back to NEW.fullname to make the change actually happen
  • line 4 – the function has to return something, and the returned value is what gets inserted/updated
  • line 5 – end of the function body
  • line 6 – declaration that the function is written in the "plpgsql" language (one of many possible pl/* languages)

With the function in place, I can actually create a trigger:

CREATE TRIGGER normalize_fullname_trg
    BEFORE INSERT OR UPDATE
    ON users
    FOR EACH ROW
    EXECUTE PROCEDURE normalize_fullname();

and explanation:

  • line 1 – beginning of the CREATE TRIGGER statement. Each trigger has its own name, which can, but doesn't have to, match the name of the executed function
  • line 2 – marks the trigger as BEFORE, and sets it to run on INSERT and UPDATE events (we don't care about DELETE when normalizing data, and TRUNCATE is different anyway)
  • line 3 – sets the table the trigger is created on
  • line 4 – marks the trigger as per ROW, not per STATEMENT
  • line 5 – names the function to be called whenever the event happens

Does that work? Let's see:

$ insert into users ( username, fullname ) values ( 'depesz', 'hubert lubaczewski' );
INSERT 0 1
 
$ select * from users;
 id | username |      fullname      |          created_on           | last_modified
----+----------+--------------------+-------------------------------+---------------
  1 | depesz   | Hubert Lubaczewski | 2012-11-13 14:58:47.500289+01 | [null]
(1 row)

and what about update?

$ update users set fullname = 'ABSOLUTELY IRRELEVANT NAME';
UPDATE 1
 
$ select * from users;
 id | username |          fullname          |          created_on           | last_modified
----+----------+----------------------------+-------------------------------+---------------
  1 | depesz   | Absolutely Irrelevant Name | 2012-11-13 14:58:47.500289+01 | [null]
(1 row)

Looks like it's working.

This looks easy, I hope. Now for a bit more advanced topics.

I mentioned that the function returns NEW because that's the row to be stored in the table, but that's not always what you want. You can also return a NULL value, which does something really sneaky – it will not do the insert/update/delete, but it will also not raise any exception or error. It will just silently ignore your command.

Let's see a simple example:

$ CREATE FUNCTION we_dont_like_them() returns trigger as $$
BEGIN
    IF NEW.fullname ilike '%mysql%' THEN
        RETURN NULL;
    ELSIF NEW.username ilike '%mysql%' THEN
        RETURN NULL;
    END IF;
    RETURN NEW;
END;
$$ language plpgsql;
 
$ CREATE TRIGGER we_dont_like_them_trg
    BEFORE INSERT OR UPDATE
    ON users
    FOR EACH ROW
    EXECUTE PROCEDURE we_dont_like_them();
 
$ insert into users (username, fullname) values
    ( 'test1', 'mysql' ),
    ( 'test2', 'MYSQL' ),
    ( 'mysql1', 'test3' ),
    ( 'MySQL2', 'test4' ),
    ( 'mssql', 'oops' );
 
$ select * from users;
 id | username |          fullname          |          created_on           | last_modified
----+----------+----------------------------+-------------------------------+---------------
  1 | depesz   | Absolutely Irrelevant Name | 2012-11-13 14:58:47.500289+01 | [null]
  6 | mssql    | Oops                       | 2012-11-13 15:19:03.42297+01  | [null]
(2 rows)

As you can see, all users which mention MySQL got silently skipped.

You can also use triggers to do more complex validation, and if a given row fails – use RAISE EXCEPTION to abort the transaction. But actually – this kind of thing should rather be done in CHECK constraints, not triggers.
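For the record, the same "no mysql" rule expressed as a CHECK constraint could look roughly like this (the constraint name is made up); note that unlike the NULL-returning trigger, a CHECK rejects the row with an error instead of silently skipping it:

```sql
ALTER TABLE users
    ADD CONSTRAINT users_no_mysql_chk
    CHECK (
        username NOT ILIKE '%mysql%'
        -- a NULL fullname satisfies the CHECK, so spell it out:
        AND ( fullname IS NULL OR fullname NOT ILIKE '%mysql%' )
    );
```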

Everything I showed above was a BEFORE … FOR EACH ROW trigger. So what good are the other kinds?

AFTER triggers are meant to be the place where you put logic that influences other tables once the modification of your original table succeeds.

For example, let's assume we have also privileges table:

CREATE TABLE privileges (
    user_id int4 primary key references users (id),
    privileges text[] not null
);

And, for whatever reason, we want a row to exist in the privileges table for every user, with a basic 'none' privilege.

To do it, I can:

$ CREATE FUNCTION make_default_privileges() returns trigger as $$
begin
    insert into privileges (user_id, privileges) values ( NEW.id, ARRAY[ 'none' ]::text[] );
    return NEW;
end;
$$ language plpgsql;
 
$ CREATE TRIGGER make_default_privileges_trg
    AFTER INSERT
    ON users
    FOR EACH ROW
    EXECUTE PROCEDURE make_default_privileges();
 
$ insert into users (username) values ('guest') returning *;
 id | username | fullname |          created_on           | last_modified
----+----------+----------+-------------------------------+---------------
  7 | guest    | [null]   | 2012-11-13 15:26:45.790506+01 | [null]
(1 row)
 
$ select * from privileges;
 user_id | privileges
---------+------------
       7 | {none}
(1 row)

Of course you might say: "OK, but I could do it in a BEFORE trigger too, right?".

Well, right. You can – at least kind of, since NEW.id is already assigned at that point. But you shouldn't.

There are two reasons. First – the foreign key. If you insert into privileges from a BEFORE INSERT trigger, the foreign key on privileges will complain, because from its point of view the row in users does not exist yet.

But even without the foreign key – you can have many triggers on a row, and some later trigger might reject it, silently or with an exception. In such a case we shouldn't have inserted into the other table – not because the data would stay there (it won't), but because it's unnecessary work.

Hence – if you have to issue an insert/update/delete from a trigger – it should almost always be an AFTER trigger.

As you perhaps noticed, I did 'return NEW' in the function. If you are inquisitive, you might wonder: if I return NULL, will it also ignore the row? No. In an AFTER trigger, the returned value doesn't matter at all. I usually return NEW, but I could return NULL just as well – the row is already stored in the table, so I can't "ignore" it afterwards.

This brings me to one of the most commonly asked questions about triggers (at least, asked of me): if the insert fails, will the effects of the trigger stay in the database?

Long story short – no. Even if you don't use transactions explicitly, each query sent to PostgreSQL runs in an implicit, single-query transaction, and a command and its triggers are always within the same transaction. So if a trigger raises an exception, it also removes the changes the command applied. If any one of the triggers raises an exception, the effects of all the triggers run by this command are rolled back.
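This is easy to verify (a sketch; the table and function names are made up):

```sql
-- Hypothetical demonstration: a failing trigger rolls back the command.
CREATE TABLE t (x int4);

CREATE FUNCTION always_fail() RETURNS trigger AS $$
BEGIN
    RAISE EXCEPTION 'no rows for you';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER always_fail_trg
    AFTER INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE always_fail();

INSERT INTO t (x) VALUES (1);
-- ERROR:  no rows for you

SELECT count(*) FROM t;
-- count is 0: the insert was rolled back together with the trigger's effects.
```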

This is great, because it guarantees that whatever you do in a trigger will only ever be applied together with the data change from the application.

When you have application-level triggers (in an ORM, or whatever), you can, by mistake, run two related modifications in separate transactions, and accidentally have one rolled back but not the other – thus ending up with a "mixed" state of the database.

When the logic is in a trigger, all changes are atomic – you either get all of them, or none.

Back to explaining.

One of the most common tasks for triggers (at least in my environment) is forbidding changes to particular data, or forcing changes of given columns.

For example – let's look at the users table:

                                     Table "public.users"
    Column     |           Type           |                     Modifiers
---------------+--------------------------+----------------------------------------------------
 id            | integer                  | not null default nextval('users_id_seq'::regclass)
 username      | text                     | not null
 fullname      | text                     |
 created_on    | timestamp with time zone | not null default now()
 last_modified | timestamp with time zone |
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "users_username_key" UNIQUE CONSTRAINT, btree (username)
Referenced by:
    TABLE "privileges" CONSTRAINT "privileges_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)
Triggers:
    make_default_privileges_trg AFTER INSERT ON users FOR EACH ROW EXECUTE PROCEDURE make_default_privileges()
    normalize_fullname_trg BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE normalize_fullname()
    we_dont_like_them_trg BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE we_dont_like_them()

The created_on column – ideally it shouldn't be possible to specify anything other than "now", but the default value doesn't protect us. I can easily create a user with a fake created_on:

$ insert into users (username, created_on) values ( 'bad santa', '0001-01-01 00:00:00' );
 
$ select * from users where username = 'bad santa';
 id | username  | fullname |        created_on         | last_modified
----+-----------+----------+---------------------------+---------------
  8 | bad santa | [null]   | 0001-01-01 00:00:00+01:24 | [null]
(1 row)

What's more – once the row is in there, I can update the column – which is also not a good idea.

On the other hand – last_modified should be set to now() on every update, even if I forget to set it in the update statement. And it shouldn't be settable to anything on insert.

This is of course also a task for triggers. Based on what you learned earlier, you know it should be a BEFORE trigger – but should it be one trigger (on both INSERT and UPDATE), or two separate triggers (because the logic differs between those cases)?
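The two-trigger version would be a straightforward pair (a sketch; the function and trigger names are mine):

```sql
CREATE FUNCTION sanitize_dates_on_insert() RETURNS trigger AS $$
BEGIN
    NEW.created_on := now();
    NEW.last_modified := NULL;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION sanitize_dates_on_update() RETURNS trigger AS $$
BEGIN
    NEW.created_on := OLD.created_on;
    NEW.last_modified := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sanitize_dates_insert_trg
    BEFORE INSERT ON users
    FOR EACH ROW EXECUTE PROCEDURE sanitize_dates_on_insert();

CREATE TRIGGER sanitize_dates_update_trg
    BEFORE UPDATE ON users
    FOR EACH ROW EXECUTE PROCEDURE sanitize_dates_on_update();
```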

Doing it in two triggers is trivial, but doing it in one function lets me show you some more cool stuff – namely the TG_OP variable (described in the manual):

$ CREATE FUNCTION sanitize_dates() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        NEW.created_on := now();
        NEW.last_modified := NULL;
    ELSIF TG_OP = 'UPDATE' THEN
        NEW.created_on := OLD.created_on;
        NEW.last_modified := now();
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
$ CREATE TRIGGER sanitize_dates_trg
    BEFORE INSERT OR UPDATE
    ON users
    FOR EACH ROW
    EXECUTE PROCEDURE sanitize_dates();

The interesting bits are the IFs in the function.

  • First, in lines 3-5, the function checks if it was called for INSERT, and if so – sets created_on to now() and last_modified to NULL. Thus, if the user supplied any information there, it is ignored.
  • Second, in lines 6-8, the function checks if the command was UPDATE, and if so – copies created_on from the OLD version of the record to the NEW data (hence, if the user tried to set a new value of created_on, it gets overwritten), and then sets last_modified to now().

And how does it work?

$ SELECT now();
              now
-------------------------------
 2012-11-13 15:50:13.750914+01
(1 row)
 
$ insert into users (username, created_on, last_modified) values ( 'paranoid parrot', '0001-01-01 00:00:00', '1410-07-15 07:00:00' );
 
$ select * from users where username = 'paranoid parrot';
 id |    username     | fullname |          created_on           | last_modified
----+-----------------+----------+-------------------------------+---------------
  9 | paranoid parrot | [null]   | 2012-11-13 15:50:13.751598+01 | [null]
(1 row)
 
$ update users set created_on = '0001-01-01 00:00:00', last_modified = '1410-07-15 07:00:00' where username = 'paranoid parrot';
 
$ select * from users where username = 'paranoid parrot';
 id |    username     | fullname |          created_on           |         last_modified
----+-----------------+----------+-------------------------------+-------------------------------
  9 | paranoid parrot | [null]   | 2012-11-13 15:50:13.751598+01 | 2012-11-13 15:50:13.765139+01
(1 row)

Sweet. Of course triggers can be disabled, but that requires specific privileges, so it's not really a concern.
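For completeness – disabling a trigger requires ownership of the table (or superuser), e.g.:

```sql
-- Temporarily disable one trigger (requires table ownership):
ALTER TABLE users DISABLE TRIGGER sanitize_dates_trg;
-- ... and re-enable it:
ALTER TABLE users ENABLE TRIGGER sanitize_dates_trg;
-- Or all triggers on the table at once:
ALTER TABLE users DISABLE TRIGGER ALL;
```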

So, I covered BEFORE and AFTER, for inserts, and updates. What about DELETE?

Let's make archive table:

create table users_archive (
    id int4 primary key,
    username text not null unique,
    fullname text,
    created_on timestamptz not null default now(),
    last_modified timestamptz,
    deleted_on timestamptz
);

This table is almost the same as users, but its id column is not linked to the sequence, and it has one more column – deleted_on.

With this I can:

CREATE FUNCTION archive_users() RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO users_archive (id, username, fullname, created_on, last_modified, deleted_on)
        VALUES (OLD.id, OLD.username, OLD.fullname, OLD.created_on, OLD.last_modified, now());
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER archive_users_trg
    AFTER DELETE
    ON users
    FOR EACH ROW
    EXECUTE PROCEDURE archive_users();
 
$ delete from users where id = 1 returning *;
 id | username |          fullname          |          created_on           | last_modified
----+----------+----------------------------+-------------------------------+---------------
  1 | depesz   | Absolutely Irrelevant Name | 2012-11-13 14:58:47.500289+01 | [null]
(1 row)
 
$ select * from users_archive;
 id | username |          fullname          |          created_on           | last_modified |          deleted_on
----+----------+----------------------------+-------------------------------+---------------+-------------------------------
  1 | depesz   | Absolutely Irrelevant Name | 2012-11-13 14:58:47.500289+01 | [null]        | 2012-11-13 15:58:15.001053+01
(1 row)

With this, I have examples of all kinds of FOR EACH ROW triggers. But what about STATEMENT triggers? The thing is that when using them, you don't actually know which rows were inserted/updated/deleted. So they might be useful, for example, to regenerate some kind of materialized view that you're OK with redoing after every change to the base table.

But, there is one “interesting" trick you can do.

Let's assume you want to store, in a side table, the total count of all rows in a table, so that your "select count(*) from table" can be replaced with "select count from counts where table_name = 'table'".

If we used standard "FOR EACH ROW" triggers, this could lead to serious problems.

Consider the case where, on INSERT, you increment the counter in the side table, and on DELETE you decrement it. Trivial, right? Two triggers (or one with an IF): "AFTER INSERT" and "AFTER DELETE".

But what would happen if I inserted 500,000 rows in one statement? Or deleted a million rows?

Well, the counts table would get updated 500,000 times, or a million times. Does that sound like a good idea? I hope you said: "NO!".

Thanks to STATEMENT triggers we can do something smarter. For example: keep "a counter" somewhere in memory, have ROW triggers increment it (without any write overhead), and have an AFTER STATEMENT trigger do the actual count update.

How would that work? First we need the counts table:

$ create table counts (
    table_name text primary key,
    count int4 not null
);

Then, I will store initial row:

$ insert into counts (table_name, count) select 'users', count(*) from users;
 
$ select * from counts;
 table_name | count
------------+-------
 users      |     4
(1 row)

Where can I store the counter in the meantime? There are a couple of options – for example as a GUC (system setting), or using a temporary or unlogged table. If I were using pl/Perl (or another pl/* language) I could have some shared memory to use. But since we're in PL/pgSQL, I think I will (ab)use GUCs.

I will need 3 triggers: one before statement, to initialize the per-query counter; then a row-level trigger, to update the counter; and finally one after statement, to store the count change.

$ create function initialize_row_counter() returns trigger as $$
begin
    perform set_config( 'trigger.row_counter', '0', true );
    return null;
end;
$$ language plpgsql;
 
$ CREATE TRIGGER initialize_row_counter_trg
    BEFORE INSERT OR DELETE
    ON users
    FOR EACH STATEMENT
    EXECUTE PROCEDURE initialize_row_counter();
 
$ create function modify_row_counter() returns trigger as $$
declare
    v_counter int4 := current_setting( 'trigger.row_counter' )::int4;
begin
    IF TG_OP = 'INSERT' THEN
        v_counter := v_counter + 1;
    ELSE
        v_counter := v_counter - 1;
    END IF;
    perform set_config( 'trigger.row_counter', v_counter::text, true );
    return null;
end;
$$ language plpgsql;
 
$ CREATE TRIGGER modify_row_counter_trg
    AFTER INSERT OR DELETE
    ON users
    FOR EACH ROW
    EXECUTE PROCEDURE modify_row_counter();
 
$ create function apply_row_counter() returns trigger as $$
declare
    v_counter int4 := current_setting( 'trigger.row_counter' )::int4;
begin
    if v_counter <> 0 then
        UPDATE counts SET count = count + v_counter WHERE table_name = 'users';
    end if;
    perform set_config( 'trigger.row_counter', '0', true );
    return null;
end;
$$ language plpgsql;
 
$ CREATE TRIGGER apply_row_counter_trg
    AFTER INSERT OR DELETE
    ON users
    FOR EACH STATEMENT
    EXECUTE PROCEDURE apply_row_counter();

And now for some tests:

$ insert into users (username) select 'test' || i from generate_series(1,10) i;
INSERT 0 10
 
$ select * from counts;
 table_name | count
------------+-------
 users      |    14
(1 row)
 
$ delete from privileges;
DELETE 13
 
$ delete from users;
DELETE 14
 
$ select * from counts;
 table_name | count
------------+-------
 users      |     0
(1 row)

All good. One comment though. In the trigger functions I used:

set_config( 'trigger.row_counter', '0', true );

I am on PostgreSQL 9.3; if you test this code on an earlier Pg, you will get an error saying:

ERROR:  unrecognized configuration parameter "trigger.row_counter"

This is because you need to declare the "trigger" namespace for variables in your postgresql.conf, using:

custom_variable_classes = 'trigger'

Hope the example is clear. You might wonder why I returned NULL in all cases. Well – for statement triggers, whether BEFORE or AFTER, the return value is ignored, since they don't work on actual rows anyway. And the row-level trigger was an AFTER trigger, so its return value is also ignored.

In case you want whole rows to be available for analysis – that's also possible. You can use GUCs to store a text representation of an array of rows, or just use a temporary or unlogged table that keeps a copy of the modified rows.
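A rough sketch of the side-table approach (all names here are mine; and in a multi-session setup you'd want a per-session temporary table rather than a shared unlogged one):

```sql
-- Hypothetical sketch: row triggers stash modified rows in a side table,
-- and an AFTER STATEMENT trigger processes and empties it.
CREATE UNLOGGED TABLE modified_users (LIKE users);

CREATE FUNCTION remember_modified() RETURNS trigger AS $$
BEGIN
    INSERT INTO modified_users VALUES (NEW.*);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER remember_modified_trg
    AFTER INSERT OR UPDATE ON users
    FOR EACH ROW EXECUTE PROCEDURE remember_modified();

CREATE FUNCTION process_modified() RETURNS trigger AS $$
BEGIN
    -- ... analyze the rows gathered in modified_users here ...
    DELETE FROM modified_users;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER process_modified_trg
    AFTER INSERT OR UPDATE ON users
    FOR EACH STATEMENT EXECUTE PROCEDURE process_modified();
```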

But if you need that, I think it would be better to use pl/Perl (or another pl/* language), since it has a quite usable %_SHARED hash/dict that can store arbitrary structures. But that's not relevant for now.

After all of this, I hope the following is clear:

  • triggers are not scary
  • they can do pretty cool things
  • they give you certain guarantees which application-level code simply cannot, due to separation
  • which trigger should be used for what, and when
  • issuing an update from a trigger to modify a just-inserted row is a big NO-NO

Of course, as with everything in life – with power comes responsibility. Your trigger can become problematic if it leads to deadlocks (an example of why that might happen, and what can be done to avoid the problem, was described here).

One of the arguments against triggers is that they slow down operations.

For example – adding the trigger to normalize fullname will cause inserts to be slower than without the trigger. This is of course true. But how problematic is it actually?

For a simple test, I recreated the table with no triggers:

$ drop table users cascade;
 
$ create table users (
    id serial primary key,
    username text not null unique,
    fullname text,
    created_on timestamptz not null default now(),
    last_modified timestamptz
);

And then I loaded, using the fastest possible way:

copy users (username, fullname) FROM '/tmp/data.input';

100,000 rows to the table.

This is the worst-case scenario – the table is empty, there is no bloat, everything happens as fast as possible, so the trigger overhead will be as visible as possible.

The whole procedure (drop, create, copy) was repeated 5 times in a row, with times:

  • 1029.678 ms
  • 1265.373 ms
  • 1220.776 ms
  • 1084.679 ms
  • 1077.521 ms

That is, values ranged from 1029.678 ms to 1265.373 ms, with an average of 1135.605 ms.

Then I re-added the normalize_fullname trigger, as written at the beginning, and repeated the test. Times now:

  • 1118.322 ms
  • 1098.351 ms
  • 1108.593 ms
  • 1083.157 ms
  • 1059.112 ms

Values from 1059.112 ms to 1118.322 ms, with an average of 1093.507 ms! That's actually faster than without the trigger – but I think that's just random load fluctuation on my test machine.

The point is – there is no real slowdown. Of course the trigger here is trivial – just a simple function call and an assignment. There would be a slowdown (and it would be more interesting) if we were running queries from the triggers. Right? Sure. But if you put the same queries in the app, you would have that overhead too.

All in all – trigger speed depends on how complex the trigger is, and for simple tasks the overhead is negligible.

Finally, I would like to present my own reason why I prefer triggers in the database over "triggers" in application code (i.e. paths in application code that run something in addition to the "normal" query – things that could be done with a trigger).

All those tools, ORMs, and so on assume one very important thing – that you will always communicate with the database through them.

This is not a problem while you're writing your application – after all, everything goes through the model classes. But once you have the app deployed, and you need just a small job – some data load perhaps, or an automatic tool run from cron that expires accounts – it gets complicated.

For example: let's assume you have a count of active users, kept up to date by application code inside some kind of ORM. All works fine, but you want to add auto-expiring of users that match certain criteria.

If the counter were kept using trigger(s), you would just add to crontab:

psql -c "update users set expired = true where some + magic = condition"

and you're done.

But since the count-refreshing code is in the app, you have to:

  • use the same code for this simple job – i.e. the full-blown ORM, instead of a simple query
  • use the same language – because the ORM will most likely not be portable

This might not sound like a big issue, but just think about how much easier it is to write:

psql -c "delete from users where expired and last_login < now() - '6 months'::interval"

than to write the equivalent in your programming language of choice. Sometimes shell + psql is the better tool.

Does that mean that triggers should be used everywhere, and all kinds of automation in the app are bad? No. Sometimes you do know that keeping the logic one layer above will make the whole thing simpler. Plus – we don't have triggers that run on commit, which would greatly help in some cases.

Anyway – I think that wraps it up. If you have any problems with triggers – remember that there is a very helpful IRC channel where you can meet literally hundreds (641 right now, though some are probably bots/inactive) of PostgreSQL users who can help you.


  2. Nov 14, 2012

    Great primer. I’ve been using triggers more and more these days, especially the “update of column” kind.

    I do have a general question. Coming from the world of embedded software, I’ve always sort of seen triggers as SQL interrupts and generally the advice is to keep triggers “short” much like interrupts.

    But what does “short” really mean? What is this so-called limit? With interrupts you’d generally set a global or two and be done, but I’ve been using triggers to leverage sanity validation from applications, handle duplicate records and even built whole alarm systems in trigger space (using LISTEN/NOTIFY). I also use cascading triggers and I don’t really see the difference between using those and a big-ass stored procedure. They’ll both take as long to execute, and they’ll both rollback in case of failure. So where does that “keep your trigger short” mentality come from? Some actual real-life horror stories would be nice.

  3. Nov 14, 2012

    @denpanosekai:

    the only horror stories that I have are related to bugs in triggers, not their length. Things like deadlock-prone code, circular calls to triggers.

    So, can’t help with this.

  4. # Dragan Sahpaski
    Nov 14, 2012

    That’s a great Dr. Strangelove inspired title.
    Great post ! Thanks.

    BTW. The tab field switch between the name and email fields in the comment form doesn’t work. Name takes you to the search field on top of the page.

    Cheers

  5. Nov 14, 2012

    @Dragan:
    Thanks, fixed the tabindex, should be better now.

  6. # Michael
    Nov 15, 2012

    My philosophy is that all business logic and additional manipulation of the data should be included in the database. The database is not just for mere storage of data (tables); it's also where processing is fastest, because the data is there. In my opinion it is totally wrong to drag data from the database to the client for further processing and then return a specific set of data back to the database. It is much easier and much faster to do it using triggers, or by calling functions in the database.

  7. # Rafał
    Nov 15, 2012

    @Michael: I have opposite opinion. I’m Java developer and I prefer to have all business logic in Java, where I have object, design pattern and so many other great features ;) Database is for me for storage and I seldom (almost never) use triggers.
    @Depesz: very nice tutorial :)

  8. # Michael
    Nov 15, 2012

    @Rafal: For example, you have a Sales Order and now you need to make an Invoice from that order. You need to download stock data, customer data… all to the Java client, to determine whether there is stock available, or to check the customer limit…. And then in your objects populate the invoice data and send it to be inserted into the database. Is that your way?
    For me, that's too much data being sent and retrieved over the wire, and the slowest way to accomplish the goal.

    A faster way is to write a PL/pgSQL function to do that, and call that function from your Java client.

    And one more thing: if you need to change something or add new stuff, you only need to change the function in one place (the database). Your way, you need to change your client and then distribute the new version of the client to users.

  9. # Joe Van Dyk
    Nov 15, 2012

    I’d love to see an article about avoiding deadlocks (especially with triggers). I keep running into deadlocks when doing summary columns (say for every item in an order, we need to update an inventory count).

  10. Nov 15, 2012

    @Joe:
    well, deadlocks are relatively simple thing, though getting rid of them might be complicated. It is all on case by case basis, so I’m not sure how to write such article.

    If you have isolated transactions that happen to interlock, you might send it to me, so I’ll check what can be done about it.

  11. Nov 15, 2012

    @MICHAEL My opinion is the same. Moving entire business logic from C++/Java to PL/PGSQL has tremendously sped up operation and data integrity.

  12. # Piotr
    Nov 19, 2012

    My 0.2 cents… triggers are much more difficult to debug from a developer's perspective, while support tickets are usually raised against development teams, not DB masters…

  13. # Troy
    Nov 21, 2012

    This is weird, I was feeling quite happy about triggers today (trigger happy?) and I thought of that classic Strangelove quote, and just on a whim, wondered if anybody had made a post called “How I learned to stop worrying and love the triggers”. So I googled it, and lo and behold there was an exact post by that title! And only from a week ago! Oh well, quite strange. Good article!

  14. Jul 7, 2013

    Triggers are useful, if used wisely.
