How I Learned to Stop Worrying and Love the Triggers

Some people are afraid of triggers.

Reasons for this are not really understandable for me, but I guess it stems from the fact that these are usually application developers, and not database admins. Or they encountered some kind of problem with triggers, and now they tend to think that triggers are inherently evil.

But they are not.

As virtually anything, triggers have some benefits, and some drawbacks. With a bit of thinking you can use them to do really cool things. But first you have to understand what exactly trigger is, how it works, and when to use which kind.

A trigger is a function that gets automatically called in case of database event. What kind of event? Currently these are:

  • INSERT INTO
  • UPDATE
  • DELETE FROM
  • TRUNCATE (this one can be used for triggers only since PostgreSQL 8.4)

In upcoming 9.3 release there will be more types of triggers, but their usage is a bit more complicated, so I'll not cover them for now.

These triggers can be defined on every user table (you can't create triggers on system tables), and (since 9.1) also on views.

For now, I will focus on basic triggers on tables, as view triggers are a bit different. I will also skip conditional triggers, and triggers set on subset of fields.

Psqls “\h CREATE TRIGGER" shows:

Command:     CREATE TRIGGER
Description: define a NEW TRIGGER
Syntax:
CREATE [ CONSTRAINT ] TRIGGER name { BEFORE | AFTER | INSTEAD OF } { event [ OR ... ] }
    ON TABLE_NAME
    [ FROM referenced_table_name ]
    { NOT DEFERRABLE | [ DEFERRABLE ] { INITIALLY IMMEDIATE | INITIALLY DEFERRED } }
    [ FOR [ EACH ] { ROW | STATEMENT } ]
    [ WHEN ( condition ) ]
    EXECUTE PROCEDURE function_name ( arguments )
 
WHERE event can be one OF:
 
    INSERT
    UPDATE [ OF column_name [, ... ] ]
    DELETE
    TRUNCATE

That's quite a lot. But don't worry. There are, for now, just couple of types of triggers:

  • BEFORE or AFTER
  • FOR EACH ROW or STATEMENT

( side note: triggers on TRUNCATE cannot be made “FOR EACH ROW" – they can be only per STATEMENT ).

I'll start, conveniently from the end. What's the difference in “for each row" vs. “for each statement"?

That's pretty simple, and straight forward.

When you have multi-row operation, for example: delete from table; (without where), “for each row" trigger, will be called when deleting every single row. While “for each statement" will be ran only once.

That's not the only difference though.

When running “FOR EACH ROW" trigger, you have (in function) special variables which contain exact content of row that is being inserted/updated/deleted. So in the function you can do something with it. On the other hand – since FOR EACH STATEMENT triggers can work on millions of rows at once – they do not get information what rows were modified.

You might think that it is stupid – after all – what's the point of trigger if you don't know what got modified. But actually it has some serious uses, and the limitation that “statement trigger doesn't know what was modified" can be worked around. Truth must be told though, that in real life, I see much more row-level triggers than statement based ones.

The other important classification of trigger – AFTER/BEFORE.

Let's consider it in case of UPDATE. You're running an update. PostgreSQL has to check that all values you provided are sensible, match all constraints (not null, checks, unique), then it actually updates the row (well, technically it doesn't because PostgreSQL has mvcc, but for now let's pretend that it does).

BEFORE trigger is called just before the internal row representation is updated. AFTER trigger is called afterwards.

What does it tell us?

For starters – if you want to modify new values inserted/updated to the table – the correct place to do it is in BEFORE trigger.

I have seen code, where people were adding AFTER INSERT trigger, which issued UPDATE of newly inserted record. That just doesn't make sense. It bloats the table, and is slower. In BEFORE trigger, you can modify the row data before it will be stored in table.

To have a place for experiments, I'll make a simple users table:

CREATE TABLE users (
    id serial PRIMARY KEY,
    username text NOT NULL UNIQUE,
    fullname text,
    created_on timestamptz NOT NULL DEFAULT now(),
    last_modified timestamptz
);

And now we can experiment on it. First – let's assume that you'd want fullnames to be normalized to lowercase, with uppercased first letter of every word (just in case: this is bad idea, and don't do it in real life code).

For this, the simple solution is to create a trigger that will modify fullname of all records – on INSERT, and on UPDATE.

Before I can write it, we need couple of information:

  • trigger calls a function (i.e. logic is not in trigger, but it's in function (which can be shared between multiple triggers, on multiple tables)
  • such function should be defined not to have any arguments and return pseudotype TRIGGER
  • in case of FOR EACH ROW triggers, whole rows are available as either NEW or OLD record variables (in case of insert – NEW, in case of DELETE – OLD, in case of UPDATE – both, OLD one is state of row before update, NEW is state of row after update)
  • there are some more special variables, which are listed in fine manual

If you're reading carefully you might be confused that I wrote that such function doesn't have arguments, but in psql \h output above it shows:

EXECUTE PROCEDURE function_name ( arguments )

which suggests that arguments are possible.

That's right – you can have arguments to trigger, but they are not passed as arguments to function. Instead they are passed using special variable TG_ARGV, and its compantion TG_NARGS.

Knowing all of this we can write our fullname-normalizing function:

  1. CREATE FUNCTION normalize_fullname() RETURNS TRIGGER AS $$
  2. BEGIN
  3.     NEW.fullname := initcap( NEW.fullname );
  4.     RETURN NEW;
  5. END;
  6. $$ LANGUAGE plpgsql;

In case you're not familiar with functions in PostgreSQL:

  • line 1 – begins “CREATE FUNCTION" statement, including information that the function will be named normalize_fullname, will not have any arguments “()", and it returns value of pseudotype trigger.
  • line 2 – every function in plpgsql (check line 6) has to have BEGIN -> END; block, which is its body
  • line 3 – NEW is record which contains data for new row. NEW.fullname is value of fullname column in this record. initcap function takes string, and returns modified version, so it has to be assigned to NEW.fullname again to make the change actually happen
  • line 4 – function has to return something, and the returned value is used to insert/update the data.
  • line 5 – end of function body
  • line 6 – declaration that the function is written in “plpgsql" language (one of many possible pl/* languages)

With the function in place, I can actually create a trigger:

  1. CREATE TRIGGER normalize_fullname_trg
  2.     BEFORE INSERT OR UPDATE
  3.     ON users
  4.     FOR EACH ROW
  5.     EXECUTE PROCEDURE normalize_fullname();

and explanation:

  • line 1 – beginning of CREATE TRIGGER statement. Each trigger has its own name, which can, but doesn't have to be, like executed function
  • line 2 – marks the trigger as BEFORE, and sets it to be ran on INSERT and UPDATE events (we don't care about delete when normalizing data, and TRUNCATE is different anyway)
  • line 3 – sets the table the trigger is created on
  • line 4 – marks the trigger as per ROW, and not per STATEMENT
  • line 5 – names the function that has to be called whenever something happens

Does that work? Let's see:

$ INSERT INTO users ( username, fullname ) VALUES ( 'depesz', 'hubert lubaczewski' );
INSERT 0 1
 
$ SELECT * FROM users;
 id | username |      fullname      |          created_on           | last_modified
----+----------+--------------------+-------------------------------+---------------
  1 | depesz   | Hubert Lubaczewski | 2012-11-13 14:58:47.500289+01 | [NULL]
(1 ROW)

and what about update?

$ UPDATE users SET fullname = 'ABSOLUTELY IRRELEVANT NAME';
UPDATE 1
 
$ SELECT * FROM users;
 id | username |          fullname          |          created_on           | last_modified
----+----------+----------------------------+-------------------------------+---------------
  1 | depesz   | Absolutely Irrelevant Name | 2012-11-13 14:58:47.500289+01 | [NULL]
(1 ROW)

Looks like working.

This looks easy, I hope. Now for a bit more advanced topics.

I mentioned that function returns NEW because it's the row that has to be set in table, but that's not always the case. You can also return NULL value, which will do something really sneaky – it will not do the insert/update/delete, but it will also not raise any exception/error. It will just silently ignore your command.

Let's see simple example:

$ CREATE FUNCTION we_dont_like_them() RETURNS TRIGGER AS $$
BEGIN
    IF NEW.fullname ilike '%mysql%' THEN
        RETURN NULL;
    ELSIF NEW.username ilike '%mysql%' THEN
        RETURN NULL;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
 
$ CREATE TRIGGER we_dont_like_them_trg
    BEFORE INSERT OR UPDATE
    ON users
    FOR EACH ROW
    EXECUTE PROCEDURE we_dont_like_them();
 
$ INSERT INTO users (username, fullname) VALUES
    ( 'test1', 'mysql' ),
    ( 'test2', 'MYSQL' ),
    ( 'mysql1', 'test3' ),
    ( 'MySQL2', 'test4' ),
    ( 'mssql', 'oops' );
 
$ SELECT * FROM users;
 id | username |          fullname          |          created_on           | last_modified
----+----------+----------------------------+-------------------------------+---------------
  1 | depesz   | Absolutely Irrelevant Name | 2012-11-13 14:58:47.500289+01 | [NULL]
  6 | mssql    | Oops                       | 2012-11-13 15:19:03.42297+01  | [NULL]
(2 ROWS)

As you can see all users which mention MySQL got silently skipped.

You can also use triggers to do some kind of more complex validation. And if given user fails – use RAISE EXCEPTION to break the transaction. But actually – these kind of things should be rather done in CHECK constraints, and not triggers.

All of what I showed above are BEFORE-PER-ROW trigger. So what good are other triggers for?

AFTER triggers are meant to be place where you put logic that influences other tables when the modifiation of your original table succeds.

For example, let's assume we have also privileges table:

CREATE TABLE privileges (
    user_id int4 PRIMARY KEY REFERENCES users (id),
    privileges text[] NOT NULL
);

And, for whatever reason, we want a row to exist for every user in privileges table, with basic ‘none' privilege.

To do it, I can:

$ CREATE FUNCTION make_default_privileges() RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO privileges (user_id, privileges) VALUES ( NEW.id, ARRAY[ 'none' ]::text[] );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
 
$ CREATE TRIGGER make_default_privileges_trg
    AFTER INSERT
    ON users
    FOR EACH ROW
    EXECUTE PROCEDURE make_default_privileges();
 
$ INSERT INTO users (username) VALUES ('guest') returning *;
 id | username | fullname |          created_on           | last_modified
----+----------+----------+-------------------------------+---------------
  7 | guest    | [NULL]   | 2012-11-13 15:26:45.790506+01 | [NULL]
(1 ROW)
 
$ SELECT * FROM privileges;
 user_id | privileges
---------+------------
       7 | {NONE}
(1 ROW)

Of course you might say: “OK, but I can do it in BEFORE trigger too, right?".

Well, right. You can. At least kind-of. NEW.id is already assigned. But you shouldn't.

There are two reasons. First one – foreign key. If you'll insert to privileges from BEFORE INSERT trigger, foreign key on privileges will complain because from its point of view, the row in users does not exist yet.

But even without foreign key – you can have many triggers on a row. Some trigger might reject the row, silently or with exception. In such case – we shouldn't insert to another table – not because the data will stay there – as it will not, but because it's unnecessary work.

Hence – if you have to issue insert/update/delete from trigger – it should almost always be AFTER trigger.

As you perhaps noticed, I did ‘return NEW' in the function. If you are inquisitive, you might wonder: if I'll return NULL, will it also ignore the row? No. In AFTER trigger, returned value doesn't matter at all. I usually do return NEW, but I could do return NULL as well – the row is already stored in table, so I can't “ignore" it afterwards.

This brings me to one of the most commonly asked questions about triggers (at least commonly to me): if the insert will fail, will the effects of trigger stay in db.

Long story short – no. Even if you don't use transactions explicitly, each query sent to PostgreSQL is in implicit, single-query, transaction. And command and its triggers are always within the same transaction. So if trigger will raise exception – it will also remove the changes that command applied. If one of the triggers will raise exception – effects of all triggers, ran by this command, will be rolled back.

This is great, because it gives you guarantee that whatever you do in trigger, will always be applied only with the data change from application.

When you have application-level triggers (in ORM, or whatever) – you can, by mistake, run two related modifications in separate transaction, and accidentaly have one rolled back, but the other not – thus ending up with “mixed" state of database.

When the logic is in trigger – all changes, are treated as atomic – and you either have all of them, or none.

Back to explaining.

One of the most common tasks for triggers (at least in my environment) is forbiding changes of particular data, or forcing change of given columns.

For example – let's look at the users table:

                                     TABLE "public.users"
    COLUMN     |           TYPE           |                     Modifiers
---------------+--------------------------+----------------------------------------------------
 id            | INTEGER                  | NOT NULL DEFAULT NEXTVAL('users_id_seq'::regclass)
 username      | text                     | NOT NULL
 fullname      | text                     |
 created_on    | TIMESTAMP WITH TIME zone | NOT NULL DEFAULT now()
 last_modified | TIMESTAMP WITH TIME zone |
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "users_username_key" UNIQUE CONSTRAINT, btree (username)
Referenced BY:
    TABLE "privileges" CONSTRAINT "privileges_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id)
Triggers:
    make_default_privileges_trg AFTER INSERT ON users FOR EACH ROW EXECUTE PROCEDURE make_default_privileges()
    normalize_fullname_trg BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE normalize_fullname()
    we_dont_like_them_trg BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE we_dont_like_them()

created_on column – ideally it shouldn't be possible to specify anything other than “now", but default value doesn't protect us. I can easily create user with fake created_on:

$ INSERT INTO users (username, created_on) VALUES ( 'bad santa', '0001-01-01 00:00:00' );
 
$ SELECT * FROM users WHERE username = 'bad santa';
 id | username  | fullname |        created_on         | last_modified
----+-----------+----------+---------------------------+---------------
  8 | bad santa | [NULL]   | 0001-01-01 00:00:00+01:24 | [NULL]
(1 ROW)

What's more – after the row is in there, I can update the column – which also is not good idea.

On the other hand – last_modified should be set to now() on every update, even if I forget to set it in update statement. And it shouldn't be set to anything on insert.

This is of course also task for triggers. Based on what you learned earlier, you might understand that it should be BEFORE, but should it be one trigger (on both INSERT and UPDATE), or two separate triggers (because the logic is different in those cases).

Doing it in two triggers is trivial, but doing it in one function will let me show you some more cool stuff – namely TG_OP variable (described in the manual):

  1. $ CREATE FUNCTION sanitize_dates() RETURNS TRIGGER AS $$
  2. BEGIN
  3.     IF TG_OP = 'INSERT' THEN
  4.         NEW.created_on := now();
  5.         NEW.last_modified := NULL;
  6.     ELSIF TG_OP = 'UPDATE' THEN
  7.         NEW.created_on := OLD.created_on;
  8.         NEW.last_modified := now();
  9.     END IF;
  10.     RETURN NEW;
  11. END;
  12. $$ LANGUAGE plpgsql;
  13. $ CREATE TRIGGER sanitize_dates_trg
  14.     BEFORE INSERT OR UPDATE
  15.     ON users
  16.     FOR EACH ROW
  17.     EXECUTE PROCEDURE sanitize_dates();

Interesting bit are the IFs in the function.

  • First, in lines 3-5, function checks if it was called for INSERT, and if so – sets created_on for now() and last_modified to NULL. Thus – if user supplied any information there, it will be ignored.
  • Second, in lines 6-8, function checks if the command was UPDATE, and if yes – copies created_on from OLD version of the record to NEW data – hence, if the user tried to set new value of created_on – it will be overwritten. And then it sets last_modified to now().

And how does it work?

$ SELECT now();
              now
-------------------------------
 2012-11-13 15:50:13.750914+01
(1 ROW)
 
$ INSERT INTO users (username, created_on, last_modified) VALUES ( 'paranoid parrot', '0001-01-01 00:00:00', '1410-07-15 07:00:00' );
 
$ SELECT * FROM users WHERE username = 'paranoid parrot';
 id |    username     | fullname |          created_on           | last_modified
----+-----------------+----------+-------------------------------+---------------
  9 | paranoid parrot | [NULL]   | 2012-11-13 15:50:13.751598+01 | [NULL]
(1 ROW)
 
$ UPDATE users SET created_on = '0001-01-01 00:00:00', last_modified = '1410-07-15 07:00:00' WHERE username = 'paranoid parrot';
 
$ SELECT * FROM users WHERE username = 'paranoid parrot';
 id |    username     | fullname |          created_on           |         last_modified
----+-----------------+----------+-------------------------------+-------------------------------
  9 | paranoid parrot | [NULL]   | 2012-11-13 15:50:13.751598+01 | 2012-11-13 15:50:13.765139+01
(1 ROW)

Sweet. Of course – triggers can be disabled, but it requires specific privileges, so that's not really relevant.

So, I covered BEFORE and AFTER, for inserts, and updates. What about DELETE?

Let's make archive table:

CREATE TABLE users_archive (
    id int4 PRIMARY KEY,
    username text NOT NULL UNIQUE,
    fullname text,
    created_on timestamptz NOT NULL DEFAULT now(),
    last_modified timestamptz,
    deleted_on timestamptz
);

This table is almost the same like users, but it is not linked to sequence for id column, and it has one more column – deleted_on.

With this I can:

CREATE FUNCTION archive_users() RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO users_archive (id, username, fullname, created_on, last_modified, deleted_on)
        VALUES (OLD.id, OLD.username, OLD.fullname, OLD.created_on, OLD.last_modified, now());
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER archive_users_trg
    AFTER DELETE
    ON users
    FOR EACH ROW
    EXECUTE PROCEDURE archive_users();
 
$ DELETE FROM users WHERE id = 1 returning *;
 id | username |          fullname          |          created_on           | last_modified
----+----------+----------------------------+-------------------------------+---------------
  1 | depesz   | Absolutely Irrelevant Name | 2012-11-13 14:58:47.500289+01 | [NULL]
(1 ROW)
 
$ SELECT * FROM users_archive;
 id | username |          fullname          |          created_on           | last_modified |          deleted_on
----+----------+----------------------------+-------------------------------+---------------+-------------------------------
  1 | depesz   | Absolutely Irrelevant Name | 2012-11-13 14:58:47.500289+01 | [NULL]        | 2012-11-13 15:58:15.001053+01
(1 ROW)

With this, I have example for all kinds of FOR EACH ROW triggers. But what about STATEMENT? The thing is that when using them you don't actually know what rows were inserted/updated/deleted. So, this might be useful, for example, to regenerate some kind of materialized view that you're OK to redo after every change to your base table.

But, there is one “interesting" trick you can do.

Let's assume you want to store, in side table, total count of all rows in a table. So that your “select count(*) from table" can be modified to: “select count from counts where table_name = ‘table'".

If we'd use standard “FOR EACH ROW" triggers, this could lead to relatively important problems.

Let's consider case, when, on INSERT you increment the counter in side table, on DELETE you decrement it. Trivial, right? Two triggers (or one with IF), “AFTER INSERT" and “AFTER DELETE".

But what would happen if I did: INSERT INTO table values ()….. – for 500,000 rows? Or delete of a million of rows?

Well, the counts table would get updated 500,000 times, or a million times. Does it sound like a good idea? Hope you said: “NO!".

Thanks to STATEMENT triggers we can do something smarter. For example: have “a counter" somewhere in memory, and make ROW triggers increment it (without any write overhead, and make AFTER STATEMENT trigger do actual count update.

How would that work? First we need the counts table:

$ CREATE TABLE counts (
    TABLE_NAME text PRIMARY KEY,
    COUNT int4 NOT NULL
);

Then, I will store initial row:

$ INSERT INTO counts (TABLE_NAME, COUNT) SELECT 'users', COUNT(*) FROM users;
 
$ SELECT * FROM counts;
 TABLE_NAME | COUNT
------------+-------
 users      |     4
(1 ROW)

Where can I store the counter in the mean time? There is couple of options – for example – as a GUC (system setting), or using temporary, unlogged, table. If I was to use pl/Perl (or other pl/* languages) I could have some shared memory bit that I could use. But since we're in PL/pgSQL, I think I will (ab)use GUCs.

I will need 3 triggers. One – before statement, that will initialize the per-query counter, then row-level trigger to update the counter, and then after statement, to store the count change.

  1. $ CREATE FUNCTION initialize_row_counter() RETURNS TRIGGER AS $$
  2. BEGIN
  3.     perform set_config( 'trigger.row_counter', '0', TRUE );
  4.     RETURN NULL;
  5. END;
  6. $$ LANGUAGE plpgsql;
  7.  
  8. $ CREATE TRIGGER initialize_row_counter_trg
  9.     BEFORE INSERT OR DELETE
  10.     ON users
  11.     FOR EACH STATEMENT
  12.     EXECUTE PROCEDURE initialize_row_counter();
  13.  
  14. $ CREATE FUNCTION modify_row_counter() RETURNS TRIGGER AS $$
  15. DECLARE
  16.     v_counter int4 := current_setting( 'trigger.row_counter' )::int4;
  17. BEGIN
  18.     IF TG_OP = 'INSERT' THEN
  19.         v_counter := v_counter + 1;
  20.     ELSE
  21.         v_counter := v_counter - 1;
  22.     END IF;
  23.     perform set_config( 'trigger.row_counter', v_counter::text, TRUE );
  24.     RETURN NULL;
  25. END;
  26. $$ LANGUAGE plpgsql;
  27.  
  28. $ CREATE TRIGGER modify_row_counter_trg
  29.     AFTER INSERT OR DELETE
  30.     ON users
  31.     FOR EACH ROW
  32.     EXECUTE PROCEDURE modify_row_counter();
  33.  
  34. $ CREATE FUNCTION apply_row_counter() RETURNS TRIGGER AS $$
  35. DECLARE
  36.     v_counter int4 := current_setting( 'trigger.row_counter' )::int4;
  37. BEGIN
  38.     IF v_counter <> 0 THEN
  39.         UPDATE counts SET COUNT = COUNT + v_counter WHERE TABLE_NAME = 'users';
  40.     END IF;
  41.     perform set_config( 'trigger.row_counter', '0', TRUE );
  42.     RETURN NULL;
  43. END;
  44. $$ LANGUAGE plpgsql;
  45.  
  46. $ CREATE TRIGGER apply_row_counter_trg
  47.     AFTER INSERT OR DELETE
  48.     ON users
  49.     FOR EACH STATEMENT
  50.     EXECUTE PROCEDURE apply_row_counter();

And now for some tests:

$ INSERT INTO users (username) SELECT 'test' || i FROM generate_series(1,10) i;
INSERT 0 10
 
$ SELECT * FROM counts;
 TABLE_NAME | COUNT
------------+-------
 users      |    14
(1 ROW)
 
$ DELETE FROM privileges;
DELETE 13
 
$ DELETE FROM users;
DELETE 14
 
$ SELECT * FROM counts;
 TABLE_NAME | COUNT
------------+-------
 users      |     0
(1 ROW)

All good. One comment though. In the trigger functions I used:

set_config( 'trigger.row_counter', '0', TRUE );

I am on PostgreSQL 9.3, and if you'd be testing this code on earlier Pg, you will get error saying:

ERROR:  unrecognized configuration parameter "trigger.row_counter"

This is because you'll need to declare “trigger" namespace for variables in your postgresql.conf, using:

custom_variable_classes = 'trigger'

Hope the example is clear. You might wander why I returned NULL in all of the cases. Well – for statement triggers, either BEFORE or AFTER, return value is ignored – since it doesn't work on actual rows anyway. And the row-level trigger was an AFTER trigger, so its return value is also ignored.

In case you'd want whole rows to be analyzed – it's also possible. You can use the GUCs to store text representation of array of rows, or just use temporary, unlogged, table that will keep copy of the modified rows.

But, in case you need it, I think it would be better to use pl/Perl (or other pl/* language), since it has a pretty usable %_SHARED hash/dict that can be used to store arbitrary structures. But that's not relevant for now.

After all of this I hope that it will be clear:

  • triggers are not scary
  • they can do pretty cool things
  • they give you certain guarantees which application-level code simply cannot due to separation
  • which trigger should be used for what and when
  • issuing update from trigger to modify just inserted row is a big NO-NO.

Of course, as with everything in life – with power comes responsibility. Your trigger can become problematic if it will lead to deadlocks ( example why it might happen, and what can be done to avoid the problem was described here).

One of the arguments against triggers is that they slow down operations.

For example – adding trigger to normalize fullname will cause inserts to be slower than without the trigger. This is of course true. But how problematic is it actually?

For simple test purpose, I recreate the table with no triggers:

$ DROP TABLE users cascade;
 
$ CREATE TABLE users (
    id serial PRIMARY KEY,
    username text NOT NULL UNIQUE,
    fullname text,
    created_on timestamptz NOT NULL DEFAULT now(),
    last_modified timestamptz
);

And then I loaded, using fastest possible way:

copy users (username, fullname) FROM '/tmp/data.input';

100,000 rows to the table.

This is the worst case scenario – table is empty, there is no bloat, all happens as fast as possible, so trigger overhead will be as huge as possible.

The whole procedure (drop, create, copy) has been repeated 5 times a in row, with times:

  • 1029.678 ms
  • 1265.373 ms
  • 1220.776 ms
  • 1084.679 ms
  • 1077.521 ms

That is, we have values from 1029.678 ms to 1265.373 ms, with average of 1135.605 ms.

Then, I re-added the normalize_fullname trigger, as writen in the beginning, and repeated the test. Times now:

  • 1118.322 ms
  • 1098.351 ms
  • 1108.593 ms
  • 1083.157 ms
  • 1059.112 ms

Values from 1059.112 ms to 1118.322 ms, with average of 1093.507 ms! That's actually faster than without trigger. But I think that's just because of random load fluctuations on my test machine.

The point is – there is no real slowdown. Of course the trigger in here is trivial – just a simple function call and assignment. There would be slowdown (and more interesting) if we'd be doing some queries from triggers. Right? Sure. But if you'd but the same queries in app to run – you would have the overhead too.

All in all – trigger speed is related to how complex it is, and for simple task the overhead is negligible.

Finally, I would like to present my own point why I prefer triggers in database, as opposed to triggers in application code (i.e. paths in application code that run something additional to “normal" query, that can be made using trigger.

All those tools, orms, and so on, assume one very important thing – you will always communicate with database using them.

This is not a problem while you're writing your application – after all – it all goes through the model classes. But once you have the app deployed, and you need just a small job – some data load perhaps, or automatic tool ran from cron that does expiring – it gets complicated.

For example. Let's assume you have count of active users, kept up to date by application code inside some kind of ORM. All works fine, but you want to add auto-expiring users that match certain criteria.

If the counter was kept using trigger(s), you would just add to crontab:

psql -c "update users set expired = true where some + magic = condition"

and you're done.

But since the count-refreshing code is in app, you have to:

  • use the same code for this simple job – i.e. full-blown ORM, instead of simple query
  • use the same language – because the orm will most likely not be portable

This might not sound as a big issue, but just think about how much easier it is to write:

psql -c "delete from users where expired and last_login < now() - '6 months'::interval"

than write equivalent in your programming of choice. Sometimes shell + psql is better tool.

Does that mean that triggers should be used everywhere, and all kind of automation in app is bad? No. Sometimes you do know that exporting the logic one-layer-above will make the whole thing simpler. Plus – we don't have triggers that would run on commit, which would greatly help in some cases.

Anyway – I think that wraps it. If you have any problems with triggers – remember that there is very helpful irc channel where you can meet literally hundreds (641 now, though some are probably bots/inactive) people using PostgreSQL which can help you.

38 thoughts on “How I Learned to Stop Worrying and Love the Triggers”

  1. Great primer. I’ve been using triggers more and more these days, especially the “update of column” kind.

    I do have a general question. Coming from the world of embedded software, I’ve always sort of seen triggers as SQL interrupts and generally the advice is to keep triggers “short” much like interrupts.

    But what does “short” really mean? What is this so-called limit? With interrupts you’d generally set a global or two and be done, but I’ve been using triggers to leverage sanity validation from applications, handle duplicate records and even built whole alarm systems in trigger space (using LISTEN/NOTIFY). I also use cascading triggers and I don’t really see the difference between using those and a big-ass stored procedure. They’ll both take as long to execute, and they’ll both rollback in case of failure. So where does that “keep your trigger short” mentality come from? Some actual real-life horror stories would be nice.

  2. @denpanosekai:

    the only horror stories that I have are related to bugs in triggers, not their length. Things like deadlock-prone code, circular calls to triggers.

    So, can’t help with this.

  3. That’s a great Dr. Strangelove inspired title.
    Great post ! Thanks.

    BTW. The tab field switch between the name and email fields in the comment form doesn’t work. Name takes you to the search field on top of the page.

    Cheers

  4. My philosophy is that all business logic and additional manipulation of the data should be included in the database. The database is not just for mere storage of data (tables), but that’s where the fastest processing because the data is there. In my opinion is totally wrong to drag data from the database to client for further processing and return a specific set of data back to the database. It is much easier to do and much faster using triggers, or by calling functions in the database.

  5. @Michael: I have opposite opinion. I’m Java developer and I prefer to have all business logic in Java, where I have object, design pattern and so many other great features 😉 Database is for me for storage and I seldom (almost never) use triggers.
    @Depesz: very nice tutorial 🙂

  6. @Rafal: For example, you have Sales Order and now you need to make Invoice from that order. You need to download Stock data, customer data … all to java client to determinate is there stock available or check customer limit …. And then in your objects populate invoice data and send data to insert into database. IstThat your way?
    For me, it’s to many data sending and retrieving throw wire and slowest way for acomplish goal.

    Faster way is to write PL/PGSQL function to do that, and you call that function from your java client.

    And one more, if you need to change something or add new stuff, you need to change function in one place (database). In your way you need to change in your client and than distribute new version of client to users.

  7. I’d love to see an article about avoiding deadlocks (especially with triggers). I keep running into deadlocks when doing summary columns (say for every item in an order, we need to update an inventory count).

  8. @Joe:
    well, deadlocks are relatively simple thing, though getting rid of them might be complicated. It is all on case by case basis, so I’m not sure how to write such article.

    If you have isolated transactions that happen to interlock, you might send it to me, so I’ll check what can be done about it.

  9. @MICHAEL My opinion is the same. Moving entire business logic from C++/Java to PL/PGSQL has tremendously sped up operation and data integrity.

  10. My 0.2 cents.. – triggers are much more difficult to debug from developers perspective while support tickets are usually raised against development teams not db masters..

  11. This is weird, I was feeling quite happy about triggers today (trigger happy?) and I thought of that classic Strangelove quote, and just on a whim, wondered if anybody had made a post called “How I learned to stop worrying and love the triggers”. So I googled it, and lo and behold there was an exact post by that title! And only from a week ago! Oh well, quite strange. Good article!

  12. I agree and I have made extreme usageof trigger using PLPGSQL It’s pretty cool. PLPGSQL is very powerful making effective function and trigger.It help me embed all data constraints, business logic,and rule to database while leave Java do its own job, UI for client. More I can write Test Case using SQL against all business logic complete by trigger before starting Java application coding.

    @Rafal. Your team will end up spend more time in Java . Instead of writing application, UI, user interactive, look and feel, your team have to write business logic that could be done on server side.

  13. This is really a good article.
    Can you post more on statement level trigger.

  14. @Akshay:
    Not sure. The thing is that there is not all that much that can be achieved with them.

    In my experience so far, I use row level triggers on daily basis, and statement level – well, maybe twice in the last 10 years.

  15. Nice article. Thanks @Depesz.

    I am looking for something specific. Lets say that due to some corruption, automatic recovery is triggered by postgres. This results in “redo start at 0/9A3F58” as I can in the database logs. As part of the recovery, I suppose it would try to insert the records for a table. Does it cause database insert triggers for that table to be executed as well. We are using postgres 8.4.

    Snippet from postgres logs:

    2015-06-17 10:43:34 PDT LOG: unexpected EOF on client connection
    2015-06-17 10:43:34 PDT LOG: unexpected EOF on client connection
    2015-06-17 10:43:34 PDT LOG: unexpected EOF on client connection
    2015-06-17 10:43:34 PDT LOG: unexpected EOF on client connection
    2015-06-19 08:55:30 CDT LOG: database system was interrupted; last known up at 2015-06-17 20:05:02 CDT
    2015-06-19 08:55:30 CDT LOG: database system was not properly shut down; automatic recovery in progress
    2015-06-19 08:55:30 CDT LOG: redo starts at 0/9A3F58
    2015-06-19 08:55:30 CDT LOG: incomplete startup packet
    2015-06-19 08:55:30 CDT FATAL: the database system is starting up
    2015-06-19 08:55:30 CDT LOG: record with zero length at 0/E90334
    2015-06-19 08:55:30 CDT LOG: redo done at 0/E90308
    2015-06-19 08:55:30 CDT LOG: last completed transaction was at log time 2015-06-17 12:43:32.471831-05
    2015-06-19 08:55:31 CDT LOG: database system is ready to accept connections
    2015-06-19 08:55:31 CDT LOG: autovacuum launcher started
    2015-06-19 08:59:29 CDT LOG: unexpected EOF on client connection
    2015-06-19 08:59:29 CDT LOG: unexpected EOF on client connection
    2015-06-19 08:59:29 CDT LOG: unexpected EOF on client connection

  16. @Kushal:
    one big thing you have to remember is that triggers are not called in recovery. What is being done, is WAL being applied, and WAL contains data changes, and not instructions like “call trigger”.
    So – trigger will not be called, but the data that was inserted by trigger, is still in WAL, and it will get applied.

  17. @Depesz
    Thanks for quick reply. On a different note, is there a way for me to induce such corruption resulting in auto-recovery like the one shown in logs above. For some reason, I believe it might have cause a problem in my workflow.

  18. Hi,
    I just read your post, I think I have a better understanding now about “triggers”.
    I was wondering if this could work with 3 tables too.
    For example:
    Table A(
    pro_num integer primary key,
    stock integer);

    Table B(
    cli_num integer primary key,
    pro_num integer,

  19. Hi,
    I’m Daniel and I just read your post and I think that I have a better understanding about “triggers”.
    I was wondering if this could work with 3 tables too. For example, I need to update the “stock” in Table A updating the Table C (A and C are connected by Table B).

    TABLE A(
     pro_num INTEGER PRIMARY KEY,
     stock INTEGER);
     
    TABLE B(
     cli_num PRIMARY KEY,
     pro_num INTEGER,
     CONSTRAINT fk_B (pro_num) REFERENCES A(pro_num) );
     
    TABLE C(
     del_num INTEGER PRIMARY KEY,
     pro_num INTEGER,
     CONSTRAINT fk_C (pro_num) REFERENCES B(pro_num));

    Now, making a INSERT in Table C would update the “stock” in Table A.
    I have this:

    CREATE OR REPLACE FUNCTION insert_C() RETURNS TRIGGER
    AS $$
    DECLARE
    BEGIN
    	--Update
    	UPDATE A
    	SET "stock" = "stock"-1
    	WHERE A."Pro_num"=NEW."Pro_num";
    END;
    $$ LANGUAGE plpgsql;
     
    CREATE TRIGGER tr_insert_C AFTER INSERT ON A
    FOR each ROW
    EXECUTE PROCEDURE insert_C();

    But I’m not able to make this stock Update work.
    Any ideas?
    Thank you in advance.

  20. Sorry, but I have no idea what you’re trying to update based on what values – the names of tables and columns are so vague that it’s impossible to make any guess.

    One thing though – you make trigger on insert to a that updates table a. that doesn’t make sense, in general.

  21. Hi, great article, landed here following a link in the book PostgreSql Up and Running…

    One question… you comment that…”let’s assume that you’d want fullnames to be normalized to lowercase, with uppercased first letter of every word (just in case: this is bad idea, and don’t do it in real life code).”

    Is the reason, as you comment, because “these kind of things should be rather done in CHECK constraints, and not triggers” ? or there is another reason?

    Could you elaborate on that?

    Thank you !

  22. @Sven:

    CHECK constraint can be used to make sure data inserted/updated matches specific format. As in: if it doesn’t match – it forbids insert/update.

    But trigger can be made to enforce format. As in: if it doesn’t match the format: modify the data so that it will.

  23. Hi,
    is it possible to get column default value in “before insert” trigger?

    Let’s say I gave table “t1”:
    id serial
    active boolean default false

    And trigger:
    before insert or update of active on t1 for each row execute…
    IF OLD.active is true then NEW.active := false

    Of course this is stupid example but should give some clue. I though maybe OLD will get default which is false but I only get record “old” is not assigned yet.

  24. @Mariusz:

    OLD.* would work in UPDATE trigger, but would contaqin, not “default” value, but rather what was in the column before update.

    Generally speaking – no, there is no way to get value of column that was given in command. If it’s not given, then default value is applied, and it happens before any triggers are called.

  25. Ok. So the only way to get this value, is to query information_schema directly inside trigger?

    select column_default from information_schema.columns where table_name = TG_TABLE_NAME …;

  26. What if you update a table packed with plenty of business logic, launching cascading triggers in other tables? Do you still use trigger or plpgsql function?

  27. @Wojtek:

    sorry, but I don’t quite understand.

    trigger calls a function. So you can’t have a trigger and *not* have a function.

  28. @DEPESZ
    Case: update on invoice item causes changes in 5 other tables. This requires a lot of code. I wonder if this would be readable in triggers, or I’d better put it into one transaction in one function.

  29. @Wojtek:

    you do realize, it will always be inside one transaction?

    How you will put the code, doesn’t matter to me. I would choose triggers, usually, because they give better assurances that the data will be updated.

  30. @DEPESZ
    There is one more problem with triggers for me. I update the row by plpgsql function. When the update fails inside the trigger, the function should return an error message to the client. How to pass custom error code or message from trigger to updating function?

  31. @Wojtek:

    trigger can raise exception. You can also use RAISE LOG, or RAISE NOTICE to send any kind of information to client.

  32. OK.My question was a bit stupid. I wanted to pass the exception hint from trigger to the function that calls update. I discovered that I’ve to use RAISE without any arguments.
    Thank you!

Comments are closed.