Similar question has been asked many times on mailing lists and on IRC. Sometimes it's not mail sending, but file/directory creation, or something else that generally requires some interaction with “world outside of database".
Can it be done? Sure. How, then?
There are couple of methods, each has its own set of drawbacks and benefits (as always).
The answer most people are looking for (at least initially) is – how to do it from within database, without any kind of additional programs.
To do so, we will need to use stored procedure (function in PostgreSQL). But not just any.
We need something special.
Let's see. In psql, when I do \dL (to list languages) I get list like this:
$ \dL List of languages Name │ Owner │ Trusted │ Description ─────────┼───────┼─────────┼────────────────────────────── plperl │ pgdba │ t │ PL/Perl procedural language plpgsql │ pgdba │ t │ PL/pgSQL procedural language (2 rows)
Most of it is self explanatory, but what is the “Trusted" thing?
Trusted languages are named so because you (at least theoretically) cannot interact (using them) with anything outside of database.
So, you cannot use files, sockets. This includes also loading modules in PL/Perl – since modules are in files – you can't load them (well, technically you can, but not within PL/Perl function, you have to do it from postgresql.conf).
So – any language marked as trusted will not work for stuff like sending mails. Simply not possible.
But – there are, so called, untrusted languages.
The very first, not listed above, but working in every Pg instance is “C". This is not a language for stored procedures – it's rather interface to functions written in C (or any other language compilable to .so files), stored somewhere in filesystem, that PostgreSQL backend can load.
While it's great, and provides limitless possibilities, we'd rather use something interpreted, so we could create the functions within SQL itself (like in psql).
Luckily, PostgreSQL has many accessible languages – all depending on compile options, and installed extensions. In base sources for PostgreSQL 9.2 (9.1 too afaik, and in previous versions too), we have:
You might notice that some of the languages have letter “U" at the end of name – these are untrusted languages. Meaning – using them you are not limited in what you can do. So, for example – using open() in PL/Perl will generally fail, but in PL/PerlU will happily work.
You might also notice, that there is no PL/Python – just PL/PythonU (and PL/Python2U, which is (currently) the same thing). Reason is very simple – some time ago developers found out that it is not possible to limit Python functionality in a way that's safe. So PL/Python had to disappear (in 7.4 version of Pg, loooong time ago).
So, let's load the language, and do something that will reach outside of database:
=$ psql -U pgdba -d test (pgdba@[LOCAL]:5920) 10:58:42 [test] $ CREATE extension plperlu; CREATE EXTENSION (pgdba@[LOCAL]:5920) 10:58:47 [test] $ \c test test You are now connected TO DATABASE "test" AS USER "test". (test@[LOCAL]:5920) 10:58:52 [test] $ CREATE FUNCTION make_dir(text) RETURNS void AS $$ mkdir('/tmp/' . shift ); $$ LANGUAGE plperlu; ERROR: permission denied FOR LANGUAGE plperlu
So, I was able to create the extension, but I can't create function using normal user. Reason is very simple – since plperlu is untrusted – only superusers can use it to create functions. And you can't even grant privileges on it:
(pgdba@[LOCAL]:5920) 11:00:31 [test] $ GRANT usage ON LANGUAGE plperlu TO test; ERROR: LANGUAGE "plperlu" IS NOT TRUSTED HINT: ONLY superusers can USE untrusted languages.
Does that mean that it's useless? Well, kind of. I stand on position that giving anyone superuser privileges in database equals to giving them shell access. But if you're not so strict – you can create the function using superuser, and then grant privileges to it to other users. Something like this:
(pgdba@[LOCAL]:5920) 11:03:39 [test] $ CREATE OR REPLACE FUNCTION make_dir(text) RETURNS void AS $$ mkdir('/tmp/' . shift ); $$ LANGUAGE plperlu; CREATE FUNCTION (pgdba@[LOCAL]:5920) 11:03:49 [test] $ GRANT EXECUTE ON FUNCTION make_dir(text) TO test; GRANT (pgdba@[LOCAL]:5920) 11:03:52 [test] $ \c test test You are now connected TO DATABASE "test" AS USER "test". (test@[LOCAL]:5920) 11:03:54 [test] $ SELECT make_dir('whatever'); make_dir ────────── [NULL] (1 ROW) (test@[LOCAL]:5920) 11:03:57 [test] $ \! ls -ld /tmp/whatever drwx------ 2 pgdba pgdba 4096 Jun 13 11:03 /tmp/whatever
As you can see I created the function using superuser account (pgdba), then granted privileges to execute it to test account, and ran it successfully from test. Directory has been created.
In the same way I could send mail, make HTTP request, or anything else.
And since it is a simple function in database – it can be used as a trigger function.
That's all good, but is there any way that doesn't require superuser privileges? Sure there is, but you will need to have something else instead.
The idea is very simple – you have one database connection that does “LISTEN", and then you have trigger that issues “NOTIFY".
So, let's test is.
First, I will need a test table, and the notifying trigger:
CREATE TABLE users (username text PRIMARY KEY, whatever int4);
and the trigger:
$ CREATE FUNCTION new_user_notify() RETURNS TRIGGER AS $$ BEGIN perform pg_notify('new_user', NEW.username); RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE FUNCTION $ CREATE TRIGGER new_user_notify after INSERT ON users FOR each ROW EXECUTE PROCEDURE new_user_notify(); CREATE TRIGGER
And now I can test it:
|Session #1||Session #2|
|$ listen new_user;|
|$ insert into users (username, whatever) values (‘depesz', 1);||(nothing happens)|
|(still nothing happens)|
|$ commit;||(still nothing happens)|
$ SELECT 1; ?COLUMN? ────────── 1 (1 ROW) Asynchronous notification "new_user" WITH payload "depesz" received FROM server process WITH PID 11717.
We see couple of things:
- notification was delivered only AFTER COMMIT. This is important, because it means that we will not have situations when NOTIFY was sent, but transaction is not yet committed, so the other process cannot see the data.
- in psql at least – you have to do some kind of database interaction to get the notification
So – how can we use it to do something actually useful? Let's try to write a simple “daemon" that will simply print info about new users. Since I usually write in Perl, decided to write this (well, mostly copy/paste) in Python – for a change:
#!/usr/bin/env python # -*- coding: utf-8 -*- import select import psycopg2 import psycopg2.extensions dbc = psycopg2.connect(database='depesz', host='127.0.0.1', port=5920) dbc.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT) cur = dbc.cursor() cur.execute('LISTEN new_user') while 1: if not select.select([dbc], , , 5) == (, , ): dbc.poll() while dbc.notifies: notify = dbc.notifies.pop() print "new_user %s, backed: %d" % (notify.payload, notify.pid)
When this is running – every inserted (and committed) new user, generates line like:
new_user neal_stephenson, backed: 11717 new_user daniel_suarez, backed: 13735
What the listening daemon will do – it's up to you. It can query back the database to get full row for new user, or it can send mails, or it can do anything you might want.
Is that perfect solution? Unfortunately no. While PostgreSQL will queue all NOTIFYs that weren't yet handled by all LISTENING processes (you can have many processes doing LISTEN on the same channel (new_user in my example)) – it will not queue them if nothing is listening.
So – if your listener will die – all notifications that happened in the time between its death and restart are gone. Of course this can be mitigated by running (for example) two daemons, and doing some kind of synchronization between them, but it's not perfect, and it's definitely not simple.
Interestingly – there is a whole set of tools that serve this particular case – get some info, and then keep it in queue until someone will take it. Yes – message queues.
There is queue manager for PostgreSQL – it's called pgq, and it's part of Skype's SkyTools.
At this moment you might say: whoa, so much work just to get some emails sent when new user is created? Is there really nothing simpler? Well, there is. You can write a simple queue yourself.
$ CREATE TABLE process_queue ( channel text, payload text, stored timestamptz, processed timestamptz ); $ CREATE FUNCTION new_user_handle() RETURNS TRIGGER AS $$ BEGIN INSERT INTO process_queue (channel, payload, stored) VALUES ('new_user', NEW.username, now()); RETURN NEW; END; $$ LANGUAGE plpgsql; $ CREATE TRIGGER new_user_handle after INSERT ON users FOR each ROW EXECUTE PROCEDURE new_user_handle();
Now, some inserts:
$ INSERT INTO users (username) VALUES ('depesz'); INSERT 0 1 $ INSERT INTO users (username) VALUES ('admin'); INSERT 0 1 $ INSERT INTO users (username) VALUES ('user'); INSERT 0 1 $ INSERT INTO users (username) VALUES ('test'); INSERT 0 1
Afterwards, process_queue contains some rows:
$ SELECT * FROM process_queue ; channel │ payload │ stored │ processed ──────────┼─────────┼───────────────────────────────┼─────────── new_user │ depesz │ 2012-06-13 11:53:22.068471+02 │ [NULL] new_user │ admin │ 2012-06-13 11:53:24.170629+02 │ [NULL] new_user │ USER │ 2012-06-13 11:53:31.141075+02 │ [NULL] new_user │ test │ 2012-06-13 11:53:32.884775+02 │ [NULL] (4 ROWS)
And, on top of it – just write a simple script that will run as daemon or as cronjob that will check the process_queue, and do whatever is necessary for all rows that are not yet processed.
What's more – you can also use LISTEN/NOTIFY with this approach, so that, with daemon approach, you will have fewer scans of the table. I.e. instead of:
while 1: check_for_new_rows_in_queue() sleep 5
(which scans the process_queue table every 5 seconds)
You will get logic as in python script above, waiting for notification from trigger, and only then checking for new rows. And if the daemon will die – all the not-yet-processed users will be waiting in process_queue, so on restart – just add one more check. Like:
check_for_new_rows_in_queue() while 1: wait_for_notification() check_for_new_rows_in_queue()
As a final note – personally, I prefer the last option – its simple, can be done with standard PostgreSQL – even in cases when admins do not want to install new languages or extensions, and it “just works".