Waiting for 8.5 – PL/pgSQL variable resolution

On 13th of November (I know, backlog again), Tom Lane committed patch which make PostgreSQL more strict about what happens in stored procedures in PL/pgSQL:

Add control knobs for plpgsql's variable resolution behavior, and make the
default be "throw error on conflict", as per discussions.  The GUC variable
is plpgsql.variable_conflict, with values "error", "use_variable",
"use_column".  The behavior can also be specified per-function by inserting
one of
        #variable_conflict error
        #variable_conflict use_variable
        #variable_conflict use_column
at the start of the function body.
 
The 8.5 release notes will need to mention using "use_variable" to retain
backward-compatible behavior, although we should encourage people to migrate
to the much less mistake-prone "error" setting.
 
Update the plpgsql documentation to match this and other recent changes.

One of the most common reasons people have problems with writing functions is that they use the same names for columns and for variables.

Like in this example:

CREATE TABLE users (
    user_id serial PRIMARY KEY,
    username TEXT UNIQUE
);
INSERT INTO users (username) VALUES ('depesz'),('test');

We have table, and now we want to get user_id, based on name (this example is trivial, but it shows effect):

CREATE OR REPLACE FUNCTION get_user_id(username TEXT) RETURNS INT4 as $$
declare
    reply INT4;
BEGIN
    SELECT user_id INTO reply FROM users WHERE username = username;
    RETURN reply;
END;
$$ language plpgsql;

So, what will happen in 8.4 or earlier when I run this function:

# SELECT get_user_id('depesz');
 get_user_id
-------------
           1
(1 row)
 
# SELECT get_user_id('test');
 get_user_id
-------------
           1
(1 row)

The problem lies in the fact that given string “username = username" PostgreSQL simply cannot know if you want to compare column with variable, variable with column, variable with variable or column with column – so it chooses one approach, and it leads to bad results.

And since it is not technically syntax error (all of these 4 cases above are perfectly legal), no error is raised – it's just that the function “doesn't work as expected".

Now, in 8.5 the same code (with default settings) will:

# SELECT get_user_id('depesz');
ERROR:  column reference "user_id" is ambiguous
LINE 1: SELECT user_id              FROM users WHERE username = user...
               ^
DETAIL:  It could refer to either a PL/pgSQL variable or a table column.
QUERY:  SELECT user_id              FROM users WHERE username = username
CONTEXT:  PL/pgSQL function "get_user_id" line 4 at SQL statement

This makes the problem far easier to spot, and thus – to fix.

But. Since this function will, without any doubt, break some existing code, there is now new guc “plpgsql.variable_conflict".

Let's see how it works. First of all – despite being GUC – you shouldn't be setting it in SQL query (like SET …). Intstead you set it in postgresql.conf, ot specify explicitly in function body.

First, let's try postgresql.conf modification:

# show plpgsql.variable_conflict;
 plpgsql.variable_conflict
---------------------------
 use_variable
(1 row)
 
# select get_user_id('depesz'), get_user_id('test');
 get_user_id | get_user_id
-------------+-------------
           1 |           1
(1 row)

Nice. In the same way (changing postgresql.conf) I can make it “use_column".

Ah – just to show how it looks in postgresql.conf:

=> tail -n 3 data/postgresql.conf
#custom_variable_classes = ''           # list of custom variable class names
custom_variable_classes = 'plpgsql'
plpgsql.variable_conflict = use_variable

But this is definitely not good approach. It's much better to make it error out as default, and just change the function we need. So, let's make it explicit that I want error:

=> tail -n 3 data/postgresql.conf
#custom_variable_classes = ''           # list of custom variable class names
custom_variable_classes = 'plpgsql'
plpgsql.variable_conflict = error

pg_ctl reload, but then:

# show plpgsql.variable_conflict;
 plpgsql.variable_conflict
---------------------------
 error
(1 row)
 
# select get_user_id('depesz'), get_user_id('test');
 get_user_id | get_user_id
-------------+-------------
           1 |           1
(1 row)

The problem is (I guess) that variable_conflict is checked by plpgsql language on first function compilation. So, if given function was already compiled in given session, changes to variable_conflict will not modify this particular function behaviour.

So, psql reconnect, and I get my error message:

# select get_user_id('depesz'), get_user_id('test');
ERROR:  column reference "username" is ambiguous
LINE 1: SELECT user_id            FROM users WHERE username = userna...
                                                   ^
DETAIL:  It could refer to either a PL/pgSQL variable or a table column.
QUERY:  SELECT user_id            FROM users WHERE username = username
CONTEXT:  PL/pgSQL function "get_user_id" line 4 at SQL statement

Now, that I have sane default, let's make this particular function work again, by making it's variable resolution use “column" way:

CREATE OR REPLACE FUNCTION get_user_id(username TEXT) RETURNS INT4 as $$
# variable_conflict use_column
declare
    reply INT4;
BEGIN
    SELECT user_id INTO reply FROM users WHERE username = username;
    RETURN reply;
END;
$$ language plpgsql;

And now:

# show plpgsql.variable_conflict;
 plpgsql.variable_conflict
---------------------------
 error
(1 row)
 
(depesz@[local]:5850) 14:50:45 [depesz]
# select get_user_id('depesz'), get_user_id('test');
 get_user_id | get_user_id
-------------+-------------
           1 |           1
(1 row)

Of course the best way to handle these errors it to fix the functions, and not introduce problem-hiding configuration changes.