February 25th, 2013 by depesz

One of the questions that gets asked quite a lot is: how can you use variables in your queries? Other databases have them; does PostgreSQL?

Actually the answer is “no”. But it is very easy to work around, and, more importantly, quite often what you need are not variables in queries, but rather variables in the client. What does that mean? Let's see.

It all depends on your use case. We have basically these options:

  • Variables in psql (the command line tool)
  • GUCs
  • Per session variables in temp tables
  • Per session and global variables in normal tables

Variables in psql

In quite a lot of cases, these will be enough. They are not evaluated in the PostgreSQL backend, but in the psql client.

Example usage:

$ \prompt 'How many rows? ' rows
How many rows? 10
 
$ select relname from pg_class limit :rows;
             relname
---------------------------------
 pg_statistic
 pg_type
 pg_toast_2619
 pg_toast_2619_index
 pg_authid_rolname_index
 pg_authid_oid_index
 pg_attribute_relid_attnam_index
 pg_attribute_relid_attnum_index
 pg_toast_1255
 pg_toast_1255_index
(10 rows)

Such variables can be set using \prompt, using \set, or by running shell commands (on the client, not on the server). Afterwards you can use them in queries, using notations like:

  • :var – simply inserts the value of var into the query at the given location
  • :'var' – inserts the value as a properly quoted string, even if it includes ' characters
  • :"var" – inserts the value as a properly quoted identifier

So, you can, for example, write something like this:

$ \set table pg_class
$ \set match oid
$ \set limit 10
$ select relname from :"table" where relname ~ :'match' limit :limit;
         relname
--------------------------
 pg_authid_oid_index
 pg_am_oid_index
 pg_amop_oid_index
 pg_amproc_oid_index
 pg_aggregate_fnoid_index
 pg_cast_oid_index
 pg_collation_oid_index
 pg_database_oid_index
 pg_proc_oid_index
 pg_operator_oid_index
(10 rows)
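
Filling a variable from a shell command was mentioned above but not shown. A minimal sketch, assuming a Unix-like client where the date command is available (the variable name today is just an illustration): psql runs whatever is inside backticks on the client machine and stores its output.

```sql
-- Runs "date +%F" on the client machine and stores its output
-- in the psql variable "today" (an illustrative name).
\set today `date +%F`
-- :'today' is interpolated as a quoted string literal, so it can be cast to date.
select :'today'::date as client_date;
```

Keep comments on their own lines here: anything that follows \set on the same line is treated as part of the value.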

In PostgreSQL 9.3, we'll get the ability to set the value of a variable based on a query, which will let us do some cooler stuff in a simpler way than before (it is possible to set the value of a variable from a query in older versions too, but the way of doing this is not nice – example is in the blog post).

So, if you only need some value shared between multiple queries that you run in batch mode via psql, it's perfectly possible, and very simple.
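
As a rough sketch of how this looks in psql 9.3 and later: ending a query with \gset instead of a semicolon stores each column of the single result row in a psql variable named after that column. The column alias relcount is an arbitrary name picked for this illustration.

```sql
-- Store the single-row result in the psql variable "relcount"
-- (the variable is named after the column alias).
select count(*) as relcount from pg_class \gset
-- Reuse the stored value in a later query.
select :relcount as number_of_relations;
```

The query has to return exactly one row for the variables to be set.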

The cool thing about psql variables, is that you can build whole queries in them. For example:

=$ \set top 'select now() - query_start as runtime, query from pg_stat_activity where state <> ''idle'' order by query_start;'
 
=$ :top

After pressing enter on the :top command, you'll get information about the queries that are currently running.

GUCs

You might know that there are various Pg settings – like work_mem, shared_buffers, listen_addresses and so on.

What you might not know is that you can relatively easily set up your own variables. To do so, you need to use some kind of prefix (class). In PostgreSQL before 9.2, you had to configure the class in postgresql.conf, using:

custom_variable_classes = depesz

So you can use the “depesz” class. In 9.2 and later, classes are defined on use.

Usage of these variables is very simple, and goes through functions. There are two functions that you'll need:

  • current_setting(variable) – returns the current value of the given variable, raises an exception on an unknown variable
  • set_config(variable, value, is_local) – sets the variable to the given value (and declares it if it wasn't declared before). If is_local is set to true, the change is only for the current transaction. Otherwise it's for the current session.

So, let's see how it works. First, I'll try to use unknown variable:

$ select current_setting('depesz.xx');
ERROR:  unrecognized configuration parameter "depesz.xx"

OK. That worked. Now, let's set the value:

$ select set_config('depesz.xx', '123', false);
 set_config
------------
 123
(1 row)
 
$ select current_setting('depesz.xx');
 current_setting
-----------------
 123
(1 row)

A couple of important notes:

  • a variable cannot contain null. If you store null in a variable, or if it becomes declared without a value, it will have an empty string as its value
  • if you set is_local to true while in AutoCommit mode, this is basically a no-op (the only side effect is that the variable gets declared, but with an empty string as its value). This bit me a couple of times :)
  • you can set default values per user and per database using the normal “alter user/database set … = …”
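
The is_local behaviour from the notes above can be sketched like this. The variable name depesz.yy and the database name mydb are placeholders made up for the illustration, and the ALTER DATABASE statement assumes you have the privileges to run it.

```sql
BEGIN;
-- is_local = true: the value lasts only until the end of this transaction.
SELECT set_config('depesz.yy', 'inside', true);
-- Still inside the same transaction, so the value set above is visible.
SELECT current_setting('depesz.yy');
COMMIT;

-- The transaction-local value is gone now; per the notes above, the variable
-- stays declared, with an empty string as its value.
SELECT current_setting('depesz.yy');

-- Hypothetical per-database default; it only applies to new connections.
ALTER DATABASE mydb SET depesz.yy = 'default';
```

In AutoCommit mode every statement is its own transaction, which is why the same set_config call without an explicit BEGIN appears to do nothing.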

There is (as far as I know) no way to use GUCs to store a real global variable, one shared across multiple sessions. The problem is that while you can use “ALTER … SET …”, it is evaluated on connection, so it will not affect existing connections. This might be good for you, though.

The value of a variable can also be fetched using the “SHOW” SQL command, and set using the “SET” command, but since the function interface is simpler when writing my own functions, I tend to use functions to read/write the variables.
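
For comparison, a small sketch of the command interface next to the function interface, using the depesz.xx variable from the earlier examples. The value '5' is arbitrary.

```sql
-- Session-level assignment, equivalent to set_config('depesz.xx', '5', false).
SET depesz.xx = '5';
SHOW depesz.xx;

-- The function form can be used inside an expression, which SHOW cannot.
-- current_setting() returns text, so cast it where a number is needed.
SELECT relname FROM pg_class LIMIT current_setting('depesz.xx')::int4;
```

This is the main reason the function form is handier inside other functions and queries: its result is just a text value.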

Per session variables in temp tables

In case you're on PostgreSQL 9.1 (or earlier), and you can't modify postgresql.conf to add your own custom_variable_classes, you can do more or less the same thing using some very simple functions.

Using temp tables makes it very fast, and there is no need to care about identification of session – you just get what you need.

For example, we can imagine two simple functions, like this:

CREATE OR REPLACE FUNCTION set_variable( IN p_var TEXT, IN p_val TEXT ) RETURNS void as $$
DECLARE
    v_var TEXT;
BEGIN
    execute 'CREATE temp TABLE IF NOT exists sys_variables ( variable TEXT PRIMARY KEY, value TEXT );';
    LOOP
        execute 'UPDATE sys_variables SET value = $1 WHERE variable = $2 returning variable' INTO v_var USING p_val, p_var;
        IF v_var IS NOT NULL THEN
            RETURN;
        END IF;
        BEGIN
            execute 'INSERT INTO sys_variables ( variable, value ) VALUES ( $1, $2 )' USING p_var, p_val;
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- ignore, re-process the loop
        END;
    END LOOP;
END;
$$ language plpgsql;
 
CREATE OR REPLACE FUNCTION get_variable( IN p_var TEXT ) RETURNS TEXT as $$
DECLARE
    v_val TEXT;
BEGIN
    execute 'CREATE temp TABLE IF NOT exists sys_variables ( variable TEXT PRIMARY KEY, value TEXT );';
    execute 'SELECT value FROM sys_variables WHERE variable = $1' INTO v_val USING p_var;
    RETURN v_val;
END;
$$ language plpgsql;

And then you can use it like:

$ select set_variable('xx', 'qq');
 set_variable
--------------
 
(1 row)

and

$ select get_variable('xx');
NOTICE:  relation "sys_variables" already exists, skipping
CONTEXT:  SQL statement "CREATE temp TABLE IF NOT exists sys_variables ( variable TEXT PRIMARY KEY, value TEXT );"
PL/pgSQL function get_variable(text) line 5 at EXECUTE statement
 get_variable
--------------
 qq
(1 row)

There is a problem with NOTICEs, though. It can be alleviated in a couple of ways:

  • move the “create table” to a separate function, and call it just once, right after connecting to the database
  • add some kind of logic that will check if the temp table exists, and run 'create temp table' only if it doesn't
  • set client_min_messages = warning; to hide those notices
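
The second option could look roughly like the sketch below. The function name sys_variables_init is made up for this illustration; the idea is that set_variable and get_variable would call it instead of running their own CREATE TABLE.

```sql
CREATE OR REPLACE FUNCTION sys_variables_init() RETURNS void AS $$
BEGIN
    -- pg_my_temp_schema() returns the OID of this session's temporary schema,
    -- or 0 if the session has not created any temporary object yet.
    PERFORM 1
    FROM pg_class c
    WHERE c.relname = 'sys_variables'
      AND c.relnamespace = pg_my_temp_schema();
    IF NOT FOUND THEN
        -- No temp table in this session yet, so create it (no NOTICE is raised).
        CREATE TEMP TABLE sys_variables ( variable TEXT PRIMARY KEY, value TEXT );
    END IF;
END;
$$ LANGUAGE plpgsql;
```

The catalog lookup costs a little on every call, so calling the function once right after connecting (the first option) is still cheaper if you control the connection setup.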

Per session and global variables in normal tables

This is the most complicated, but, on the other hand, the most versatile approach.

For starters, we will need a real table. Let's assume it's sys_variables again. And we'll need to create it. Suggested idea:

CREATE TABLE sys_variables (
    variable TEXT,
    value TEXT,
    backend_pid INT4,
    backend_start timestamptz
);
CREATE UNIQUE INDEX sys_variables_global_variable on sys_variables (variable) WHERE backend_pid IS NULL;
CREATE UNIQUE INDEX sys_variables_session_variable on sys_variables (variable, backend_pid, backend_start) WHERE backend_pid IS NOT NULL;
 
CREATE OR REPLACE FUNCTION get_variable( IN p_variable TEXT ) RETURNS TEXT as $$
DECLARE
    v_value TEXT;
BEGIN
    SELECT sv.value INTO v_value FROM pg_stat_get_activity(pg_backend_pid()) as x join sys_variables sv on x.pid = sv.backend_pid AND x.backend_start = sv.backend_start WHERE backend_pid IS NOT NULL AND variable = p_variable;
    IF FOUND THEN
        RETURN v_value;
    END IF;
    SELECT sv.value INTO v_value FROM sys_variables sv WHERE backend_pid IS NULL AND variable = p_variable;
    IF FOUND THEN
        RETURN v_value;
    END IF;
    RETURN NULL;
END;
$$ language plpgsql stable;
 
CREATE OR REPLACE FUNCTION set_local_variable( IN p_variable TEXT, IN p_value TEXT ) RETURNS void as $$
DECLARE
    v_data record;
BEGIN
    SELECT pid, backend_start INTO v_data FROM pg_stat_get_activity(pg_backend_pid());
    LOOP
        UPDATE sys_variables SET value = p_value WHERE variable = p_variable AND backend_pid IS NOT NULL AND backend_pid = v_data.pid AND backend_start = v_data.backend_start;
        IF FOUND THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO sys_variables ( variable, value, backend_pid, backend_start ) VALUES ( p_variable, p_value, v_data.pid, v_data.backend_start );
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- ignore, re-process the loop
        END;
    END LOOP;
    RETURN;
END;
$$ language plpgsql;
 
CREATE OR REPLACE FUNCTION set_global_variable( IN p_variable TEXT, IN p_value TEXT ) RETURNS void as $$
DECLARE
BEGIN
    LOOP
        UPDATE sys_variables SET value = p_value WHERE variable = p_variable AND backend_pid IS NULL;
        IF FOUND THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO sys_variables ( variable, value, backend_pid, backend_start ) VALUES ( p_variable, p_value, NULL, NULL );
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- ignore, re-process the loop
        END;
    END LOOP;
    RETURN;
END;
$$ language plpgsql;
 
CREATE OR REPLACE FUNCTION sys_variables_cleanup( ) RETURNS void as $$
DECLARE
BEGIN
    DELETE FROM sys_variables WHERE backend_pid IS NOT NULL AND (backend_pid, backend_start) NOT in ( SELECT a.pid, a.backend_start FROM pg_stat_activity as a );
    RETURN;
END;
$$ language plpgsql;

Is it scary? Don't worry, it's actually a pretty simple thing. Let's see it step by step.

First part, lines 1-8, creates the storage table and unique indexes. We need two unique indexes: one for global variables (not assigned to any session, thus having backend_pid NULL), and one for session variables, assigned to some backend. Theoretically we'd need only backend_pid, but the problem is that in a system with lots of connections the pid could be reused rather quickly. So I also add the backend_start time, which has microsecond precision and, together with pid, gives me a truly unique session identifier.

Next, lines 10-24, the getter. Inside, it first (line 14) checks whether there is a session variable of the given name for the current session. The data for session identification is taken from the pg_stat_get_activity() function, the same one that is used by the commonly used pg_stat_activity view.

If such a variable exists, the function returns it and ends. If it doesn't, it then checks for a global variable with the same name. Simple.

Functions set_local_variable (lines 26-45) and set_global_variable (lines 47-64) do basically the same thing: insert or update a row, with proper values in the variable, backend_pid and backend_start columns. Nothing worth talking about here.
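
A short usage sketch of the three functions defined above. The variable name xx and the values are arbitrary.

```sql
SELECT set_global_variable('xx', 'global value');
-- No session variable named xx exists yet, so the getter falls back to the global one.
SELECT get_variable('xx');

SELECT set_local_variable('xx', 'session value');
-- The session variable now shadows the global one, but only in this session;
-- other sessions still get the global value.
SELECT get_variable('xx');
```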

The last function is cleanup. The problem is that we can't delete session variables once the session ends, because there are no “on disconnect” triggers. So, after some use of such a system, we would have quite a lot of old session variables. The solution is to call this function ( sys_variables_cleanup – lines 66-72 ) from crontab. This removes old, obsolete rows.

Of course, this can be improved further. For example, you might want to consider using unlogged tables. Or, perhaps, move session variables to temp tables. Or add a function that would remove a session variable, so you could get the value of the global one again. But this is not really critical stuff.
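
As one possible sketch of the last idea, a function that removes a session variable could look like this. The name unset_local_variable is made up here, and the function relies on the sys_variables table defined above.

```sql
CREATE OR REPLACE FUNCTION unset_local_variable( IN p_variable TEXT ) RETURNS void AS $$
BEGIN
    -- Delete only the row that belongs to the current session, identified by
    -- the same (pid, backend_start) pair that the setter stored.
    DELETE FROM sys_variables AS sv
    USING pg_stat_get_activity(pg_backend_pid()) AS x
    WHERE sv.variable = p_variable
      AND sv.backend_pid = x.pid
      AND sv.backend_start = x.backend_start;
END;
$$ LANGUAGE plpgsql;
```

After calling it, get_variable falls back to the global variable of the same name, if there is one.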

Summary

So, the final question is: which one is the best? And if I can't answer it, which one do I use?

Well, there is no “best” solution. All solutions have benefits and drawbacks. And as for what I use: I don't. Really. When I need to write quick batch scripts, I use psql variables, but that happens rarely. And if I need something in Pg itself, I simply end up writing a function anyway, and use variables in functions (or in DO blocks).
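
For completeness, a minimal sketch of what variables in a DO block look like. The names v_limit and v_count are arbitrary, and the query is just an illustration.

```sql
DO $$
DECLARE
    v_limit INT4 := 10;   -- an ordinary PL/pgSQL variable
    v_count INT4;
BEGIN
    -- The variable is used directly inside the query, as a parameter.
    SELECT count(*) INTO v_count
    FROM ( SELECT 1 FROM pg_class LIMIT v_limit ) AS x;
    RAISE NOTICE 'fetched % rows', v_count;
END;
$$;
```

Such variables live only for the duration of the block, so they solve a different problem than the session-wide approaches above.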

All things said, this blogpost does not describe all possible ways. For example, if you're writing a lot of functions in PL/Perl (or other PL/* languages, but not PL/pgSQL), there is some kind of memory that is shared between queries within a session. In PL/Perl it's %_SHARED, in PL/Python it's SD (private to each function) and GD (shared by all functions in the session), and in other languages it's probably named differently.
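
A rough sketch of the PL/Perl variant, assuming the plperl language is installed in the database. The function names set_shared and get_shared are picked for this illustration.

```sql
-- Requires PL/Perl, e.g. CREATE EXTENSION plperl; on 9.1 or later.
CREATE OR REPLACE FUNCTION set_shared( TEXT, TEXT ) RETURNS void AS $$
    my ( $name, $value ) = @_;
    # %_SHARED lives as long as the session, and is visible to other PL/Perl functions.
    $_SHARED{ $name } = $value;
    return;
$$ LANGUAGE plperl;

CREATE OR REPLACE FUNCTION get_shared( TEXT ) RETURNS TEXT AS $$
    # Returns NULL if nothing was stored under this name in the current session.
    return $_SHARED{ $_[0] };
$$ LANGUAGE plperl;

SELECT set_shared('xx', 'qq');
SELECT get_shared('xx');
```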

One comment

Valentine Gogichashvili, Mar 6, 2013

    Hi, if you have a slave machine with a big shared_buffers value, then creating and dropping temporary tables leads to a slowdown of WAL replay. So be careful with this approach (it leads to a full shared_buffers scan on the slave every time the temp table is dropped). If you use such temp tables very often, that can influence your replication very negatively.
