Getting value from dynamic column in pl/PgSQL triggers?

Every so often, on irc, someone asks how to get value from column that is passed as argument.

This is generally seen as not possible, as pl/PgSQL doesn't have support for dynamic column names.

We can work around it, though. Are the workarounds usable, in terms of performance?

To test it, I will need some sample data. This doesn't have to be big, but let's make it big enough so that I can test performance within Pg itself. Let's say 10 million rows.

=$ CREATE TABLE test (
    id int4 generated always AS IDENTITY PRIMARY KEY,
    payload int4
);
=$ INSERT INTO test (payload)
    SELECT random() * 100000 - 50000 FROM generate_series(1,10000000);

This made me a table with 10M rows, where payload column is a random int from -50k to +50k.

To be able to sensibly compare the performance we will need some baseline. Since I was to measure effect of getting dynamic field only, I will not work using triggers, but instead will make a function that gets payload from row given as argument.

Baseline will be function that will statically get payload column:

=$ CREATE OR REPLACE FUNCTION get_static(IN p_row test) RETURNS INT4 AS $$
DECLARE
BEGIN
    RETURN p_row.payload;
END;
$$ LANGUAGE plpgsql;

I can see it work like this:

=$ SELECT t.*, get_static(t) FROM test t LIMIT 2;
 id | payload | get_static 
----+---------+------------
  1 |   29712 |      29712
  2 |  -32547 |     -32547
(2 ROWS)

OK. Now, let's see how to get the specific value. I will write a set of functions, each testing one approach, that will get the row and name of field, and will return value of the field.

First approach – plain pl/PgSQL, using EXECUTE:

=$ CREATE OR REPLACE FUNCTION get_dynamic_plpgsql(IN p_row test, IN p_column TEXT) RETURNS INT4 AS $$
DECLARE
    v_got INT4;
BEGIN
    EXECUTE format('SELECT ($1).%I', p_column) USING p_row INTO v_got;
    RETURN v_got;
END;
$$ LANGUAGE plpgsql;

Sanity check:

=$ SELECT t.*, get_dynamic_plpgsql(t, 'payload') FROM test t WHERE t.id > 10 LIMIT 2;
 id | payload | get_dynamic_plpgsql 
----+---------+---------------------
 11 |   40481 |               40481
 12 |  -17427 |              -17427
(2 ROWS)

This works by building query like: SELECT ($1).payload, and then doing EXECUTE on it, passing whole row (t) as $1.

Next approach is to use hstore:

=$ CREATE extension hstore;
=$ CREATE OR REPLACE FUNCTION get_dynamic_hstore(IN p_row test, IN p_column TEXT) RETURNS INT4 AS $$
DECLARE
BEGIN
    RETURN hstore(p_row)->p_column;
END;
$$ LANGUAGE plpgsql;

sanity check:

=$ SELECT t.*, get_dynamic_hstore(t, 'payload') FROM test t WHERE t.id > 30 LIMIT 2;
 id | payload | get_dynamic_hstore 
----+---------+--------------------
 31 |   -6526 |              -6526
 32 |  -34175 |             -34175
(2 ROWS)

This works by first converting the whole row to hstore datatype (which is basically hash/associative array/dictionary), and then extracting single value from it using -> hstore operator.

Next idea – basically the same, but using JSON, and not hstore:

=$ CREATE OR REPLACE FUNCTION get_dynamic_json(IN p_row test, IN p_column TEXT) RETURNS INT4 AS $$
DECLARE
BEGIN
    RETURN row_to_json(p_row) ->> p_column;
END;
$$ LANGUAGE plpgsql;
 
=$ SELECT t.*, get_dynamic_json(t, 'payload') FROM test t WHERE t.id > 40 LIMIT 2;
 id | payload | get_dynamic_json 
----+---------+------------------
 41 |   45291 |            45291
 42 |  -37996 |           -37996
(2 ROWS)

Since we have JSON, let's also try jsonb:

=$ CREATE OR REPLACE FUNCTION get_dynamic_jsonb(IN p_row test, IN p_column TEXT) RETURNS INT4 AS $$
DECLARE
BEGIN
    RETURN to_jsonb(p_row) ->> p_column;
END;
$$ LANGUAGE plpgsql;
 
=$ SELECT t.*, get_dynamic_jsonb(t, 'payload') FROM test t WHERE t.id > 50 LIMIT 2;
 id | payload | get_dynamic_jsonb 
----+---------+-------------------
 51 |   18689 |             18689
 52 |   19602 |             19602
(2 ROWS)

Finally, one more thing. When someone asks me about how to do it (dynamic column in plpgsql code), I generally suggest to use different language. For me the most natural language is Perl, so let's see:

=$ CREATE extension plperl;
 
=$ CREATE OR REPLACE FUNCTION get_dynamic_perl(test, TEXT) RETURNS INT4 AS $$
my ($test, $field) = @_;
RETURN $test->{$field};
$$ LANGUAGE plperl;
 
=$ SELECT t.*, get_dynamic_perl(t, 'payload') FROM test t WHERE t.id > 60 LIMIT 2;
 id | payload | get_dynamic_perl 
----+---------+------------------
 61 |  -39619 |           -39619
 62 |  -28303 |           -28303
(2 ROWS)

How about performance? I ran each function like this:

=$ SELECT SUM(get_*(...)) FROM test t;

Each call was immediately repeated to get the most benefit from buffer cache. And then I took the better time. Results:

Function: Time: vs. get_static():
get_static 5,802.037 ms
get_dynamic_hstore 9,878.367 ms + 70 %
get_dynamic_json 16,601.156 ms + 186 %
get_dynamic_perl 17,119.276 ms + 195 %
get_dynamic_jsonb 17,600.991 ms + 203 %
get_dynamic_plpgsql 83,334.069 ms + 1,336 %

So, the results are in. Looks that if you want to get dynamic field you have to agree to at least 70% of reduced performance.

Of course – this doesn't meant that your trigger, or some other function/procedure, will take 70% more time.

In your real-life code there will be, probably, other things – some calculations, some checks. And, in case of triggers – there will be non-zero time to actually write the data.

With this in mind – you just have to test your case if it's worth it. But with the numbers above, I can see clearly that if I wanted to get dynamic field, and I'd want to do it in plpgsql – hstore is the fastest way.