Getting value from dynamic column in pl/PgSQL triggers?

Every so often, on irc, someone asks how to get value from column that is passed as argument.

This is generally seen as not possible, as pl/PgSQL doesn't have support for dynamic column names.

We can work around it, though. Are the workarounds usable, in terms of performance?

To test it, I will need some sample data. This doesn't have to be big, but let's make it big enough so that I can test performance within Pg itself. Let's say 10 million rows.

=$ create table test (
    id int4 generated always as identity primary key,
    payload int4
);
=$ insert into test (payload)
    select random() * 100000 - 50000 from generate_series(1,10000000);

This made me a table with 10M rows, where payload column is a random int from -50k to +50k.

To be able to sensibly compare the performance we will need some baseline. Since I was to measure effect of getting dynamic field only, I will not work using triggers, but instead will make a function that gets payload from row given as argument.

Baseline will be function that will statically get payload column:

=$ CREATE OR REPLACE FUNCTION get_static(in p_row test) RETURNS INT4 as $$
declare
BEGIN
    RETURN p_row.payload;
END;
$$ language plpgsql;

I can see it work like this:

=$ select t.*, get_static(t) from test t limit 2;
 id | payload | get_static 
----+---------+------------
  1 |   29712 |      29712
  2 |  -32547 |     -32547
(2 rows)

OK. Now, let's see how to get the specific value. I will write a set of functions, each testing one approach, that will get the row and name of field, and will return value of the field.

First approach – plain pl/PgSQL, using EXECUTE:

=$ CREATE OR REPLACE FUNCTION get_dynamic_plpgsql(in p_row test, in p_column TEXT) RETURNS INT4 as $$
declare
    v_got INT4;
BEGIN
    execute format('SELECT ($1).%I', p_column) using p_row INTO v_got;
    RETURN v_got;
END;
$$ language plpgsql;

Sanity check:

=$ select t.*, get_dynamic_plpgsql(t, 'payload') from test t where t.id > 10 limit 2;
 id | payload | get_dynamic_plpgsql 
----+---------+---------------------
 11 |   40481 |               40481
 12 |  -17427 |              -17427
(2 rows)

This works by building query like: SELECT ($1).payload, and then doing EXECUTE on it, passing whole row (t) as $1.

Next approach is to use hstore:

=$ create extension hstore;
=$ CREATE OR REPLACE FUNCTION get_dynamic_hstore(in p_row test, in p_column TEXT) RETURNS INT4 as $$
declare
BEGIN
    RETURN hstore(p_row)->p_column;
END;
$$ language plpgsql;

sanity check:

=$ select t.*, get_dynamic_hstore(t, 'payload') from test t where t.id > 30 limit 2;
 id | payload | get_dynamic_hstore 
----+---------+--------------------
 31 |   -6526 |              -6526
 32 |  -34175 |             -34175
(2 rows)

This works by first converting the whole row to hstore datatype (which is basically hash/associative array/dictionary), and then extracting single value from it using -> hstore operator.

Next idea – basically the same, but using JSON, and not hstore:

=$ CREATE OR REPLACE FUNCTION get_dynamic_json(in p_row test, in p_column TEXT) RETURNS INT4 as $$
declare
BEGIN
    RETURN row_to_json(p_row) ->> p_column;
END;
$$ language plpgsql;
 
=$ select t.*, get_dynamic_json(t, 'payload') from test t where t.id > 40 limit 2;
 id | payload | get_dynamic_json 
----+---------+------------------
 41 |   45291 |            45291
 42 |  -37996 |           -37996
(2 rows)

Since we have JSON, let's also try jsonb:

=$ CREATE OR REPLACE FUNCTION get_dynamic_jsonb(in p_row test, in p_column TEXT) RETURNS INT4 as $$
declare
BEGIN
    RETURN to_jsonb(p_row) ->> p_column;
END;
$$ language plpgsql;
 
=$ select t.*, get_dynamic_jsonb(t, 'payload') from test t where t.id > 50 limit 2;
 id | payload | get_dynamic_jsonb 
----+---------+-------------------
 51 |   18689 |             18689
 52 |   19602 |             19602
(2 rows)

Finally, one more thing. When someone asks me about how to do it (dynamic column in plpgsql code), I generally suggest to use different language. For me the most natural language is Perl, so let's see:

=$ create extension plperl;
 
=$ CREATE OR REPLACE FUNCTION get_dynamic_perl(test, TEXT) RETURNS INT4 as $$
my ($test, $field) = @_;
return $test->{$field};
$$ language plperl;
 
=$ select t.*, get_dynamic_perl(t, 'payload') from test t where t.id > 60 limit 2;
 id | payload | get_dynamic_perl 
----+---------+------------------
 61 |  -39619 |           -39619
 62 |  -28303 |           -28303
(2 rows)

How about performance? I ran each function like this:

=$ select sum(get_*(...)) from test t;

Each call was immediately repeated to get the most benefit from buffer cache. And then I took the better time. Results:

Function:	Time:	vs. get_static():
get_static	5,802.037 ms	–
get_dynamic_hstore	9,878.367 ms	+ 70 %
get_dynamic_json	16,601.156 ms	+ 186 %
get_dynamic_perl	17,119.276 ms	+ 195 %
get_dynamic_jsonb	17,600.991 ms	+ 203 %
get_dynamic_plpgsql	83,334.069 ms	+ 1,336 %

So, the results are in. Looks that if you want to get dynamic field you have to agree to at least 70% of reduced performance.

Of course – this doesn't meant that your trigger, or some other function/procedure, will take 70% more time.

In your real-life code there will be, probably, other things – some calculations, some checks. And, in case of triggers – there will be non-zero time to actually write the data.

With this in mind – you just have to test your case if it's worth it. But with the numbers above, I can see clearly that if I wanted to get dynamic field, and I'd want to do it in plpgsql – hstore is the fastest way.