Every so often, on irc, someone asks how to get value from column that is passed as argument.
This is generally seen as not possible, as pl/PgSQL doesn't have support for dynamic column names.
We can work around it, though. Are the workarounds usable, in terms of performance?
To test it, I will need some sample data. This doesn't have to be big, but let's make it big enough so that I can test performance within Pg itself. Let's say 10 million rows.
=$ CREATE TABLE test ( id int4 generated always AS IDENTITY PRIMARY KEY, payload int4 ); =$ INSERT INTO test (payload) SELECT random() * 100000 - 50000 FROM generate_series(1,10000000);
This made me a table with 10M rows, where payload column is a random int from -50k to +50k.
To be able to sensibly compare the performance we will need some baseline. Since I was to measure effect of getting dynamic field only, I will not work using triggers, but instead will make a function that gets payload from row given as argument.
Baseline will be function that will statically get payload column:
=$ CREATE OR REPLACE FUNCTION get_static(IN p_row test) RETURNS INT4 AS $$ DECLARE BEGIN RETURN p_row.payload; END; $$ LANGUAGE plpgsql;
I can see it work like this:
=$ SELECT t.*, get_static(t) FROM test t LIMIT 2; id | payload | get_static ----+---------+------------ 1 | 29712 | 29712 2 | -32547 | -32547 (2 ROWS)
OK. Now, let's see how to get the specific value. I will write a set of functions, each testing one approach, that will get the row and name of field, and will return value of the field.
First approach – plain pl/PgSQL, using EXECUTE:
=$ CREATE OR REPLACE FUNCTION get_dynamic_plpgsql(IN p_row test, IN p_column TEXT) RETURNS INT4 AS $$ DECLARE v_got INT4; BEGIN EXECUTE format('SELECT ($1).%I', p_column) USING p_row INTO v_got; RETURN v_got; END; $$ LANGUAGE plpgsql;
Sanity check:
=$ SELECT t.*, get_dynamic_plpgsql(t, 'payload') FROM test t WHERE t.id > 10 LIMIT 2; id | payload | get_dynamic_plpgsql ----+---------+--------------------- 11 | 40481 | 40481 12 | -17427 | -17427 (2 ROWS)
This works by building query like: SELECT ($1).payload, and then doing EXECUTE on it, passing whole row (t) as $1.
Next approach is to use hstore:
=$ CREATE extension hstore; =$ CREATE OR REPLACE FUNCTION get_dynamic_hstore(IN p_row test, IN p_column TEXT) RETURNS INT4 AS $$ DECLARE BEGIN RETURN hstore(p_row)->p_column; END; $$ LANGUAGE plpgsql;
sanity check:
=$ SELECT t.*, get_dynamic_hstore(t, 'payload') FROM test t WHERE t.id > 30 LIMIT 2; id | payload | get_dynamic_hstore ----+---------+-------------------- 31 | -6526 | -6526 32 | -34175 | -34175 (2 ROWS)
This works by first converting the whole row to hstore datatype (which is basically hash/associative array/dictionary), and then extracting single value from it using -> hstore operator.
Next idea – basically the same, but using JSON, and not hstore:
=$ CREATE OR REPLACE FUNCTION get_dynamic_json(IN p_row test, IN p_column TEXT) RETURNS INT4 AS $$ DECLARE BEGIN RETURN row_to_json(p_row) ->> p_column; END; $$ LANGUAGE plpgsql; =$ SELECT t.*, get_dynamic_json(t, 'payload') FROM test t WHERE t.id > 40 LIMIT 2; id | payload | get_dynamic_json ----+---------+------------------ 41 | 45291 | 45291 42 | -37996 | -37996 (2 ROWS)
Since we have JSON, let's also try jsonb:
=$ CREATE OR REPLACE FUNCTION get_dynamic_jsonb(IN p_row test, IN p_column TEXT) RETURNS INT4 AS $$ DECLARE BEGIN RETURN to_jsonb(p_row) ->> p_column; END; $$ LANGUAGE plpgsql; =$ SELECT t.*, get_dynamic_jsonb(t, 'payload') FROM test t WHERE t.id > 50 LIMIT 2; id | payload | get_dynamic_jsonb ----+---------+------------------- 51 | 18689 | 18689 52 | 19602 | 19602 (2 ROWS)
Finally, one more thing. When someone asks me about how to do it (dynamic column in plpgsql code), I generally suggest to use different language. For me the most natural language is Perl, so let's see:
=$ CREATE extension plperl; =$ CREATE OR REPLACE FUNCTION get_dynamic_perl(test, TEXT) RETURNS INT4 AS $$ my ($test, $field) = @_; RETURN $test->{$field}; $$ LANGUAGE plperl; =$ SELECT t.*, get_dynamic_perl(t, 'payload') FROM test t WHERE t.id > 60 LIMIT 2; id | payload | get_dynamic_perl ----+---------+------------------ 61 | -39619 | -39619 62 | -28303 | -28303 (2 ROWS)
How about performance? I ran each function like this:
=$ SELECT SUM(get_*(...)) FROM test t;
Each call was immediately repeated to get the most benefit from buffer cache. And then I took the better time. Results:
Function: | Time: | vs. get_static(): |
---|---|---|
get_static | 5,802.037 ms | – |
get_dynamic_hstore | 9,878.367 ms | + 70 % |
get_dynamic_json | 16,601.156 ms | + 186 % |
get_dynamic_perl | 17,119.276 ms | + 195 % |
get_dynamic_jsonb | 17,600.991 ms | + 203 % |
get_dynamic_plpgsql | 83,334.069 ms | + 1,336 % |
So, the results are in. Looks that if you want to get dynamic field you have to agree to at least 70% of reduced performance.
Of course – this doesn't meant that your trigger, or some other function/procedure, will take 70% more time.
In your real-life code there will be, probably, other things – some calculations, some checks. And, in case of triggers – there will be non-zero time to actually write the data.
With this in mind – you just have to test your case if it's worth it. But with the numbers above, I can see clearly that if I wanted to get dynamic field, and I'd want to do it in plpgsql – hstore is the fastest way.