Waiting for 8.4 – pl/* srf functions in selects

On 28th of October Tom Lane committed his patch that changes some internals of functions, but it also adds interesting capability.

Commit message:

Extend ExecMakeFunctionResult() to support set-returning functions that return
via a tuplestore instead of value-per-call.  Refactor a few things to reduce
ensuing code duplication with nodeFunctionscan.c.  This represents the
reasonably noncontroversial part of my proposed patch to switch SQL functions
over to returning tuplestores.  For the moment, SQL functions still do things
the old way.  However, this change enables PL SRFs to be called in targetlists
(observe changes in plperl regression results).

What it does. As you perhaps know you can't join table and function. So, if you had function, which returned > 1 row (or > 1 column), you couldn't simply call them giving it as argument field from table that is being used in query.

I think the description is “a bit" complicated, so let's just go straight to example.

I have a function, which, given 2 integers, returns all integers between them, plus some textual field:

CREATE OR REPLACE FUNCTION test(
    IN from_i INT4,
    IN to_i INT4,
    OUT numerical INT4,
    OUT textual TEXT
) RETURNS setof record AS $$
DECLARE
i INT4;
BEGIN
    FOR i IN from_i .. to_i LOOP
        numerical := i;
        textual := 'i = ' || i;
        RETURN NEXT;
    END loop;
    RETURN;
END;
$$ LANGUAGE plpgsql;

Of course this function is pretty dummy, but it does the job:

# SELECT * FROM test(2,5);
 numerical | textual
-----------+---------
         2 | i = 2
         3 | i = 3
         4 | i = 4
         5 | i = 5
(4 ROWS)

Now, let's assume we want (for some insane reason) to get this rows for multiple sets of (from_i, to_i). For example, for these sets:

# SELECT 1 AS from_i, i AS to_i FROM generate_series(1,3) i;
 from_i | to_i
--------+------
      1 |    1
      1 |    2
      1 |    3
(3 ROWS)

Well, generally – I'm out of luck. I can't join generate_series() and my function. If my function was in SQL, I could write it as

SELECT i, test(...) FROM

but I wrote it in pl/PgSQL. Of course I could write a SQL wrapper, but it doesn't look cool. Luckily, with this new patch, pl/PgSQL (and other pl/*) languages can be called just like SQL function:

# SELECT i, test(1, i) FROM generate_series(1,3) i;
 i |    test
---+-------------
 1 | (1,"i = 1")
 2 | (1,"i = 1")
 2 | (2,"i = 2")
 3 | (1,"i = 1")
 3 | (2,"i = 2")
 3 | (3,"i = 3")
(6 ROWS)

But what if I'd like to get the columns separately?

Well, I could write it as:

SELECT i, (test(1, i)).numerical, (test(1,i)).textual FROM generate_series(1,3) i;

But then the test() function would get called twice for every row in generate_series output. Which could be a problem if test() would do anything more interesting that simple loop.

Luckily, we can do it with simple subselect:

# SELECT i, (test).numerical, (test).textual
    FROM (SELECT i, test(1, i) FROM generate_series(1,3) i) x;
 i | numerical | textual
---+-----------+---------
 1 |         1 | i = 1
 2 |         1 | i = 1
 2 |         2 | i = 2
 3 |         1 | i = 1
 3 |         2 | i = 2
 3 |         3 | i = 3
(6 ROWS)

Works. Great. Another long standing issue solved.

*EDIT*

Since I was not able to put query and explain analyze outputs in comments, I'll put it in here:

CTE query:

WITH SOURCE AS (
   SELECT i, test(1, i) FROM generate_series(1,3) i
)
SELECT i, (test).numerical, (test).textual FROM SOURCE;

Explain analyze outputs:

CTE:

                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 CTE Scan on source  (cost=262.50..282.50 rows=1000 width=36) (actual time=2.331..2.534 rows=6 loops=1)
   InitPlan
     ->  Function Scan on generate_series i  (cost=0.00..262.50 rows=1000 width=4) (actual time=2.315..2.488 rows=6 loops=1)
 Total runtime: 2.646 ms
(4 rows)

subquery:

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Subquery Scan x  (cost=0.00..272.50 rows=1000 width=36) (actual time=0.236..0.423 rows=6 loops=1)
   ->  Function Scan on generate_series i  (cost=0.00..262.50 rows=1000 width=4) (actual time=0.229..0.395 rows=6 loops=1)
 Total runtime: 0.477 ms
(3 rows)

6 thoughts on “Waiting for 8.4 – pl/* srf functions in selects”

  1. Hubert,

    This is neat. I’ve been waiting for something like this so I don’t have to do an SQL wrapper hack.

    One question. You sure the function currently gets called twice? I haven’t really tested so can’t be sure.

    I was always under the impression that if you set the function to STABLE/IMMUTABLE, then it would only get called once per unique test(1, i). Otherwise why wouldn’t you be able to put 2 set returning functions in the SELECT and also get multiplicative?

    Thanks,
    Regina

  2. @Regina:
    Just tested, with function defined as stable or immutable. I checked how many times function was called by adding “raise notice” in it. in all cases it got called twice (i.e. in case of subselect approach it was only once per row).

  3. Hubert,

    Since you’re talking about 8.4 anyhow, you might include CTEs in that last example, as this is one of the places they do wonders for 🙂

  4. @David:
    heh, CTEs are so new that I simply forgot about them, but here is goes:

    with source as (
    select i, test(1, i) from generate_series(1,3) i
    )
    select i, (test).numerical, (test).textual from source;

    but i’m not really sure if this will give better results than the subquery approach. explains look like this:

    cte:
    QUERY PLAN
    —————————————————————————————————————————–
    CTE Scan on source (cost=262.50..282.50 rows=1000 width=36) (actual time=0.217..0.426 rows=6 loops=1)
    InitPlan
    -> Function Scan on generate_series i (cost=0.00..262.50 rows=1000 width=4) (actual time=0.206..0.382 rows=6 loops=1)
    Total runtime: 0.483 ms
    (4 rows)

    subquery:
    QUERY PLAN
    —————————————————————————————————————————
    Subquery Scan x (cost=0.00..272.50 rows=1000 width=36) (actual time=0.200..0.388 rows=6 loops=1)
    -> Function Scan on generate_series i (cost=0.00..262.50 rows=1000 width=4) (actual time=0.193..0.358 rows=6 loops=1)
    Total runtime: 0.437 ms
    (3 rows)

  5. @Sergey:
    # select i, (test(1, i)).* from generate_series(1,3) i;
    i | numerical | textual
    —+———–+———
    1 | 1 | i = 1
    2 | 1 | i = 1
    2 | 2 | i = 2
    3 | 1 | i = 1
    3 | 2 | i = 2
    3 | 3 | i = 3
    (6 rows)

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.