August 6th, 2008 by depesz | Tags: , , , , | 3 comments »
Did it help? If yes - maybe you can help me?

Cezio wrote post about removing elements from arrays in PostgreSQL.

Unfortunately his blog engine requires registration before comment, which I don't like, so I decided to comment using my own blogspace.

The approach Cezio showed is pretty cool – having more functions to handle arrays is definitely cool, but it might be better to use sql functions instead of pl/pgsql.

I did a quick test with 2 functions. Pl/PgSQL was taken from Cezio post – I just changed the name, and I wrote my own version in SQL. Sources:

create or replace function array_remove_plpgsql(arr int8[], other_arr int8[]) returns int8[] as $$
declare
out_arr int8[];
el_idx int;
begin
    if arr is null or other_arr is null then
        return arr;
    end if;
    for el_idx in array_lower(arr, 1)..array_upper(arr, 1) loop
    if not arr[el_idx] =any(other_arr) then
        out_arr = array_append(out_arr, arr[el_idx]);
    end if;
    end loop;
    return out_arr;
end;
$$ language plpgsql immutable;
create or replace function array_remove_sql(int8[], int8[]) returns int8[] as $$
SELECT array(select $1[i] from generate_series(array_lower($1,1), array_upper($1,1)) i WHERE $1[i] <> ALL($2));
$$ language sql immutable;

Then I ran it many times. 100000 times to be exact:

explain analyze select array_remove_plpgsql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
explain analyze select array_remove_sql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);

Results:

PL/PgSQL version:

# explain analyze select array_remove_plpgsql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=36.095..176.331 rows=100000 loops=1)
 Total runtime: 308.723 ms
(2 rows)
# explain analyze select array_remove_plpgsql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=35.962..173.675 rows=100000 loops=1)
 Total runtime: 304.880 ms
(2 rows)
# explain analyze select array_remove_plpgsql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=36.279..175.507 rows=100000 loops=1)
 Total runtime: 308.333 ms
(2 rows)

Pretty stable result, around 305ms.

SQL version:

# explain analyze select array_remove_sql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=13.459..149.143 rows=100000 loops=1)
 Total runtime: 277.871 ms
(2 rows)
# explain analyze select array_remove_sql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=13.429..144.672 rows=100000 loops=1)
 Total runtime: 270.746 ms
(2 rows)
# explain analyze select array_remove_sql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) from generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=0) (actual time=13.511..148.526 rows=100000 loops=1)
 Total runtime: 274.809 ms
(2 rows)

Not bad. PlPgSQL is only 10% slower, while being more readable. Is the trade-off worth it – you have to decide for yourself.

Additionally, one factual correction. Cezio said “Jeśli używacie PostgreSQL'a w wersji >=8.3, to w contribie jest pakiet intarray" which means “If you use PostgreSQL 8.3 or newer, there is contrib module intarray".

Actually – intarray has been in contrib since at least 7.3. But Cezio is 100% right – it's definitely worth checking, as it is simply awesome for handling int4[].

  1. 3 comments

  2. Aug 7, 2008

    Difference between plpgsql and sql will be bigger with larger arrays (array update is expensive operation). for n=10000 is plpgsql very bad choice.

  3. On my machine with Postgres 8.2 the plpgsql version of your example runs 25% faster.

  4. # flexpadawan
    Apr 22, 2009

    Great Post! Been trying to figure out how to do this for the last 3 days!

    I did some testing and tweaking.

    The Test on postgres 8.3:
    306.370ms – plpgsql (Winner) AMD Quad-Core??
    1849.881ms – sql

    The Tweek:
    I re-wrote this to allow for any data type.

    CREATE OR REPLACE FUNCTION array_difference(anyarray, anyarray)
    RETURNS anyarray AS
    $BODY$
    declare
    out_arr $1%TYPE;
    el_idx int;
    begin
    if $1 is null or $2 is null then
    return $1;
    end if;
    for el_idx in array_lower($1, 1)..array_upper($1, 1) loop
    if not $1[el_idx] =any($2) then
    out_arr = array_append(out_arr, $1[el_idx]);
    end if;
    end loop;
    return out_arr;
    end;
    $BODY$
    LANGUAGE ‘plpgsql’ IMMUTABLE
    COST 100;
    ALTER FUNCTION array_difference(anyarray, anyarray) OWNER TO postgres;

Leave a comment