Removing elements from arrays

Cezio wrote post about removing elements from arrays in PostgreSQL.

Unfortunately his blog engine requires registration before comment, which I don't like, so I decided to comment using my own blogspace.

The approach Cezio showed is pretty cool – having more functions to handle arrays is definitely cool, but it might be better to use sql functions instead of pl/pgsql.

I did a quick test with 2 functions. Pl/PgSQL was taken from Cezio post – I just changed the name, and I wrote my own version in SQL. Sources:

CREATE OR REPLACE FUNCTION array_remove_plpgsql(arr int8[], other_arr int8[]) RETURNS int8[] AS $$
DECLARE
out_arr int8[];
el_idx INT;
BEGIN
    IF arr IS NULL OR other_arr IS NULL THEN
        RETURN arr;
    END IF;
    FOR el_idx IN array_lower(arr, 1)..array_upper(arr, 1) loop
    IF NOT arr[el_idx] =any(other_arr) THEN
        out_arr = array_append(out_arr, arr[el_idx]);
    END IF;
    END loop;
    RETURN out_arr;
END;
$$ LANGUAGE plpgsql immutable;
CREATE OR REPLACE FUNCTION array_remove_sql(int8[], int8[]) RETURNS int8[] AS $$
SELECT array(SELECT $1[i] FROM generate_series(array_lower($1,1), array_upper($1,1)) i WHERE $1[i] <> ALL($2));
$$ LANGUAGE SQL immutable;

Then I ran it many times. 100000 times to be exact:

EXPLAIN analyze SELECT array_remove_plpgsql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) FROM generate_series(1,100000);
EXPLAIN analyze SELECT array_remove_sql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) FROM generate_series(1,100000);

Results:

PL/PgSQL version:

# EXPLAIN analyze SELECT array_remove_plpgsql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) FROM generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 FUNCTION Scan ON generate_series  (cost=0.00..12.50 ROWS=1000 width=0) (actual TIME=36.095..176.331 ROWS=100000 loops=1)
 Total runtime: 308.723 ms
(2 ROWS)
# EXPLAIN analyze SELECT array_remove_plpgsql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) FROM generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 FUNCTION Scan ON generate_series  (cost=0.00..12.50 ROWS=1000 width=0) (actual TIME=35.962..173.675 ROWS=100000 loops=1)
 Total runtime: 304.880 ms
(2 ROWS)
# EXPLAIN analyze SELECT array_remove_plpgsql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) FROM generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 FUNCTION Scan ON generate_series  (cost=0.00..12.50 ROWS=1000 width=0) (actual TIME=36.279..175.507 ROWS=100000 loops=1)
 Total runtime: 308.333 ms
(2 ROWS)

Pretty stable result, around 305ms.

SQL version:

# EXPLAIN analyze SELECT array_remove_sql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) FROM generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 FUNCTION Scan ON generate_series  (cost=0.00..12.50 ROWS=1000 width=0) (actual TIME=13.459..149.143 ROWS=100000 loops=1)
 Total runtime: 277.871 ms
(2 ROWS)
# EXPLAIN analyze SELECT array_remove_sql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) FROM generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 FUNCTION Scan ON generate_series  (cost=0.00..12.50 ROWS=1000 width=0) (actual TIME=13.429..144.672 ROWS=100000 loops=1)
 Total runtime: 270.746 ms
(2 ROWS)
# EXPLAIN analyze SELECT array_remove_sql(array[1,2,3,4,5,6,7,8,9,10]::int8[],array[9,7,5,3,1]::int8[]) FROM generate_series(1,100000);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 FUNCTION Scan ON generate_series  (cost=0.00..12.50 ROWS=1000 width=0) (actual TIME=13.511..148.526 ROWS=100000 loops=1)
 Total runtime: 274.809 ms
(2 ROWS)

Not bad. PlPgSQL is only 10% slower, while being more readable. Is the trade-off worth it – you have to decide for yourself.

Additionally, one factual correction. Cezio said “Jeśli używacie PostgreSQL'a w wersji >=8.3, to w contribie jest pakiet intarray" which means “If you use PostgreSQL 8.3 or newer, there is contrib module intarray".

Actually – intarray has been in contrib since at least 7.3. But Cezio is 100% right – it's definitely worth checking, as it is simply awesome for handling int4[].

3 thoughts on “Removing elements from arrays”

  1. Difference between plpgsql and sql will be bigger with larger arrays (array update is expensive operation). for n=10000 is plpgsql very bad choice.

  2. Great Post! Been trying to figure out how to do this for the last 3 days!

    I did some testing and tweaking.

    The Test on postgres 8.3:
    306.370ms – plpgsql (Winner) AMD Quad-Core??
    1849.881ms – sql

    The Tweek:
    I re-wrote this to allow for any data type.

    CREATE OR REPLACE FUNCTION array_difference(anyarray, anyarray)
    RETURNS anyarray AS
    $BODY$
    declare
    out_arr $1%TYPE;
    el_idx int;
    begin
    if $1 is null or $2 is null then
    return $1;
    end if;
    for el_idx in array_lower($1, 1)..array_upper($1, 1) loop
    if not $1[el_idx] =any($2) then
    out_arr = array_append(out_arr, $1[el_idx]);
    end if;
    end loop;
    return out_arr;
    end;
    $BODY$
    LANGUAGE ‘plpgsql’ IMMUTABLE
    COST 100;
    ALTER FUNCTION array_difference(anyarray, anyarray) OWNER TO postgres;

Comments are closed.