March 18th, 2015 by depesz | Tags: , , , , , , | 1 comment »
Did it help? If yes - maybe you can help me?

On 18th of March, Alvaro Herrera committed patch:

array_offset() and array_offsets()
These functions return the offset position or positions of a value in an
Author: Pavel Stěhule
Reviewed by: Jim Nasby

It's been a while since my last “waiting for" post – mostly because while there is a lot of work happening, a lot of optimizations, bugfixes and new functionalities in the PostgreSQL, not all that much is easily visible from SQL level.

So, today we got two new functions. I was missing them for a very long time, and never really understood how come we didn't have them. But now, we finally do.

Basically, these are for finding at what position given value is in array, like:

select array_offset( '{a,b,c,d,s,a,c,a,r,z}'::text[], 'a' );
(1 row)
select array_offsets( '{a,b,c,d,s,a,c,a,r,z}'::text[], 'a' );
(1 row)

The difference seems to be obvious, usecases too.

A word of warning, though. In various programs it's pretty common to test for existence of value in array by checking its position.

So one could be tempted to use array_offset(s) to test if given value exists in the array. Which is terrible mistake, as this is not indexable operation:

$ create table test (id serial primary key, some_values int4[] );
$ insert into test (some_values) select array[ (random()*1000)::int4, (random()*1000)::int4, (random()*1000)::int4, (random()*1000)::int4, (random()*1000)::int4] from generate_series(1,1000000);
INSERT 0 1000000
$ create index test_idx on test using gin (some_values);
$ explain analyze select * from test where array_offset(some_values, 123) is not null;
                                                 QUERY PLAN                                                  
 Seq Scan on test  (cost=0.00..21846.12 rows=995010 width=45) (actual time=0.199..178.924 rows=4924 loops=1)
   Filter: (array_offset(some_values, 123) IS NOT NULL)
   Rows Removed by Filter: 995086
 Planning time: 0.380 ms
 Execution time: 179.111 ms
(5 rows)

Instead, what you should do is use proper array operators for finding the rows, and only then extract the position, if you need it:

$ explain analyze select *, array_offset(some_values, 123) from test where some_values @> '{123}'::int4[];
                                                       QUERY PLAN                                                       
 Bitmap Heap Scan on test  (cost=52.43..8003.66 rows=4700 width=45) (actual time=1.607..13.749 rows=4924 loops=1)
   Recheck Cond: (some_values @> '{123}'::integer[])
   Heap Blocks: exact=3825
   ->  Bitmap Index Scan on test_idx  (cost=0.00..51.25 rows=4700 width=0) (actual time=0.989..0.989 rows=4924 loops=1)
         Index Cond: (some_values @> '{123}'::integer[])
 Planning time: 0.433 ms
 Execution time: 14.029 ms
(7 rows)

Looks pretty useful, and I'll definitely will be playing with it.

  1. One comment

  2. May 1, 2015

    The name was changed to array_position(s)

Leave a comment