March 20th, 2008 by depesz | 14 comments

there are these situations where you need (or you think you need) to get multiple values from a sequence.

for example – you're importing lots of data: some objects, some objects which relate to base objects, and so on.

of course you can do it all with simple inserts, and currval, but it would be rather slow:

think about doing this 1000 times:

INSERT INTO objects (..) values (..);
INSERT INTO other (...) VALUES (currval('objects_seq'), ...);

the fastest way would be to "copy" all of these values, but then you would need to know the ids of all inserted objects. not really cool. or you would have to assign the ids yourself in the importer script/program.

assigning them is actually quite complex. you can't just generate your own numbers, because you could collide with numbers already used in the database (from the sequence). so you should simply get all of these numbers from the sequence. but how do you get 1000 values from a sequence?

basic approach is rather simple:

select nextval('seq') from generate_series(1,1000);

looks simple. and is simple.

but it has one quite important "issue": it sends all of these numbers from the database to the application. this may not sound "really bad", but do we really need to send all of this data? isn't it possible to just "allocate" 1000 ids and send only the first or last of them? this way you would also get them consecutive, which might be important in some use cases.

well, actually it's a bit more complex.

technically you can do something like this:

# alter sequence seq increment by 1000;
ALTER SEQUENCE
# select nextval('seq');
nextval
---------
2000
(1 row)
# alter sequence seq increment by 1;
ALTER SEQUENCE

which would work absolutely great, but it has 2 issues:

  • every nextval('seq') (even from other connections) that is called between the first and second "alter sequence" will allocate 1000 ids.
  • there is a race condition that basically forbids using this approach in a multi-user environment.

as for the first issue: sequences live outside of the transactional system, which means that all changes to sequences are also non-transactional.

getting a "too high" id in a concurrent transaction will usually not be a problem, but it just looks "bad".
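the non-transactional behavior of sequences is easy to see for yourself; a minimal sketch, assuming a freshly created sequence named seq:

```sql
CREATE SEQUENCE seq;

BEGIN;
SELECT nextval('seq');  -- returns 1
ROLLBACK;

-- the rollback does not "give back" the allocated value:
SELECT nextval('seq');  -- returns 2, not 1
```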

unfortunately – we can't do anything about it. to fix the issue, we would have to ask those nice people who know c to implement nextval(seq, increment) in postgresql itself 🙂

as for second issue. let's think about 2 separate imports which do the “alter, nextval, alter" trick:

importer1                                importer2
---------------------------------------  ---------------------------------------
alter sequence seq increment by 1000;
select nextval('seq');
                                         alter sequence seq increment by 1000;
alter sequence seq increment by 1;
                                         select nextval('seq');
                                         alter sequence seq increment by 1;

oops. when importer 2 called nextval(), the sequence's increment was already back to 1! so if it assumes that it can use values from (nextval-999) to (nextval), a collision is very likely.

luckily, we can (to some extent) get rid of the problem.

since postgresql 8.2, we have this great tool: advisory locks.

with this we can write our queries like:

select pg_advisory_lock(123);
alter sequence seq increment by 1000;
select nextval('seq');
alter sequence seq increment by 1;
select pg_advisory_unlock(123);

of course instead of 123 we can/should use another value, for example one based on the sequence name, its oid, or something like that.
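for example, the sequence's oid can itself serve as the lock key, so different sequences don't block each other; a minimal sketch (the cast chain is just one possible way to turn the name into a number):

```sql
-- derive the advisory lock key from the sequence's oid:
SELECT pg_advisory_lock('seq'::regclass::oid::bigint);
-- ... alter sequence / nextval / alter sequence ...
SELECT pg_advisory_unlock('seq'::regclass::oid::bigint);
```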

what does it do? well, our second importer will have to wait (on select pg_advisory_lock()) until the first importer finishes its job.

so flow will look like this:

importer1                                importer2
---------------------------------------  ---------------------------------------
select pg_advisory_lock(123);
alter sequence seq increment by 1000;
                                         select pg_advisory_lock(123);
select nextval('seq');                   … waits for lock …
alter sequence seq increment by 1;       … waits for lock …
select pg_advisory_unlock(123);          … waits for lock …
                                         alter sequence seq increment by 1000;
                                         select nextval('seq');
                                         alter sequence seq increment by 1;
                                         select pg_advisory_unlock(123);

of course writing all of these queries in the client app is not really nice, so we can wrap them in a nice pl/pgsql function:

CREATE OR REPLACE FUNCTION multi_nextval(use_seqname TEXT, use_increment INT4) RETURNS INT4 AS $$
DECLARE
    reply INT4;
BEGIN
    -- serialize access so concurrent callers can't interleave the alters
    PERFORM pg_advisory_lock(123);
    EXECUTE 'ALTER SEQUENCE ' || quote_ident(use_seqname) || ' INCREMENT BY ' || use_increment::TEXT;
    -- this returns the *last* value of the allocated block;
    -- the block is (reply - use_increment + 1) .. reply
    reply := nextval(use_seqname);
    EXECUTE 'ALTER SEQUENCE ' || quote_ident(use_seqname) || ' INCREMENT BY 1';
    PERFORM pg_advisory_unlock(123);
    RETURN reply;
END;
$$ LANGUAGE plpgsql;

this works nicely like:

select multi_nextval('seq', 1000);
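since the function returns the last value of the allocated block, the application computes the range itself; a minimal sketch:

```sql
-- allocate 1000 consecutive ids; multi_nextval returns the last one
SELECT multi_nextval('seq', 1000) AS last_id;
-- the usable ids are (last_id - 999) .. last_id
```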

of course it still doesn't protect us from a nextval() in another connection incrementing by 1000, but as i said – i can't fix that in sql itself.

there is just one more problem.

this solution with advisory locks will work as long as all places where the application gets multiple ids use it.

i.e. if in one part of the code we have proper usage with advisory locks, and in another – a direct "alter sequence, nextval, alter sequence" – the race condition will still be there.

if you see it as a problem – i would suggest writing a bug report to the pgsql-bugs list, or at least sending a question to the pgsql-general mailing list. it was discussed before, but it never made it to -hackers, or -patches.

until then – you have to accept some compromises: either get all values (think 1000 ints) with the generate_series approach, or use my function, but be warned about its limitations.
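as an illustration, a bulk import built on the function could then look like this; a minimal sketch, assuming the multi_nextval function from above and the hypothetical objects/other tables from the beginning of the post:

```sql
-- allocate a block of 1000 consecutive ids; returns the last one
SELECT multi_nextval('objects_seq', 1000) AS last_id;

-- the importer now owns 1000 consecutive ids ending at last_id,
-- so it can assign them itself and load both tables in bulk
-- without ever touching the sequence again:
-- COPY objects (id, ...) FROM stdin;
-- COPY other (object_id, ...) FROM stdin;
```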

14 comments

  2. # Steve Haslam
    Mar 20, 2008

    Couldn’t you do:

    create temp table generated(id int);

    insert into generated(id)
    select nextval('seq') from generate_series(1,1000);

    select min(id) as first_id, max(id) as last_id from generated;

    drop table generated;

    ?

  3. Mar 20, 2008

    no, because there is no guarantee that the generated numbers will be consecutive.

    i.e. it might happen that you will get:
    10,11,12,15,17,18,19,…

  4. # Steve Haslam
    Mar 20, 2008

    Ah yes, I see what you mean. So maybe a “safe” compromise would be to copy the data to a temp table and then use insert to essentially join the data up with nextval() calls for each row… which presumably would only be faster than inserting them manually because of having less network turnarounds…

  5. Mar 20, 2008

    @Steve Haslam:
    of course i can, but it gets complex as we insert to many tables.

    anyway – yes, it’s a perfectly viable solution for import question.

  6. # maniek
    Mar 20, 2008

    Actually i think (have not checked) that you don't need the advisory locks in this situation – the alter statement gets an exclusive lock on the sequence till the end of the transaction anyway

  7. Mar 20, 2008

    @maniek:
    no, it doesn’t. sequences are outside of transactions.

  8. # przemek tomczyk
    Mar 21, 2008

    and what about “basic approach” modified not to send values from db to client/application:

    select min(nextval('seq')) from generate_series(1,1000);
    ?

  9. # przemek tomczyk
    Mar 21, 2008

    nah, same flaw as in 1st comment 🙂

  10. Mar 21, 2008

    There is one more (little known) alternative to the generate_series() which was discussed in the matching IRC chat 😉
    It is INSERT with the RETURNING clause (apparently available since 8.2):

    INSERT INTO master_table VALUES ( … ), ( … ), ( … )
    RETURNING pkey;

    This returns the primary key assigned by the database for each inserted record. Which can then be used to construct subsequent INSERTs in belongs-to style tables (eg addresses of contacts).
    (w/o the RETURNING feature you would need to do individual INSERTs to grab generated pkeys using CURRVAL)

    Basic flow with RETURNING is:
    a) build INSERT statement for master records
    BEGIN;
    b) INSERT master records, RETURNING ids
    c) collect ids and build INSERT statements for child records [O(N)]
    d) INSERT child records
    COMMIT;

    Instead of:
    0) SELECT nextval FROM generate_series [O(N)]
    a) build INSERT statement for master records with IDs
    b) build INSERT statements for child records
    BEGIN;
    c) INSERT master records, INSERT child records
    COMMIT;

    Advantage over generate_series(): one less SQL statement sent to the server. Disadvantage: the transaction (including locks on master) is kept open longer, since step c) can take a while for large import sets. The generate_series() approach runs completely outside of TX bounds.

    Anyways, the multi_nextval() is awesome for large sets (O(1), out of TX). Thanks depesz!
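    the RETURNING flow described in the comment above can be sketched as follows (the contacts/addresses tables and the returned ids 1 and 2 are hypothetical):

    ```sql
    BEGIN;
    -- insert master records and get their generated ids in one round trip
    INSERT INTO contacts (name)
    VALUES ('alice'), ('bob')
    RETURNING id;
    -- the application reads the returned ids (say 1 and 2),
    -- then builds the child inserts from them:
    INSERT INTO addresses (contact_id, city)
    VALUES (1, 'warsaw'), (2, 'berlin');
    COMMIT;
    ```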

  11. # Rui
    Aug 17, 2016

    Maybe another alternative in 9.5

    perform pg_advisory_lock(38212);
    _selo := nextval('selo_2tab_seq');
    perform setval('selo_2tab_seq', _selo + in_qtd - 1);
    perform pg_advisory_unlock(38212);

  12. Feb 27, 2018

    As I describe in https://www.cybertec-postgresql.com/en/sequences-gains-and-pitfalls/ PostgreSQL v10 has changed ALTER SEQUENCE so that it creates a new filenode for the sequence in order to attain transaction safety.

    That makes multi_nextval perform badly, since it now has to create a new filenode *twice*.

    I have tried to come up with an improved version.

    I also noticed a bug:
    multi_nextval returns the *last* value of the sequence value range, but when called on a newly created sequence, it will always return the start value. So you should at least call nextval once on the sequence before using it with the function.

3 Trackbacks

  • Mar 18, 2013: Return ids generated via DATA LOAD INFILE | BlogoSfera
  • Feb 26, 2018: New features for sequences: gains and pitfalls - Cybertec
  • Feb 27, 2018: Laurenz Albe: New features for sequences: gains and pitfalls – Cloud Data Architect
