October 30th, 2015 by depesz | Tags: , , , , , , , | 4 comments »
Did it help? If yes - maybe you can help me?

Some time ago, guys from Instagram shared their approach to generating unique ids on multiple hosts in a way that guarantees (to reasonable extend) uniqueness, and doesn't require any centralized service.

Earlier this month, the build benchmarked their solution vs. UUIDs, and vs. bigserial.

I thought – whether C based code for nextid would be faster.

My knowledge of C is very limited, so what I did is probably not that nice as it could be, but anyway. Original function from Instagram:

CREATE OR REPLACE FUNCTION next_id(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1314220021721;
    seq_id bigint;
    now_millis bigint;
    shard_id int := 5;
BEGIN
    SELECT nextval('table_id_seq') % 1024 INTO seq_id;
 
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;

My function, in C:

#include <sys/time.h>
#include "postgres.h"
 
#include "funcapi.h"
 
PG_MODULE_MAGIC;
 
PG_FUNCTION_INFO_V1(nextid_c);
 
Datum
nextid_c(PG_FUNCTION_ARGS)
{
    int32          shard_id     =  PG_GETARG_INT32(0);
    int64          seq_no       =  PG_GETARG_INT64(1);
    int64          our_epoch    =  1314220021721;
    int64          now_millis;
    int64          result;
    struct timeval tp;
 
    gettimeofday(&tp, NULL);
 
    now_millis = tp.tv_sec * 1000 + tp.tv_usec / 1000 - our_epoch;
 
    result = now_millis  << 23;
    result = result | (shard_id << 10);
    result = result | (seq_no % 1024);
 
    PG_RETURN_INT64(result);
}

This got compiled to nextid_c.so, copied do libdir for PostgreSQL, and function was created using:

CREATE FUNCTION next_id_c(INT4, INT8)
RETURNS int8
AS '$libdir/nextid_c', 'nextid_c'
LANGUAGE C STRICT;

it's important to note that this function takes 2 arguments:

  1. Shard number (I didn't want to hardcode it in source)
  2. Next id from sequence – since I didn't want to write db-access from C

Old function was called:

select next_id();

while new function:

select next_id( 5, nextval('table_id_seq') );

More complex, but I figured it's good enough, and it made it possible not to hardcode name of the sequence.

To test only id generation time, I skipped table creation, and simply did:

SELECT count(*) FROM ( SELECT nextval('table_id_seq' ) FROM generate_series(1, 1000000)) as x;
SELECT count(*) FROM ( SELECT uuid_generate_v4() FROM generate_series(1, 1000000)) as x;
SELECT count(*) FROM ( SELECT next_id() FROM generate_series(1, 1000000)) as x;
SELECT count(*) FROM ( SELECT next_id_c( 5, nextval('table_id_seq' )) FROM generate_series(1, 1000000)) as x;

This showed me time for generating 1 million ids using given approach.

Each approach was ran 10 times. Results:

Method Time (in ms.)
Best Average Worst
next_id 6810.946 7001.082 7117.601
next_id_c 505.491 540.303 580.531
nextval 489.318 534.481 569.947
uuid 3364.151 3409.031 3447.849

Which shows us that nextval is (obviously) fastest, and slowdown for other methods is:

  • +3% of time for next_id_c
  • +588% (or, almost 7 times as slow) for uuid
  • +1292% (or, almost 14 times as slow) for next_id

Results from the build are different, but I think it's because they measured writes to table – which I totally ignored, and just checked how fast is the function in id generation.

Long story short – I would say that given normal usage (write to table) – even plpgsql function will be fast enough, but you might want to move to C based function if you need the most speed out of it.

  1. 4 comments

  2. Oct 30, 2015

    the plpgsql function can be maybe 5x faster if you reduce number of expressions there. result := ; result := result | xxx; result := is expensive when you use it in pretty short repeated functions.

  3. Oct 31, 2015

    5 times faster was too optimistic – I did tests, and the speedup is is about 3x

  4. The LANGUAGE SQL version of this is only +74% slower than nextval().

    CREATE OR REPLACE FUNCTION public.next_id_sql() RETURNS bigint LANGUAGE sql AS $$
    select ((((floor(extract(epoch from clock_timestamp()) * 1000)::bigint) – 1314220021721) << 23) | (5 << 10)) | (nextval('table_id_seq') % 1024)$$;

  5. # PAOLO SAUL
    Nov 9, 2015

    Thanks for the test Depesz. This line’s up with our results as well. We contemplated using Instagram-style IDs, and ended up going for a better alternative. We hard-coded the shard ID into a normal sequence, and just used nextval. It is still all wrapped under a function call, so it is transparent to any client.

Sorry, comments for this post are disabled.