June 14th, 2016 by depesz | Tags: , , , , , | 22 comments »
Did it help? If yes - maybe you can help me?

Some time ago someone on irc asked about creating fast counters for something (banners I think).

I talked with her (him?) about it, but figured, I can as well write a blogpost, so others can use it too.

First the general idea: we want to store incrementable counters, with optional trade off that “we can lose last x minutes of data if it will make things faster".

First approach is pretty simple, create table:

$ create table counters (
    counter_name text primary key,
    current_value int8 not null default 0
);

And then just insert whenever we see new counter, and then update when needed.

There is a problem with this approach though.

Let's assume that our counter-updating-application does something like this:

$ begin;
$ update counters set current_value = current_value + 1 where counter_name = 'whatever';
$ commit;

It might also do something else, before, or after, the update, and it might even be not in database.

Do you see the problem? If not, consider what will happen, in concurrent environment, if the transaction would look like:

$ begin;
$ update counters set current_value = current_value + 1 where counter_name = 'whatever';
sleep for 0.5 seconds
$ commit;

All other updates to this counter will wait for this particular transaction to commit.

It's even worse if we're updating multiple counters in the same transaction, because if one transaction will try to do:

$ begin;
$ update counters set current_value = current_value + 1 where counter_name = 'whatever';
$ update counters set current_value = current_value + 1 where counter_name = 'else';
$ commit;

and another will do:

$ begin;
$ update counters set current_value = current_value + 1 where counter_name = 'else';
$ update counters set current_value = current_value + 1 where counter_name = 'whatever';
$ commit;

then there will be deadlock, and one of them will die.

Of course you might say: my code will never sleep. And it will never update multiple counters. Sure. But there are things you can't control – your process might lock on something outside of database. Or get killed. Or there could be issues with network.

Long story short – locking approach will, sooner or later, cause problems.

So, let's see how it work.

I made 10 sql files, each doing 1000 updates to 10 counters, in random order.

Files look like:

=$ head -n 5 update-01.sql
update counters set current_value = current_value + 1 where counter_name = '4';
update counters set current_value = current_value + 1 where counter_name = '8';
update counters set current_value = current_value + 1 where counter_name = '1';
update counters set current_value = current_value + 1 where counter_name = '1';
update counters set current_value = current_value + 1 where counter_name = '4';

counters are named “0" to “9".

So, let's run it, and see how long it will take:

$ time /bin/ls -1 update-??.sql | xargs -n1 -P10 psql -qAtX -f
real    0m20.181s
user    0m0.248s
sys     0m0.072s

OK. So it took ~ 20 seconds. Counters are:

select * from counters;
 counter_name | current_value 
--------------+---------------
 7            |          1064
 9            |           955
 3            |          1034
 6            |           964
 5            |           973
 4            |          1031
 2            |          1010
 0            |           942
 1            |           975
 8            |          1052
(10 rows)

One thing worth mentioning is that the table got pretty bloated:

$ \dt+ counters
                     List of relations
 Schema |   Name   | Type  | Owner  |  Size  | Description 
--------+----------+-------+--------+--------+-------------
 public | counters | table | depesz | 560 kB | 
(1 row)

560kB just to store 10 counters. But that's side effect of updates.

So, what are other options?

There is very simple option – don't update, just insert. Inserts generally don't lock.

To make it work we need additional table:

$ create table counters_queue (
    id serial primary key,
    counter_name text,
    value_change int4
);

Now, instead of:

$ update counters set current_value = current_value + 1 where counter_name = ?;

we'd use:

$ insert into counters_queue (counter_name, value_change) values (?, 1);

Running the same counter increments using inserts is faster:

=$ time /bin/ls -1 insert-??.sql | xargs -n1 -P10 psql -qAtX -f 
real    0m12.495s
user    0m0.224s
sys     0m0.084s

But what's more important – it is virtually lock free.

Obviously – this leads to question – ok, but how do I get current counter?

Well, we need a thing (cronjob) that will update the counters. This can be done using such simple function:

$ CREATE OR REPLACE FUNCTION apply_counter_queue( IN p_limit INT4 DEFAULT 1000 ) RETURNS INT8 as $$
with id_range as (
    SELECT min(id) as min_id, min(id) + p_limit as max_id FROM counters_queue
), deleted_rows as (
    DELETE FROM counters_queue WHERE id between (SELECT min_id FROM id_range) AND (SELECT max_id FROM id_range) returning *
), summarized_change as (
    SELECT counter_name, sum(value_change) FROM deleted_rows group BY counter_name HAVING sum(value_change) <> 0
), update_counters as (
    UPDATE counters as c SET current_value = c.current_value + s.sum FROM summarized_change s WHERE c.counter_name = s.counter_name returning c.counter_name
)
SELECT count(*) FROM update_counters;
$$ language 'sql';

This function does:

  • (id_range) – gets minimal id from queue, and it's increment by given parameter (defaults to 1000)
  • (deleted_rows) – deletes rows from queue that have id between values from id_range, returning deleted rows, so we can work on them
  • (summarized_change) – calculates sum of value_changes for all deleted rows, and filters out counters which wouldn't be changed (sum of value_change is 0)
  • (update_counters) – updates the counters, returning name of each updated counter
  • returns number of updated counters

for 10000 rows in queue, this function took ~ 40ms to process them! Counters table got updated only once for each counter, so it wasn't bloated.

This function – apply_counter_queue() should be called from cronjob, or some other scheduler.

Now, if you want to have current real counter value, including both what is in counters table, and not-yet-applied in queue, then you need:

$ select
    current_value + coalesce( (select sum(value_change) from counters_queue where counter_name = 'whatever' ), 0 )
    from counters where counter_name = 'whatever'

It's more complex, but with index on counter_name column in counters_queue, it will be fast.

Additionally, if speed of inserts to queue will become a problem, you might want to consider two things:

  • make queue table unlogged
  • batch inserts to queue

First is very simple:

$ CREATE UNLOGGED TABLE counters_queue (
    id serial PRIMARY KEY,
    counter_name TEXT,
    value_change INT4
);

Now, the same 10,000 inserts, in 10 parallel workers, take ~ 0.4s (with logged table, it took 12s!).

Of course, having the table unlogged means that if Pg would crash, all data in this table will be lost – so it's a tradeoff between what you can lose, and how fast you want to go.

Second part – batching inserts, is more complicated, and can be done only if you can somehow “buffer" inserts, and instead of inserting every row separately, insert them in groups.

For example. Changed my inserts so that each worker has to do only 100 inserts, but each insert inserts 10 elements to queue, like this:

$ insert into counters_queue (counter_name, value_change) values ('4', 1),('8', 1),('1', 1),('1', 1),('4', 1),('2', 1),('0', 1),('3', 1),('4', 1),('4', 1);

When I ran these 10 tasks on unlogged table, I got basically identical time (0.4s). But, when I made the counters_queue table logged, the same 10000 rows that previously took 12s, now finished in 1.3s!

So, to wrap it up, to base counters table approach, I:

  • added queue table (which can be optionally unlogged)
  • added function that removes rows from queue, and applies changes to counters
  • added cronjob that calls this function
  • modified queries that get counter value from table, to account for unapplied-yet values from queue table

Effect is that I have counter changes work faster, without locking (well, there is some locking, but very minimal compared to UPDATE's), much less bloat. But more complicated.

Whether it's worth the extra complexity – it's up for you to decide :).

  1. 22 comments

  2. # Alex
    Jun 14, 2016

    Is there any downside to using a sequence instead?

  3. Jun 14, 2016

    @Alex:
    well, you would need to have as many sequences as you have counters. And these are not transactional, so you can’t rollback seq increase.

  4. # ph
    Jun 14, 2016

    How about serial type? It can produce some locks too, doesn’t it? I mean – it gives incremental values and increments its counter – some kind of bottleneck

  5. Jun 14, 2016

    @ph:

    sorry, not sure what you mean. serial is basically a sequence. what about it?

  6. # ph
    Jun 15, 2016

    @DEPESZ:

    I mean that serial type gives numbers (id) to new records, that’s why this insertions can’t be true parallel – they are all waiting for new incremented values from sequence. Yeah, sequences are very fast. But may be it would be better to use just inserts without primary keys? )

  7. Jun 15, 2016

    PH: Sequences (and SERIAL) are non-blocking and do not wait for commit of other transactions. They’re non-transactional and simply discard values claimed by rolled back xacts. So they don’t ever wait for new incremented values from the sequence.

    The main issue with the approach discussed here is insufficient consideration of concurrency. Unless it’s OK to get duplicate counter values this approach will fail unless xacts are serialized by locking on the counter.

  8. # Terje Elde
    Jun 15, 2016

    Not only are serial and sequences comparable, but serial is basically shorthand for setting up a sequence, and setting values from it as default on an int column.

  9. # Terje Elde
    Jun 15, 2016

    Good writeup, but once you start considering something external (thinking of the cron job), you might as well keep the counter-logic partly externally as well. Have a tiny utility-daemon that LISTENs for counter-updates, and trigger increments with NOTIFY.
    Or move the congested counters to redis, and have a daemon sync PostgreSQL with current values.

  10. Jun 15, 2016

    @Craig Ringer:

    Can you elaborate on the serialization problem with my approach? I can’t see how/when the problem would appear.

  11. # misha
    Jun 15, 2016

    @craig
    @depesz

    avito.ru
    we’ve been using system like this for many years. works perfect!
    no any concurrency issues — all under read committed isolation.

  12. Jun 15, 2016

    @misha:
    “like this” – like the one I described in blogpost?

  13. # misha
    Jun 15, 2016

    @depezh

    @depezh
    yes, exactly the same approach.
    but we use only one table both for aggregated data and “queue”, so at least

    our query is simpler:

    select sum(c.val) as val from counters c where c.id = 'id'

    and our rotation function is

    CREATE OR REPLACE FUNCTION statistics.aggregate_user_items_cnt()
      RETURNS integer AS
    $BODY$
    begin
            set local work_mem = '4 GB';
     
            with src as (
                select
                    cnt.user_id,
                    cnt.category_id
                from
                    statistics.user_items_cnt cnt
                group by
                    cnt.user_id, cnt.category_id
                having
                    count(1) > 1
            ),
            del as (
                delete from statistics.user_items_cnt cnt
                using
                    src
                where
                    cnt.user_id = src.user_id and cnt.category_id = src.category_id
                returning cnt.*
            ),
            agg as (
                select
                    del.user_id,
                    del.category_id,
                    sum(del.cnt_total)::integer  as cnt_total,
                    sum(del.cnt_active)::integer as cnt_active
                from
                    del
                group by
                    del.user_id, del.category_id
            )
            insert into statistics.user_items_cnt(user_id, category_id, cnt_total, cnt_active, txtime)
                select agg.user_id, agg.category_id, agg.cnt_total, agg.cnt_active, now() from agg where agg.cnt_total > 0;
     
            return 0;
    end;
    $BODY$
      LANGUAGE plpgsql VOLATILE
      COST 100;
    ALTER FUNCTION statistics.aggregate_user_items_cnt()
      OWNER TO postgres;
  14. # misha
    Jun 15, 2016

    “>” is “>”

    stupid cross-windows copy-paste

  15. # misha
    Jun 15, 2016

    аааа

    & g t ;

    is “>”

  16. Jun 15, 2016

    @misha:

    fixed that 🙂

  17. # misha
    Jun 15, 2016

    🙂

  18. # Asaf
    Jun 15, 2016

    I don’t understand why I need a serial column.

    Won’t this table work the same ?
    CREATE UNLOGGED TABLE counters_queue (
    counter_name TEXT,
    value_change INT4
    );

    When aggregating the results, postgres is going to do a full table scan anyway.

  19. Jun 15, 2016

    @Asaf:

    because we don’t want to scan whole queue table at once, only at most “x” records.

    Reason for this is that the queue deleting/counter-updating function issues updates to counters, thus locking the values for some time. the shorter the time the better.

    With 1000 rows at a time, from queue, the time to process should be minimal, but we need something to limit with – so that’s why I use serial.

  20. # misha
    Jun 15, 2016

    @depesh

    and yes!

    there is no updates at all in my version

  21. # misha
    Jun 15, 2016

    statistics.aggregate_user_items_cnt()

    absolutely free from any locking

  22. Jun 15, 2016

    @misha:
    that’s obviously not true.

    try:

    begin;
    select statistics.aggrega...
    select * from pg_locks where pid = pg_backend_pid();
  23. # misha
    Jun 15, 2016

    hmm..

    > Reason for this is that the queue deleting/counter-updating function issues updates to counters, thus locking the values for some time.

    what are you talking about?

    but, i think this locking affecting nothing in average cases “plus/minus 1” and selecting counter values

Leave a comment