How to limit rows to at most N per category – fix

A bit ago I wrote a blog post that was supposed to show how to keep number of rows in table to N per some category.

Unfortunately, I overlooked a problem related to concurrency.

As Mikhail wrote, we will get problems if we'd do operations from multiple connections, at once-ish.

Let's see it:

CREATE TABLE users (
    id       text NOT NULL,
    PRIMARY KEY (id)
);
INSERT INTO users (id) VALUES ('depesz');
CREATE TABLE addresses (
    id      int8 generated BY DEFAULT AS IDENTITY,
    user_id text NOT NULL REFERENCES users (id),
    PRIMARY KEY (id)
);
CREATE INDEX addresses_user_id ON addresses (user_id);

Now, let's add limiting triggers, with limit being 3:

CREATE FUNCTION trg_check_addresses_per_user() RETURNS TRIGGER AS $$
DECLARE
    v_count int4;
BEGIN
    SELECT COUNT(*) INTO v_count FROM addresses WHERE user_id = NEW.user_id;
    IF v_count >= 3 THEN
        raise exception
            'User % would have % addresses, but only 3 allowed.', NEW.user_id, v_count+1
            USING ERRCODE = 'check_violation';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_check_addresses_per_user
    BEFORE INSERT OR UPDATE ON addresses
    FOR EACH ROW EXECUTE FUNCTION trg_check_addresses_per_user();

Now, I will need two separate psql sessions:

psql #1 psql #2
begin; begin;
insert into addresses (user_id) values (‘depesz'), (‘depesz'), (‘depesz');  
  insert into addresses (user_id) values (‘depesz'), (‘depesz'), (‘depesz');
commit; commit;

Nothing failed, and checking state shows:

$ SELECT * FROM addresses;
 id | user_id 
----+---------
  4 | depesz
  5 | depesz
  6 | depesz
  7 | depesz
  8 | depesz
  9 | depesz
(6 ROWS)

The problem stems from the fact that while checking if there isn't too many rows, trigger can't see rows inserted by another connection in uncommitted transaction.

So, what can we do? We need some kind of locking.

First, simplistic approach, for our case, is to lock the user that we're inserting data for:

  1. CREATE OR REPLACE FUNCTION trg_check_addresses_per_user() RETURNS TRIGGER AS $$
  2. DECLARE
  3.     v_count int4;
  4. BEGIN
  5.     PERFORM 1 FROM users WHERE id = NEW.user_id FOR UPDATE;
  6.     SELECT COUNT(*) INTO v_count FROM addresses WHERE user_id = NEW.user_id;
  7.     IF v_count >= 3 THEN
  8.         raise exception
  9.             'User % would have % addresses, but only 3 allowed.', NEW.user_id, v_count+1
  10.             USING ERRCODE = 'check_violation';
  11.     END IF;
  12.     RETURN NEW;
  13. END;
  14. $$ LANGUAGE plpgsql;

Please note the line #5 – this is the locking.

So, let's redo the test. Of course I first had to truncate addresses, but let's see what happens then:

psql #1 psql #2
begin; begin;
insert into addresses (user_id) values (‘depesz'), (‘depesz'), (‘depesz');  
  insert into addresses (user_id) values (‘depesz'), (‘depesz'), (‘depesz');

Now, insert in 2nd psql session doesn't finish – it hangs waiting for lock on users.

When I issued commit; in psql #1, psql #2 showed:

depesz=*# INSERT INTO addresses (user_id) VALUES ('depesz'), ('depesz'), ('depesz');
ERROR:  USER depesz would have 4 addresses, but ONLY 3 allowed.
CONTEXT:  PL/pgSQL FUNCTION trg_check_addresses_per_user() line 8 at RAISE

This is nice. Solves the problem. But – it will also block all other updates to row in users.

Luckily PostgreSQL has Advisory Locks. I wrote about them earlier, so I'll skip the theory and move to actual code.

  1. CREATE OR REPLACE FUNCTION trg_check_addresses_per_user() RETURNS TRIGGER AS $$
  2. DECLARE
  3.     v_count int4;
  4. BEGIN
  5.     PERFORM pg_advisory_xact_lock( 123, hashtext( NEW.user_id ) );
  6.     SELECT COUNT(*) INTO v_count FROM addresses WHERE user_id = NEW.user_id;
  7.     IF v_count >= 3 THEN
  8.         raise exception
  9.             'User % would have % addresses, but only 3 allowed.', NEW.user_id, v_count+1
  10.             USING ERRCODE = 'check_violation';
  11.     END IF;
  12.     RETURN NEW;
  13. END;
  14. $$ LANGUAGE plpgsql;

Functionally – it behaves the same way – INSERT from 2nd psql hangs until psql #1 will not finish transaction.

But the difference is that now I still can update users.

Advisory lock will get automatically removed when transaction ended, so you don't have to worry about it.

In case you wonder – value 123 was picked randomly. I prefer to use two value advisory locks, where first argument is “namespace", and second is what I want to lock. Since I am using text identifiers had to convert it to int somehow – hence the hashtext() function.

Also – the version that kept counts in users table was safe with regards to this issue, as update to users is locking updated row.

Hope it helps.

7 thoughts on “How to limit rows to at most N per category – fix”

  1. In a solution with ‘FOR UPDATE’ you could just use SELECT FOR UPDATE SKIP LOCKED, so the second transaction won’t block, then check if we got the value, if not null then check count and proceed your transaction. Am I missing something?

  2. @Aman:
    Sorry, but I don’t understand what you’re suggesting. If you don’t lock the row (which is effectively what “for update” does), then you can’t be sure the count it correct.

  3. @depesz
    I wanted to say that, when we SELECT FOR UPDATE SKIP LOCKED and if there is now row found, then we just raise an error quitting the transaction. But now I see, it will block the whole row and we want use a lock only for that exactly this count check operation, therefore your solution is better.
    Thank you for a quick reply and such a comprehensible post.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.