October 8th, 2008 by depesz | Tags: , , , , | 4 comments »
Did it help? If yes - maybe you can help me?

Let's assume you have very simple table with users:

# \d users
Table "public.users"
Column | Type | Modifiers
id | integer | not null default nextval('users_id_seq'::regclass)
username | text | not null
is_active | boolean | not null default true
"users_pkey" PRIMARY KEY, btree (id)
"users_username_key" UNIQUE, btree (username)

And you'd like to count all users, and know how many of them are active …

Well, you could do 2 queries:

select count(*) from users;
select count(*) from users where is_active = true;

Or, do it this way:

select is_active, count(*) from users group by is_active;

This would give you 2 rows of data, and you would have to manually add the values to get total count.

This is a bit inconvenient, but it is faster than previous approach, as it runs only 1 table scan.

Is there any way to get both values in 1 table scan? Yes.

We can use the fact that count() counts only not null values.

So, if we could somehow change inactive users into NULLs – we could do it.

There is simple way with CASE:

SELECT CASE WHEN is_active = true THEN true ELSE NULL END;

But it is just too long. Luckily, there is another function we could use: nullif().

Using this function I can write the query like this:

SELECT count(*) as all, count(nullif(is_active, false)) as active FROM users;

Simple and effective. Of course – we should check if it is faster or slower than “group by" approach.

So, I tested it.

Group by, ran 10 times gave me average time of 2510.9ms.

Double-count+nullif approach, ran on the same machine, also 10 times, gave me average time of 2440ms.

Difference is small, so of course there is no point in saying “this way is faster". But – it is not slower.

  1. 4 comments

  2. # tom
    Oct 9, 2008

    Typo: So, I tested id.
    ID addiction 😉

  3. Oct 9, 2008

    thanks. fixed.

  4. hm 🙂 this is nice trick with nullif and count… I was using sum(case is_active when true then 1 else 0 end)

    I wonder which approach is more effective…

  5. # mw
    Oct 12, 2008

    I wasn’t aware of nullif so far and use
    sum(is_active::int) for simple boolean flags.

    Performance: I don’t see a significant difference in any of the count/sum variations, when testing with a table cnt containing only two row (one with true, one with false) joined with a generate series over a million rows (something like select count(*), sum(flag::int) from cnt, generate_series(1,1000000);).
    The group by variants are a bit slower.

    In practice I’d say: stick with what you can read/write easiest.

Leave a comment