September 13th, 2008 by depesz | Tags: , | 3 comments »
Did it help? If yes - maybe you can help me?

Let's assume we have a simple table:

CREATE TABLE objects (
    id          serial primary key,
    category    INT4        NOT NULL DEFAULT 0,
    object_type INT4        NOT NULL DEFAULT 0,
    entered_on  TIMESTAMPTZ NOT NULL DEFAULT now()
);

(This is simplification, but it contains all necessary columns).

What should I do to be able to quickly get 50 newest objects in given category/object_type (or in many categories/many object_types, or in all categories/object_types), optionally with limiting entered_on “older than …".

First, let's write some basic assumptions:

  • we have 15 categories, but the objects are not uniformly distributed
  • we have 4 object_types, and the distribution is also non-uniform
  • entered_on is nearly unique. By nearly I mean: nobody purposely adds 2 objects with the same entered_on timestamp, but it could happen that it happens
  • total number of objects is around 10 million (for tests it's enough)

Possible query types:

  1. give me 50 newest objects
  2. give me 50 newest objects that are of one of the types: (…) (1+ types)
  3. give me 50 newest objects that belong to one of categories: (…) (1+ categories)
  4. give me 50 newest objects that belong to one of categories: (…) (1+ categories) and are of one of the types: (…) (1+ types)

All queries will require objects to be returned in descending entered_on order, and can (but don't have to) be limited to objects older than given point in time.

Example queries:

    • select * from objects order by entered_on desc limit 50;
    • select * from objects where entered_on < '2008-09-12 12:34:24.109613+02'::timestamptz order by entered_on desc limit 50;
    • select * from objects where object_type in (1, 2) order by entered_on desc limit 50;
    • select * from objects where object_type in (1, 2) and entered_on < '2008-09-12 12:34:24.109613+02'::timestamptz order by entered_on desc limit 50;
    • select * from objects where category in (10, 12, 50) order by entered_on desc limit 50;
    • select * from objects where category in (10, 12, 50) and entered_on < '2008-09-12 12:34:24.109613+02'::timestamptz order by entered_on desc limit 50;
    • select * from objects where category in (10, 12, 50) and object_type in (1, 2) order by entered_on desc limit 50;
    • select * from objects where category in (10, 12, 50) and object_type in (1, 2) and entered_on < '2008-09-12 12:34:24.109613+02'::timestamptz order by entered_on desc limit 50;

Let's test what we can do to make it work. First, let's insert some data to test table:

INSERT INTO objects (category, object_type, entered_on)
SELECT
    categories.category,
    types.object_type,
    x.entered_on
FROM (
    SELECT
        floor(random() * 10000) as category_random,
        floor(random() * 100) as type_random,
        now() - ( '1 second'::INTERVAL * random() * 50000000 - '100000 seconds') as entered_on
    FROM
        generate_series(1,10000000) i
    ) x,
    ( VALUES (1, 0, 80), (2, 81, 88), (3, 89, 96), (4, 97, 99) ) as types (object_type, random_from, random_to),
    ( VALUES
        (1, 0, 9800),     (2, 9801, 9810),  (3, 9811, 9820),
        (4, 9821, 9830),  (5, 9831, 9840),  (6, 9841, 9850),
        (7, 9851, 9860),  (8, 9861, 9870),  (9, 9871, 9880),
        (10, 9881, 9895), (11, 9896, 9910), (12, 9911, 9924),
        (13, 9925, 9947), (14, 9948, 9960), (15, 9961, 9999)
    ) as categories (category, random_from, random_to)
WHERE
    x.category_random between categories.random_from AND categories.random_to
    AND
    x.type_random between types.random_from AND types.random_to
;

As you can see distribution is very uneven. Category 1 has around 98% of all objects. Type 1 has 80%.

After insert I got this data distribution:

 category | object_type | objects | numeric
----------+-------------+---------+---------
        1 |           1 | 7939847 |  79.398
        1 |           2 |  783665 |   7.837
        1 |           3 |  782740 |   7.827
        1 |           4 |  294253 |   2.943
       15 |           1 |   31585 |   0.316
       13 |           1 |   18684 |   0.187
       11 |           1 |   12287 |   0.123
       10 |           1 |   12109 |   0.121
       12 |           1 |   11377 |   0.114
       14 |           1 |   10562 |   0.106
...
(60 rows)

Table size is 460MB, and index (objects_pkey, primary key index on id column) is 171MB.

So, first, I'll try the approach that is possible since the inclusion of bitmap index scans.

The approach is – put index on every column, and let PostgreSQL do the magic.

CREATE INDEX i_objects_entered_on ON objects (entered_on);
CREATE INDEX i_objects_object_type ON objects (object_type);
CREATE INDEX i_objects_category ON objects (category);

(sizes respectively: 214, 171 and 171MB).

So, let's check the queries:

select * from objects order by entered_on desc limit 50;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..163.33 rows=50 width=20) (actual time=0.052..0.614 rows=50 loops=1)
   ->  Index Scan Backward using i_objects_entered_on on objects  (cost=0.00..32665404.68 rows=10000000 width=20) (actual time=0.049..0.477 rows=50 loops=1)
 Total runtime: 0.717 ms
(3 rows)

This is obviously very fast – simple backward index scan + limit.

select * from objects where entered_on < 2008-09-12 12:34:24.109613+02   order by entered_on desc limit 50;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..163.45 rows=50 width=20) (actual time=0.061..0.611 rows=50 loops=1)
   ->  Index Scan Backward using i_objects_entered_on on objects  (cost=0.00..32562560.36 rows=9960887 width=20) (actual time=0.058..0.479 rows=50 loops=1)
         Index Cond: (entered_on < '2008-09-12 12:34:24.109613+02'::timestamp with time zone)
 Total runtime: 0.700 ms
(4 rows)

Same here – we have another clause, but since the clause uses the same index – it is still very fast.

select * from objects where object_type in (1, 2) order by entered_on desc limit 50;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..198.01 rows=50 width=20) (actual time=0.017..0.293 rows=50 loops=1)
   ->  Index Scan Backward using i_objects_entered_on on objects  (cost=0.00..32690404.68 rows=8254906 width=20) (actual time=0.015..0.167 rows=50 loops=1)
         Filter: (object_type = ANY ('{1,2}'::integer[]))
 Total runtime: 0.375 ms
(4 rows)
select * from objects where object_type in (1, 2) and entered_on < 2008-09-12 12:34:24.109613+02   order by entered_on desc limit 50;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..198.16 rows=50 width=20) (actual time=0.030..0.309 rows=50 loops=1)
   ->  Index Scan Backward using i_objects_entered_on on objects  (cost=0.00..32587462.58 rows=8222619 width=20) (actual time=0.027..0.185 rows=50 loops=1)
         Index Cond: (entered_on < '2008-09-12 12:34:24.109613+02'::timestamp with time zone)
         Filter: (object_type = ANY ('{1,2}'::integer[]))
 Total runtime: 0.403 ms
(5 rows)

These 2 queries are very fast, but I suspect that the reason is that it tested for object_type = 1, which gives us 80% of chances. So, let's check with another set of types:

select * from objects where object_type in (3, 4) order by entered_on desc limit 50;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1502.83 rows=50 width=20) (actual time=0.059..1.564 rows=50 loops=1)
   ->  Index Scan Backward using i_objects_entered_on on objects  (cost=0.00..32690404.68 rows=1087627 width=20) (actual time=0.054..1.382 rows=50 loops=1)
         Filter: (object_type = ANY ('{3,4}'::integer[]))
 Total runtime: 1.714 ms
(4 rows)

Well. 1.7ms is not much, but it clearly is higher than previously. Basically – it will get slower for less-common types.

select * from objects where category in (10, 12, 50) order by entered_on desc limit 50;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=51270.18..51270.30 rows=50 width=20) (actual time=622.666..622.861 rows=50 loops=1)
   ->  Sort  (cost=51270.18..51343.46 rows=29312 width=20) (actual time=622.663..622.720 rows=50 loops=1)
         Sort Key: entered_on
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Bitmap Heap Scan on objects  (cost=492.80..50296.45 rows=29312 width=20) (actual time=45.477..578.325 rows=28978 loops=1)
               Recheck Cond: (category = ANY ('{10,12,50}'::integer[]))
               ->  Bitmap Index Scan on i_objects_category  (cost=0.00..485.48 rows=29312 width=0) (actual time=40.333..40.333 rows=28978 loops=1)
                     Index Cond: (category = ANY ('{10,12,50}'::integer[]))
 Total runtime: 623.045 ms
(9 rows)
select * from objects where category in (10, 12, 50) and entered_on < 2008-09-12 12:34:24.109613+02   order by entered_on desc limit 50;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=51339.61..51339.73 rows=50 width=20) (actual time=643.485..643.674 rows=50 loops=1)
   ->  Sort  (cost=51339.61..51412.60 rows=29197 width=20) (actual time=643.482..643.542 rows=50 loops=1)
         Sort Key: entered_on
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Bitmap Heap Scan on objects  (cost=492.78..50369.71 rows=29197 width=20) (actual time=46.785..596.981 rows=28854 loops=1)
               Recheck Cond: (category = ANY ('{10,12,50}'::integer[]))
               Filter: (entered_on < '2008-09-12 12:34:24.109613+02'::timestamp with time zone)
               ->  Bitmap Index Scan on i_objects_category  (cost=0.00..485.48 rows=29312 width=0) (actual time=41.799..41.799 rows=28978 loops=1)
                     Index Cond: (category = ANY ('{10,12,50}'::integer[]))
 Total runtime: 643.879 ms
(10 rows)
select * from objects where category in (10, 12, 50) and object_type in (1, 2) order by entered_on desc limit 50;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=51172.26..51172.39 rows=50 width=20) (actual time=629.563..629.751 rows=50 loops=1)
   ->  Sort  (cost=51172.26..51232.75 rows=24197 width=20) (actual time=629.559..629.618 rows=50 loops=1)
         Sort Key: entered_on
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Bitmap Heap Scan on objects  (cost=491.53..50368.46 rows=24197 width=20) (actual time=48.193..589.456 rows=25743 loops=1)
               Recheck Cond: (category = ANY ('{10,12,50}'::integer[]))
               Filter: (object_type = ANY ('{1,2}'::integer[]))
               ->  Bitmap Index Scan on i_objects_category  (cost=0.00..485.48 rows=29312 width=0) (actual time=42.515..42.515 rows=28978 loops=1)
                     Index Cond: (category = ANY ('{10,12,50}'::integer[]))
 Total runtime: 629.933 ms
(10 rows)
select * from objects where category in (10, 12, 50) and object_type in (1, 2) and entered_on < 2008-09-12 12:34:24.109613+02   order by entered_on desc limit 50;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=51242.36..51242.49 rows=50 width=20) (actual time=637.028..637.216 rows=50 loops=1)
   ->  Sort  (cost=51242.36..51302.62 rows=24102 width=20) (actual time=637.024..637.102 rows=50 loops=1)
         Sort Key: entered_on
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Bitmap Heap Scan on objects  (cost=491.50..50441.71 rows=24102 width=20) (actual time=47.897..593.414 rows=25632 loops=1)
               Recheck Cond: (category = ANY ('{10,12,50}'::integer[]))
               Filter: ((object_type = ANY ('{1,2}'::integer[])) AND (entered_on < '2008-09-12 12:34:24.109613+02'::timestamp with time zone))
               ->  Bitmap Index Scan on i_objects_category  (cost=0.00..485.48 rows=29312 width=0) (actual time=42.946..42.946 rows=28978 loops=1)
                     Index Cond: (category = ANY ('{10,12,50}'::integer[]))
 Total runtime: 637.351 ms

All of these, category-related, queries are bad. Reason is pretty simple – same problem we got from types (less common types mean longer time to process), but with categories the problem is even more important, as 98% of objects belong to category 1.

So, this solution is not really cool. Can we do anything better?

In case of queries like:

select * from table where field = x order by other_field;

you can get best results from multi-column index on (field, other_field).

In our test case we have following cases:

  • order by entered_on, no where
  • order by entered_on, where on type
  • order by entered_on, where on category
  • order by entered_on, where on category and type

This means 4 indexes:

  • (entered_on)
  • (object_type, entered_on)
  • (category, entered_on)
  • (category, object_type, entered_on)

First one, we already have. So let's add new ones (and remove unnecessary i_objects_category and i_objects_object_type).

CREATE INDEX i_objects_object_type_entered_on ON objects (object_type, entered_on);
CREATE INDEX i_objects_category_entered_on ON objects (category, entered_on);
CREATE INDEX i_objects_category_object_type_entered_on ON objects (category, object_type, entered_on);

Sizes: 257, 257 and 301MB.

First, let's check if the indexes will be used at all:

select * from objects where object_type = 3 order by entered_on desc limit 50;
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..124.43 rows=50 width=20) (actual time=0.049..0.456 rows=50 loops=1)
   ->  Index Scan Backward using i_objects_object_type_entered_on on objects  (cost=0.00..1938574.54 rows=779000 width=20) (actual time=0.043..0.269 rows=50 loops=1)
         Index Cond: (object_type = 3)
 Total runtime: 0.613 ms
(4 rows)
 
select * from objects where category = 3 order by entered_on desc limit 50;
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..87.44 rows=50 width=20) (actual time=0.044..0.421 rows=50 loops=1)
   ->  Index Scan Backward using i_objects_category_entered_on on objects  (cost=0.00..27399.98 rows=15667 width=20) (actual time=0.040..0.235 rows=50 loops=1)
         Index Cond: (category = 3)
 Total runtime: 0.559 ms
(4 rows)
 
select * from objects where category = 3 and object_type = 3 order by entered_on desc limit 50;
                                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..97.82 rows=50 width=20) (actual time=0.045..0.436 rows=50 loops=1)
   ->  Index Scan Backward using i_objects_category_object_type_entered_on on objects  (cost=0.00..2386.76 rows=1220 width=20) (actual time=0.041..0.247 rows=50 loops=1)
         Index Cond: ((category = 3) AND (object_type = 3))
 Total runtime: 0.580 ms
(4 rows)

Yepp, everything looks fine.

But what about cases when we want to use “IN (?, ?)" queries?

select * from objects where object_type IN (3, 4) order by entered_on desc limit 50;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1531.00 rows=50 width=20) (actual time=0.093..4.282 rows=50 loops=1)
   ->  Index Scan Backward using i_objects_entered_on on objects  (cost=0.00..32690397.76 rows=1067617 width=20) (actual time=0.090..4.149 rows=50 loops=1)
         Filter: (object_type = ANY ('{3,4}'::integer[]))
 Total runtime: 4.385 ms
(4 rows)
 
select * from objects where category IN (3, 4) order by entered_on desc limit 50;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=49074.46..49074.58 rows=50 width=20) (actual time=258.787..258.991 rows=50 loops=1)
   ->  Sort  (cost=49074.46..49141.08 rows=26649 width=20) (actual time=258.782..258.851 rows=50 loops=1)
         Sort Key: entered_on
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Bitmap Heap Scan on objects  (cost=560.35..48189.19 rows=26649 width=20) (actual time=11.109..227.794 rows=20273 loops=1)
               Recheck Cond: (category = ANY ('{3,4}'::integer[]))
               ->  Bitmap Index Scan on i_objects_category_entered_on  (cost=0.00..553.69 rows=26649 width=0) (actual time=6.573..6.573 rows=20273 loops=1)
                     Index Cond: (category = ANY ('{3,4}'::integer[]))
 Total runtime: 259.142 ms
(9 rows)
 
select * from objects where category IN (3, 4) and object_type IN (3, 4) order by entered_on desc limit 50;
                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9517.78..9517.90 rows=50 width=20) (actual time=26.435..26.624 rows=50 loops=1)
   ->  Sort  (cost=9517.78..9524.89 rows=2845 width=20) (actual time=26.431..26.484 rows=50 loops=1)
         Sort Key: entered_on
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Bitmap Heap Scan on objects  (cost=79.69..9423.27 rows=2845 width=20) (actual time=1.201..22.879 rows=2256 loops=1)
               Recheck Cond: ((category = ANY ('{3,4}'::integer[])) AND (object_type = ANY ('{3,4}'::integer[])))
               ->  Bitmap Index Scan on i_objects_category_object_type_entered_on  (cost=0.00..78.98 rows=2845 width=0) (actual time=0.706..0.706 rows=2256 loops=1)
                     Index Cond: ((category = ANY ('{3,4}'::integer[])) AND (object_type = ANY ('{3,4}'::integer[])))
 Total runtime: 26.740 ms
(9 rows)

Unfortunately – it doesn't work really well.

So, is there no way to solve this problem?

Luckily, there is. But it will require a not so small rewrite of the queries.

I'll start with the category/entered_on example, as it is the one that will look best.

Original query is:

select * from objects where category IN (3, 4) order by entered_on desc limit 50;

The problem that we have is that PostgreSQL is not able to use fully power of i_objects_category_entered_on index. And the reason is: because we use “IN" operator, and not “=" operator.

So, let's change the operator. What about those 2 values? Simply, let's do it this way:

SELECT * FROM
(
    select * from (select * from objects where category = 3 order by entered_on desc limit 50) as x_1
    UNION ALL
    select * from (select * from objects where category = 4 order by entered_on desc limit 50) as x_2
) as x_out
ORDER BY entered_on desc LIMIT 50;

How does it look on explain analyze?

                                                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=181.87..181.99 rows=50 width=20) (actual time=1.952..2.243 rows=50 loops=1)
   ->  Sort  (cost=181.87..182.12 rows=100 width=20) (actual time=1.947..2.051 rows=50 loops=1)
         Sort Key: public.objects.entered_on
         Sort Method:  quicksort  Memory: 20kB
         ->  Result  (cost=0.00..178.54 rows=100 width=20) (actual time=0.059..1.595 rows=100 loops=1)
               ->  Append  (cost=0.00..178.54 rows=100 width=20) (actual time=0.053..1.201 rows=100 loops=1)
                     ->  Limit  (cost=0.00..87.44 rows=50 width=20) (actual time=0.049..0.429 rows=50 loops=1)
                           ->  Index Scan Backward using i_objects_category_entered_on on objects  (cost=0.00..27399.98 rows=15667 width=20) (actual time=0.044..0.242 rows=50 loops=1)
                                 Index Cond: (category = 3)
                     ->  Limit  (cost=0.00..90.10 rows=50 width=20) (actual time=0.031..0.407 rows=50 loops=1)
                           ->  Index Scan Backward using i_objects_category_entered_on on objects  (cost=0.00..19821.69 rows=11000 width=20) (actual time=0.027..0.219 rows=50 loops=1)
                                 Index Cond: (category = 4)
 Total runtime: 2.432 ms
(13 rows)

Not bad. Not bad at all.

Type-based query is basically the same. But what about type and category query?

select * from objects where category IN (3, 4) and object_type IN (3, 4) order by entered_on desc limit 50

Well, in here, we will have to use 4 subqueries:

select * FROM
(
    select * from ( select * from objects where (category, object_type) = (3, 3) order by entered_on desc limit 50 ) as x_1
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (3, 4) order by entered_on desc limit 50 ) as x_2
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (4, 4) order by entered_on desc limit 50 ) as x_3
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (4, 3) order by entered_on desc limit 50 ) as x_4
) AS x_o
ORDER BY entered_on desc limit 50

Its explain looks:

                                                                                            QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=402.54..402.66 rows=50 width=20) (actual time=4.016..4.285 rows=50 loops=1)
   ->  Sort  (cost=402.54..403.04 rows=200 width=20) (actual time=4.012..4.102 rows=50 loops=1)
         Sort Key: public.objects.entered_on
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Result  (cost=0.00..395.89 rows=200 width=20) (actual time=0.062..3.354 rows=200 loops=1)
               ->  Append  (cost=0.00..395.89 rows=200 width=20) (actual time=0.056..2.553 rows=200 loops=1)
                     ->  Limit  (cost=0.00..97.82 rows=50 width=20) (actual time=0.052..0.451 rows=50 loops=1)
                           ->  Index Scan Backward using i_objects_category_object_type_entered_on on objects  (cost=0.00..2386.76 rows=1220 width=20) (actual time=0.048..0.274 rows=50 loops=1)
                                 Index Cond: ((category = 3) AND (object_type = 3))
                     ->  Limit  (cost=0.00..98.56 rows=50 width=20) (actual time=0.044..0.430 rows=50 loops=1)
                           ->  Index Scan Backward using i_objects_category_object_type_entered_on on objects  (cost=0.00..965.92 rows=490 width=20) (actual time=0.041..0.252 rows=50 loops=1)
                                 Index Cond: ((category = 3) AND (object_type = 4))
                     ->  Limit  (cost=0.00..99.24 rows=50 width=20) (actual time=0.027..0.472 rows=50 loops=1)
                           ->  Index Scan Backward using i_objects_category_object_type_entered_on on objects  (cost=0.00..682.74 rows=344 width=20) (actual time=0.023..0.276 rows=50 loops=1)
                                 Index Cond: ((category = 4) AND (object_type = 4))
                     ->  Limit  (cost=0.00..98.28 rows=50 width=20) (actual time=0.018..0.471 rows=50 loops=1)
                           ->  Index Scan Backward using i_objects_category_object_type_entered_on on objects  (cost=0.00..1684.44 rows=857 width=20) (actual time=0.014..0.281 rows=50 loops=1)
                                 Index Cond: ((category = 4) AND (object_type = 3))
 Total runtime: 4.537 ms
(19 rows)

It's long, but – it did it's work in 4.5ms, while original query required 26ms!

Is it all? Well, kind of.

The solution with multiple queries and UNION ALL will work fine. But in some cases it might be better to use another one.

For example – let's assume that some guy chosen 14 (out of 15 categories) and 3 (out of 4 categories). It would mean that internally we will need to run 42 separate subqueries.

In such cases it might be better if we simply sorted the data for all categories and all types, and simply ruled out objects which belong to the categories (or types) we don't want.

Something like this:

SELECT * FROM objects WHERE category <> 3 AND object_type <> 3 ORDER BY entered_on DESC LIMIT 50;
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..177.67 rows=50 width=20) (actual time=0.043..0.478 rows=50 loops=1)
   ->  Index Scan Backward using i_objects_entered_on on objects  (cost=0.00..32715397.76 rows=9206554 width=20) (actual time=0.037..0.290 rows=50 loops=1)
         Filter: ((category <> 3) AND (object_type <> 3))
 Total runtime: 0.648 ms
(4 rows)

MULTI+UNION query for this data looks like:

select * FROM
(
    select * from ( select * from objects where (category, object_type) = (1, 1) order by entered_on desc limit 50 ) as x_1
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (2, 2) order by entered_on desc limit 50 ) as x_2
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (4, 4) order by entered_on desc limit 50 ) as x_3
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (5, 1) order by entered_on desc limit 50 ) as x_4
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (6, 2) order by entered_on desc limit 50 ) as x_5
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (8, 4) order by entered_on desc limit 50 ) as x_6
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (9, 1) order by entered_on desc limit 50 ) as x_7
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (10, 2) order by entered_on desc limit 50 ) as x_8
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (12, 4) order by entered_on desc limit 50 ) as x_9
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (13, 1) order by entered_on desc limit 50 ) as x_10
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (14, 2) order by entered_on desc limit 50 ) as x_11
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (1, 4) order by entered_on desc limit 50 ) as x_12
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (2, 1) order by entered_on desc limit 50 ) as x_13
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (5, 4) order by entered_on desc limit 50 ) as x_14
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (6, 1) order by entered_on desc limit 50 ) as x_15
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (7, 2) order by entered_on desc limit 50 ) as x_16
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (9, 4) order by entered_on desc limit 50 ) as x_17
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (10, 1) order by entered_on desc limit 50 ) as x_18
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (11, 2) order by entered_on desc limit 50 ) as x_19
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (13, 4) order by entered_on desc limit 50 ) as x_20
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (14, 1) order by entered_on desc limit 50 ) as x_21
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (15, 2) order by entered_on desc limit 50 ) as x_22
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (2, 4) order by entered_on desc limit 50 ) as x_23
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (4, 2) order by entered_on desc limit 50 ) as x_24
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (6, 4) order by entered_on desc limit 50 ) as x_25
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (7, 1) order by entered_on desc limit 50 ) as x_26
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (8, 2) order by entered_on desc limit 50 ) as x_27
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (10, 4) order by entered_on desc limit 50 ) as x_28
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (11, 1) order by entered_on desc limit 50 ) as x_29
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (12, 2) order by entered_on desc limit 50 ) as x_30
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (14, 4) order by entered_on desc limit 50 ) as x_31
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (15, 1) order by entered_on desc limit 50 ) as x_32
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (1, 2) order by entered_on desc limit 50 ) as x_33
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (4, 1) order by entered_on desc limit 50 ) as x_34
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (5, 2) order by entered_on desc limit 50 ) as x_35
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (7, 4) order by entered_on desc limit 50 ) as x_36
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (8, 1) order by entered_on desc limit 50 ) as x_37
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (9, 2) order by entered_on desc limit 50 ) as x_38
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (11, 4) order by entered_on desc limit 50 ) as x_39
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (12, 1) order by entered_on desc limit 50 ) as x_40
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (13, 2) order by entered_on desc limit 50 ) as x_41
    UNION ALL
    select * from ( select * from objects where (category, object_type) = (15, 4) order by entered_on desc limit 50 ) as x_42
) AS x_o
ORDER BY entered_on desc limit 50

Execution time? Around 44ms.

So, how do we know which solution would be best for given arguments? We have to keep statistics which will tell us how many % of rows match our category/object_type criteria. This is complicated, but doable.

And when we have the statistics – we have to do some tests, to find the “sweet spot" when “MULTI+UNION" becomes faster/slower than “ALL-except-some" solution.

If you're still reading it – congratulations – you just went through around 33kB of text. In case you're wondering why I even wrote it – it relates to a problem we had at work, and it shows in numbers the ideas/problems we encountered.

  1. 3 comments

  2. # ivan
    Sep 15, 2008

    This is a great piece of research. Thanks!

  3. # Ants Aasma
    Sep 17, 2008

    The index statistics have the percentages for each criteria. This actually seems a common enough optimization to be worthwhile in the core optimizer. If I remember correctly Oracle has this optimization built in.

  4. # James
    Sep 18, 2008

    Hi, I found your blog on this new directory of WordPress Blogs at blackhatbootcamp.com/listofwordpressblogs. I dont know how your blog came up, must have been a typo, i duno. Anyways, I just clicked it and here I am. Your blog looks good. Have a nice day. James.

Leave a comment