April 27th, 2011 by depesz | Tags: , , , , , | 11 comments »
Did it help? If yes - maybe you can help me?

Today, on Stack Overflow there was interesting question.

Generally, given table that looks like this:

room | people | price   | hotel
 1   |    1   |   200   |   A
 2   |    2   |   99    |   A
 3   |    3   |   95    |   A
 4   |    1   |   90    |   B
 5   |    6   |   300   |   B

Find cheapest combination of rooms that would accomodate given number of guests.

For example – For 1 guest it should be:

hotel | price
  A   |  200
  B   |  90

for 2 guests:

hotel | price
  A   |  99

and for 6 guests:

hotel | price
  A   |  394
  B   |  300

So far so good.

Of course, the problem in general is not possible to solve, because it is NP-Complete.

But, what about actually doing it for small sets of data? Well, I know that for
1000 hotels, and 100000 rooms it will take ages, but what about these 2 hotels
and 5 rooms?

Decided to try.

Since for some number of people we will need only one room, and for some many rooms, we need to start (and this is the complicated part) with generating list of all sets of rooms.

For example – having our table of rooms above, we should get following sets of rooms:

  • hotel A: 1, 2, 3, 1 + 2, 1 + 3, 2 + 3, 1 + 2 + 3
  • hotel B: 4, 5, 4 + 5

This naturally (well, for me, you might have different ideas) screams for using recursive CTE.

Let's start – every set starts with single room. So, I write:

WITH
room_sets AS (
    SELECT
        r.hotel,
        ARRAY[ r.room ] as rooms,
        r.price,
        r.people
    FROM
        room_price r
)
SELECT * FROM room_sets;
 hotel | rooms | price | people
-------+-------+-------+--------
 A     | {1}   |   200 |      1
 A     | {2}   |    99 |      2
 A     | {3}   |    95 |      3
 B     | {4}   |    90 |      1
 B     | {5}   |   300 |      6
(5 rows)

So far nothing magical, I just changed datatype from int4 to int4[] – i.e. array.

Array will be useful, so that we can have all room numbers in a set in single column.

Now, we need to recursively make combined room sets, with additional rooms. New query:

WITH RECURSIVE
room_sets AS (
    SELECT
        r.hotel,
        ARRAY[ r.room ] as rooms,
        r.price,
        r.people
    FROM
        room_price r
    UNION ALL
    SELECT
        rs.hotel,
        rs.rooms || n.room,
        rs.price + n.price as price,
        rs.people + n.people as people
    FROM
        room_sets rs
        join room_price n using (hotel)
    WHERE
        NOT ( n.room = ANY ( rs.rooms ) )
)
SELECT * FROM room_sets;
 hotel |  rooms  | price | people
-------+---------+-------+--------
 A     | {1}     |   200 |      1
 A     | {2}     |    99 |      2
 A     | {3}     |    95 |      3
 B     | {4}     |    90 |      1
 B     | {5}     |   300 |      6
 A     | {1,3}   |   295 |      4
 A     | {1,2}   |   299 |      3
 A     | {2,3}   |   194 |      5
 A     | {2,1}   |   299 |      3
 A     | {3,2}   |   194 |      5
 A     | {3,1}   |   295 |      4
 B     | {4,5}   |   390 |      7
 B     | {5,4}   |   390 |      7
 A     | {1,3,2} |   394 |      6
 A     | {1,2,3} |   394 |      6
 A     | {2,3,1} |   394 |      6
 A     | {2,1,3} |   394 |      6
 A     | {3,2,1} |   394 |      6
 A     | {3,1,2} |   394 |      6
(19 rows)

Whoa. That's nice. We have all possible combinations of rooms. But it is
actually pretty problematic. For example – last 6 rows all describe the same
set of rows, just ordering of rows in set is different. But we don't care about
order in set. So, what can we do?

It's actually pretty simple. As you see we have condition:

NOT ( n.room = ANY ( rs.rooms ) )

Which makes sure that given room can be in set only once.

But we can change this, by forcing that we can add only rooms that are “after" (numerically) latest room in set. Which gives us:

WITH RECURSIVE
room_sets AS (
    SELECT
        r.hotel,
        ARRAY[ r.room ] as rooms,
        r.price,
        r.people
    FROM
        room_price r
    UNION ALL
    SELECT
        rs.hotel,
        rs.rooms || n.room,
        rs.price + n.price as price,
        rs.people + n.people as people
    FROM
        room_sets rs
        join room_price n using (hotel)
    WHERE
        n.room > rs.rooms[ array_upper( rs.rooms, 1 ) ]
)
SELECT * FROM room_sets;
 hotel |  rooms  | price | people
-------+---------+-------+--------
 A     | {1}     |   200 |      1
 A     | {2}     |    99 |      2
 A     | {3}     |    95 |      3
 B     | {4}     |    90 |      1
 B     | {5}     |   300 |      6
 A     | {1,3}   |   295 |      4
 A     | {1,2}   |   299 |      3
 A     | {2,3}   |   194 |      5
 B     | {4,5}   |   390 |      7
 A     | {1,2,3} |   394 |      6
(10 rows)

Which is pretty cool. Each set is only once, we have summarized price, and people count.

One thing though. We know that we are looking for rooms for n people. Based on this we probably should limit what the query returns, and not generate sets that are guaranteed to be too big (we still need to generate sets smaller, because larger sets are based on smaller).

Since we will be using this number (n-people) many times, let's put it in single place, and give it a name:

WITH RECURSIVE
setup AS (
    SELECT 3::INT4 as people
),
room_sets AS (
    SELECT
        r.hotel,
        ARRAY[ r.room ] as rooms,
        r.price,
        r.people
    FROM
        setup s,
        room_price r
    WHERE
        r.people <= s.people
    UNION ALL
    SELECT
        rs.hotel,
        rs.rooms || n.room,
        rs.price + n.price as price,
        rs.people + n.people as people
    FROM
        setup s,
        room_sets rs
        join room_price n using (hotel)
    WHERE
        n.room > rs.rooms[ array_upper( rs.rooms, 1 ) ]
        AND rs.people + n.people <= s.people
)
SELECT * FROM room_sets;
 hotel | rooms | price | people
-------+-------+-------+--------
 A     | {1}   |   200 |      1
 A     | {2}   |    99 |      2
 A     | {3}   |    95 |      3
 B     | {4}   |    90 |      1
 A     | {1,2} |   299 |      3
(5 rows)

Now, we no longer have sets of rooms that are larger than requested number of people.

With this we can finally filter it to room sets that accomodate requested number of people:

WITH RECURSIVE
setup AS (
    SELECT 3::INT4 as people
),
room_sets AS (
    SELECT
        r.hotel,
        ARRAY[ r.room ] as rooms,
        r.price,
        r.people
    FROM
        setup s,
        room_price r
    WHERE
        r.people <= s.people
    UNION ALL
    SELECT
        rs.hotel,
        rs.rooms || n.room,
        rs.price + n.price as price,
        rs.people + n.people as people
    FROM
        setup s,
        room_sets rs
        join room_price n using (hotel)
    WHERE
        n.room > rs.rooms[ array_upper( rs.rooms, 1 ) ]
        AND rs.people + n.people <= s.people
)
SELECT
    rs.*
FROM
    setup s,
    room_sets rs
WHERE
    s.people = rs.people
;
 hotel | rooms | price | people
-------+-------+-------+--------
 A     | {3}   |    95 |      3
 A     | {1,2} |   299 |      3
(2 rows)

Which is all great, but we should show only 1 set per hotel. To do this, we can use DISTINCT ON clause:

WITH RECURSIVE
setup AS (
    SELECT 3::INT4 as people
),
room_sets AS (
    SELECT
        r.hotel,
        ARRAY[ r.room ] as rooms,
        r.price,
        r.people
    FROM
        setup s,
        room_price r
    WHERE
        r.people <= s.people
    UNION ALL
    SELECT
        rs.hotel,
        rs.rooms || n.room,
        rs.price + n.price as price,
        rs.people + n.people as people
    FROM
        setup s,
        room_sets rs
        join room_price n using (hotel)
    WHERE
        n.room > rs.rooms[ array_upper( rs.rooms, 1 ) ]
        AND rs.people + n.people <= s.people
)
SELECT
    DISTINCT ON (rs.hotel)
    rs.*
FROM
    setup s,
    room_sets rs
WHERE
    s.people = rs.people
ORDER BY
    rs.hotel, rs.price
;
 hotel | rooms | price | people
-------+-------+-------+--------
 A     | {3}   |    95 |      3
(1 row)

Which works great, and returns expected results for other cases too.

One final problem which we might have is that when search returns data from different hotels, final result set is sorted by hotel. And we might want to have it sorted by price.

For example, for 1 person the query returns:

 hotel | rooms | price | people
-------+-------+-------+--------
 A     | {1}   |   200 |      1
 B     | {4}   |    90 |      1
(2 rows)

Solution is simple – we can't change ORDER BY clause, as it's needed for DISTINCT ON, but we can reorder the data:

WITH RECURSIVE
setup AS (
    SELECT 1::INT4 as people
),
room_sets AS (
    SELECT
        r.hotel,
        ARRAY[ r.room ] as rooms,
        r.price,
        r.people
    FROM
        setup s,
        room_price r
    WHERE
        r.people <= s.people
    UNION ALL
    SELECT
        rs.hotel,
        rs.rooms || n.room,
        rs.price + n.price as price,
        rs.people + n.people as people
    FROM
        setup s,
        room_sets rs
        join room_price n using (hotel)
    WHERE
        n.room > rs.rooms[ array_upper( rs.rooms, 1 ) ]
        AND rs.people + n.people <= s.people
),
results AS (
    SELECT
        DISTINCT ON (rs.hotel)
        rs.*
    FROM
        setup s,
        room_sets rs
    WHERE
        s.people = rs.people
    ORDER BY
        rs.hotel, rs.price
)
SELECT * FROM results ORDER BY price
;
 hotel | rooms | price | people
-------+-------+-------+--------
 B     | {4}   |    90 |      1
 A     | {1}   |   200 |      1
(2 rows)

Final note – this will not be fast for large data sets. But it might be fast enough for the dataset that you have.

  1. 11 comments

  2. Apr 27, 2011

    Minor nit: NP-Complete doesn’t mean unsolvable. It just means that the computational complexity grows at a rate that’s intractable beyond a certain number.

  3. Apr 27, 2011

    @David:
    well, sure. It was a kind of mental shortcut.

  4. # seb
    Apr 27, 2011

    typo – at line 270 you &gt; instead of >
    i think. thanks for your posts, always very cool.

  5. Apr 27, 2011

    @Seb: thanks, fixed.

  6. Apr 28, 2011

    Boskie 😀

  7. What’s the advantage to using a CTE for a constant vs JOINing to a VALUES?

    For instance, we could skip setup, and rewrite results as (https://gist.github.com/946329 in case formatting is wrong):

    SELECT
    DISTINCT ON (rs.hotel)
    rs.*
    FROM
    room_sets rs, VALUES(3) AS setup(people)
    WHERE
    s.people = rs.people
    ORDER BY
    rs.hotel, rs.price

  8. Apr 28, 2011

    @François:

    1. we already have to use CTE
    2. I am using the setup cte 3 times. Thanks to this, changing the query to return combinations for another number of people means that I have to change only 1 place.
    If i’d use values() I would have to use the values in every place, thus – every change would require changes in 3 places, and it would be additional risk of making the change badly (different values in different places).

  9. May 3, 2011

    Nice, but generating all sets is probably least efficient approach.

    BTW some boundaries should be noted (and similar problems appear, depending on boundaries):
    1. Do all guests have to stay in the same hotel?
    2. If the price is the same, should we choose solution with lowest average people in room?
    3. Is it allowed to left unused places in rooms (4 people; hotel A – 2 for 100 and 3 for 150; hotel B – two 2 for 230 each)?

  10. May 3, 2011

    @Rozie:
    can you show any other solution?
    As for your questions:
    1. yes – that’s the point of room sets *per hotel*
    2. if the price is the same, I don’t care.
    3. no – that was stated pretty clearly in the original question on stack overflow.

  11. # vol7ron
    Sep 5, 2011

    I was having trouble understanding where the 394 came from (Rooms A1+A2+A3). I think it would be worth noting that the problem seems to have a condition where a room may only be used once. Or, the table represents the availability of one room only. So for 6 people it would not be $190 (Room 3A+3A).

  12. Sep 5, 2011

    @Vol7tron:
    Text states:

    “Which makes sure that given room can be in set only once.”

Leave a comment