Getting top-N rows per group

Yesterday on irc someone asked:

Hi, how do I get top 5 values from a column group by another column??

From further discussion, I learned that:

total rows in table is 2 million. It'll have unique words of less than 1 million.. (approx count)

I didn't have time yesterday, but decided to write a solution, or two, to the problem.

Continue reading Getting top-N rows per group

Filling the gaps with window functions

Couple of days ago I had a problem that I couldn't solve after ~ 2 hours, and decided to ask on IRC. Almost immediately after asking, I figured out the solution, but David asked me to write about the solution, even though it's now (for me) completely obvious.

The problem was like this:

I had two tables, with very simple structure: event_when timestamptz, event_count int4, and wanted to show it as a single recordset with columns: event_when, event_count_a, event_count_b, but the problem was that event_when usually didn't match. Here is an example:

Continue reading Filling the gaps with window functions

r/trees ( recursive trees, what did you think about? )

I got asked on irc to show some examples how to use recursive CTE. Apparently my previous post wasn't good enough 🙂

I think that most of the users will use recursive cte to deal with trees I decided to show how to use it, even though it's not my favorite approach to dealing with trees in SQL.

Continue reading r/trees ( recursive trees, what did you think about? )

Find cheapest combination of rooms in hotels

Today, on Stack Overflow there was interesting question.

Generally, given table that looks like this:

room | people | price   | hotel
 1   |    1   |   200   |   A
 2   |    2   |   99    |   A
 3   |    3   |   95    |   A
 4   |    1   |   90    |   B
 5   |    6   |   300   |   B

Find cheapest combination of rooms that would accomodate given number of guests.

Continue reading Find cheapest combination of rooms in hotels

Waiting for 9.1 – Writable CTE

On 25th of February, Tom Lane committed patch:

Support data-modifying commands (INSERT/UPDATE/DELETE) IN WITH.   
 
This patch implements data-modifying WITH queries according TO the           
semantics that the updates ALL happen WITH the same command counter VALUE,
AND IN an unspecified ORDER.  Therefore one WITH clause can't see the 
effects of another, nor can the outer query see the effects other than
through the RETURNING values.  And attempts to do conflicting updates will
have unpredictable results.  We'll need TO document ALL that.           
 
This commit just fixes the code; documentation updates are waiting ON
author.                                                                 
 
Marko Tiikkaja AND Hitoshi Harada

Continue reading Waiting for 9.1 – Writable CTE

How to group messages into chats?

My jabber server had the feature, that it logs all messages that got sent through it.

This is pretty cool, and useful. And now, i got asked to use it to create list of conversations.

What exactly is this? Whenever I send (or receive) something there is record in database with information about which local user, communication type (send/recv), correspondent, when it happened, and what is the body of message.

And based on this, we want to list messages into chats. How?

Continue reading How to group messages into chats?

Profiling stored procedures/functions

One database that I am monitoring uses a lot of stored procedures. Some of them are fast, some of them are not so fast. I thought – is there a sensible way to diagnose which part of stored procedure take the most time?

I mean – I could just put the logic into application, and then every query would have it's own timing in Pg logs, but this is not practical. And I also believe that using stored procedures/functions is way better than using plain SQL due to a number of reasons.

So, I'm back to question – how to check which part of function takes most of the time?

Continue reading Profiling stored procedures/functions

Calculating backlog of events to handle

Yesterday on my favorite IRC channel fooqux asked interesting question. I took some more questions, and here is problem description:

We have a system which, every 5 minutes, takes a number of tasks to be done. Tasks are uniform. Within 5 minutes we can handle at most 100 tasks. Given the history of number of tasks added every 5 minutes, calculate backlog at any given moment.

Did you understand the problem? Well – I didn't. So, let's see the data, and expected output.

Continue reading Calculating backlog of events to handle

Waiting for 8.4 – Common Table Expressions (WITH queries)

On 4th of September Tom Lane committed another great patch. This one is very large, and even after applying – it's has some rough edges. There will be need for additional patches to make the functionality fully robust, but the fact that it got committed means that it will be available in final 8.4.

What does it do?

Continue reading Waiting for 8.4 – Common Table Expressions (WITH queries)