Using recursive queries to get distinct elements from table

I wrote about similar things couple of times, but recently found thread on pgsql-general mailing list that made me thing about it again.

Summary of the problem from mail is: we have a table, ~ 800 million rows, with, at least 2 columns:

  • station – 170 distinct values
  • channel – generally 1-3 channels per station

And then we want to run:

SELECT
    station,
    array_agg(DISTINCT (channel)) AS channels
FROM
    DATA
GROUP BY
    station;

Which, on Israel's (original poster) machine took ~ 5 minutes.

And this is with index on on data (station, channel).

Can we do better?

Continue reading Using recursive queries to get distinct elements from table

Which schema is using the most disk space?

I was faced with interesting problem. Which schema, in my DB, uses the most disk space? Theoretically it's trivial, we have set of helpful functions:

  • pg_column_size
  • pg_database_size
  • pg_indexes_size
  • pg_relation_size
  • pg_table_size
  • pg_tablespace_size
  • pg_total_relation_size

But in some cases it becomes more of a problem. For example – when you have thousands of tables …

Continue reading Which schema is using the most disk space?

Waiting for 9.3 – Add CREATE RECURSIVE VIEW syntax

On 1st of February, Peter Eisentraut committed patch:

Add CREATE RECURSIVE VIEW syntax
 
This is specified in the SQL standard.  The CREATE RECURSIVE VIEW
specification is transformed into a normal CREATE VIEW statement with a
WITH RECURSIVE clause.
 
reviewed by Abhijit Menon-Sen and Stephen Frost

Continue reading Waiting for 9.3 – Add CREATE RECURSIVE VIEW syntax

Getting top-N rows per group

Yesterday on irc someone asked:

Hi, how do I get top 5 values from a column group by another column??

From further discussion, I learned that:

total rows in table is 2 million. It'll have unique words of less than 1 million.. (approx count)

I didn't have time yesterday, but decided to write a solution, or two, to the problem.

Continue reading Getting top-N rows per group

How to get shortest connection between two cities

Yesterday, on #postgresql on irc some guy asked:

22:28 < rafasc> i am trying to use plpgsql to find the shortest path between two cities, each pair of cities has one or more edges, each edge has a different wheight.
22:28 < rafasc> Is there a easy way to compute the shortest path between two cities?

Well, I was not really in a mood to solve it, so I just told him to try with recursive queries, and went on my way.

But I thought about it. And decided to see if I can write the query.

Continue reading How to get shortest connection between two cities

r/trees ( recursive trees, what did you think about? )

I got asked on irc to show some examples how to use recursive CTE. Apparently my previous post wasn't good enough 🙂

I think that most of the users will use recursive cte to deal with trees I decided to show how to use it, even though it's not my favorite approach to dealing with trees in SQL.

Continue reading r/trees ( recursive trees, what did you think about? )

Find cheapest combination of rooms in hotels

Today, on Stack Overflow there was interesting question.

Generally, given table that looks like this:

room | people | price   | hotel
 1   |    1   |   200   |   A
 2   |    2   |   99    |   A
 3   |    3   |   95    |   A
 4   |    1   |   90    |   B
 5   |    6   |   300   |   B

Find cheapest combination of rooms that would accomodate given number of guests.

Continue reading Find cheapest combination of rooms in hotels

Calculating backlog of events to handle

Yesterday on my favorite IRC channel fooqux asked interesting question. I took some more questions, and here is problem description:

We have a system which, every 5 minutes, takes a number of tasks to be done. Tasks are uniform. Within 5 minutes we can handle at most 100 tasks. Given the history of number of tasks added every 5 minutes, calculate backlog at any given moment.

Did you understand the problem? Well – I didn't. So, let's see the data, and expected output.

Continue reading Calculating backlog of events to handle

Waiting for 8.4 – Common Table Expressions (WITH queries)

On 4th of September Tom Lane committed another great patch. This one is very large, and even after applying – it's has some rough edges. There will be need for additional patches to make the functionality fully robust, but the fact that it got committed means that it will be available in final 8.4.

What does it do?

Continue reading Waiting for 8.4 – Common Table Expressions (WITH queries)