skip scan – select * from depesz;

I wrote about similar things couple of times, but recently found thread on pgsql-general mailing list that made me thing about it again.

Summary of the problem from mail is: we have a table, ~ 800 million rows, with, at least 2 columns:

And then we want to run:

SELECT
    station,
    array_agg(DISTINCT (channel)) AS channels
FROM
    data
GROUP BY
    station;

Which, on Israel's (original poster) machine took ~ 5 minutes.

And this is with index on on data (station, channel).

Can we do better?

=$
|