Using recursive queries to get distinct elements from table

I wrote about similar things couple of times, but recently found thread on pgsql-general mailing list that made me thing about it again.

Summary of the problem from mail is: we have a table, ~ 800 million rows, with, at least 2 columns:

  • station – 170 distinct values
  • channel – generally 1-3 channels per station

And then we want to run:

SELECT
    station,
    array_agg(DISTINCT (channel)) AS channels
FROM
    DATA
GROUP BY
    station;

Which, on Israel's (original poster) machine took ~ 5 minutes.

And this is with index on on data (station, channel).

Can we do better?

Continue reading Using recursive queries to get distinct elements from table

New changes on explain.depesz.com – fixed calculations of exclusive times

When you see plan on explain.depesz.com, one of the columns visible is exclusive.

In principle it's simple – take time of current node, subtract times of all sub nodes, and you get how much time was spent in this node alone.

Reality is not that simple. Two things especially make the calculations complicated. CTEs, and InitPlans.

Now, with the just pushed change to Pg::Explain (plan parsing library), and site itself, some of the bad calculations are gone.

Let's see some examples:

Continue reading New changes on explain.depesz.com – fixed calculations of exclusive times

Waiting for PostgreSQL 12 – Allow user control of CTE materialization, and change the default behavior.

On 16th of February 2019, Tom Lane committed patch:

Allow user control of CTE materialization, and change the default behavior.
 
Historically we've always materialized the full output of a CTE query,
treating WITH as an optimization fence (so that, for example, restrictions
from the outer query cannot be pushed into it).  This is appropriate when
the CTE query is INSERT/UPDATE/DELETE, or is recursive; but when the CTE
query is non-recursive and side-effect-free, there's no hazard of changing
the query results by pushing restrictions down.
 
Another argument for materialization is that it can avoid duplicate
computation of an expensive WITH query --- but that only applies if
the WITH query is called more than once in the outer query.  Even then
it could still be a net loss, if each call has restrictions that
would allow just a small part of the WITH query to be computed.
 
Hence, let's change the behavior for WITH queries that are non-recursive
and side-effect-free.  By default, we will inline them into the outer
query (removing the optimization fence) if they are called just once.
If they are called more than once, we will keep the old behavior by
default, but the user can override this and force inlining by specifying
NOT MATERIALIZED.  Lastly, the user can force the old behavior by
specifying MATERIALIZED; this would mainly be useful when the query had
deliberately been employing WITH as an optimization fence to prevent a
poor choice of plan.
 
Andreas Karlsson, Andrew Gierth, David Fetter
 
Discussion: https://postgr.es/m/87sh48ffhb.fsf@news-spur.riddles.org.uk

Continue reading Waiting for PostgreSQL 12 – Allow user control of CTE materialization, and change the default behavior.

Converting list of integers into list of ranges

Yesterday someone on irc asked:

i've a query that returns sequential numbers with gaps (generate_series + join) and my question is: can is somehow construct ranges out of the returned values? sort of range_agg or something?

There was no further discussion, aside from me saying

sure you can. not trivial task, but possible.
you'd need window functions.

but it got me thinking …

Continue reading Converting list of integers into list of ranges

Returning data in multiple columns

I was working today on some updates to client database. While doing it, I figured it would be simpler if I saw all “codenames" and ids of rows from dictionary table – not so big. But it was bigger than my screen – I have only 90 lines of text on screen, and there were ~ 200 rows of data in the table. So I started thinking – how to show this (codename, id) into more than one column, in psql.

Continue reading Returning data in multiple columns

Anonymize CTE names on explain.depesz.com

A colleague recently let me know that anonymization in explain.depesz.com doesn't handle CTE names. For example, in plan:

                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 CTE Scan ON some_name  (cost=0.01..0.03 ROWS=1 width=8) (actual TIME=0.027..0.028 ROWS=1 loops=1)
   CTE some_name
     ->  RESULT  (cost=0.00..0.01 ROWS=1 width=0) (actual TIME=0.023..0.023 ROWS=1 loops=1)
 Planning TIME: 0.217 ms
 Execution TIME: 0.124 ms
(5 ROWS)

“some_name" was kept even if anonymization was turned on.

Now it's fixed, and new code should properly anonymize all CTE names. Of course the change is live on explain.depesz.com too.

Getting count of distinct elements, per group, in PostgreSQL.

So, couple of days ago, some guy, from Periscope company wrote a blogpost about getting number of distinct elements, per group, faster using subqueries.

This was then submitted to Hacker News and r/Programming on Reddit.

Then, the original authors submitted second blogpost comparing speed between four different DB engines. Which, in turn, was also commented on Reddit.

I found the numbers presented by Periscope (as their improvement) as not that great.

Unfortunately – their blog doesn't allow for comments, so I decided to test it, and write on my own blog, what I can find about it.

Continue reading Getting count of distinct elements, per group, in PostgreSQL.

Waiting for 9.3 – Add CREATE RECURSIVE VIEW syntax

On 1st of February, Peter Eisentraut committed patch:

Add CREATE RECURSIVE VIEW syntax
 
This is specified in the SQL standard.  The CREATE RECURSIVE VIEW
specification is transformed into a normal CREATE VIEW statement with a
WITH RECURSIVE clause.
 
reviewed by Abhijit Menon-Sen and Stephen Frost

Continue reading Waiting for 9.3 – Add CREATE RECURSIVE VIEW syntax