I wrote about similar things couple of times, but recently found thread on pgsql-general mailing list that made me thing about it again.
Summary of the problem from mail is: we have a table, ~ 800 million rows, with, at least 2 columns:
- station – 170 distinct values
- channel – generally 1-3 channels per station
And then we want to run:
array_agg(DISTINCT (channel)) AS channels
Which, on Israel's (original poster) machine took ~ 5 minutes.
And this is with index on on data (station, channel).
Can we do better?
Continue reading Using recursive queries to get distinct elements from table
When you see plan on explain.depesz.com, one of the columns visible is exclusive.
In principle it's simple – take time of current node, subtract times of all sub nodes, and you get how much time was spent in this node alone.
Reality is not that simple. Two things especially make the calculations complicated. CTEs, and InitPlans.
Now, with the just pushed change to Pg::Explain (plan parsing library), and site itself, some of the bad calculations are gone.
Let's see some examples:
Continue reading New changes on explain.depesz.com – fixed calculations of exclusive times
On 16th of February 2019, Tom Lane committed patch:
Allow user control of CTE materialization, and change the default behavior.
Historically we've always materialized the full output of a CTE query,
treating WITH as an optimization fence (so that, for example, restrictions
from the outer query cannot be pushed into it). This is appropriate when
the CTE query is INSERT/UPDATE/DELETE, or is recursive; but when the CTE
query is non-recursive and side-effect-free, there's no hazard of changing
the query results by pushing restrictions down.
Another argument for materialization is that it can avoid duplicate
computation of an expensive WITH query --- but that only applies if
the WITH query is called more than once in the outer query. Even then
it could still be a net loss, if each call has restrictions that
would allow just a small part of the WITH query to be computed.
Hence, let's change the behavior for WITH queries that are non-recursive
and side-effect-free. By default, we will inline them into the outer
query (removing the optimization fence) if they are called just once.
If they are called more than once, we will keep the old behavior by
default, but the user can override this and force inlining by specifying
NOT MATERIALIZED. Lastly, the user can force the old behavior by
specifying MATERIALIZED; this would mainly be useful when the query had
deliberately been employing WITH as an optimization fence to prevent a
poor choice of plan.
Andreas Karlsson, Andrew Gierth, David Fetter
Continue reading Waiting for PostgreSQL 12 – Allow user control of CTE materialization, and change the default behavior.
Yesterday someone on irc asked:
i've a query that returns sequential numbers with gaps (generate_series + join) and my question is: can is somehow construct ranges out of the returned values? sort of range_agg or something?
There was no further discussion, aside from me saying
sure you can. not trivial task, but possible.
you'd need window functions.
but it got me thinking …
Continue reading Converting list of integers into list of ranges
I was working today on some updates to client database. While doing it, I figured it would be simpler if I saw all “codenames" and ids of rows from dictionary table – not so big. But it was bigger than my screen – I have only 90 lines of text on screen, and there were ~ 200 rows of data in the table. So I started thinking – how to show this (codename, id) into more than one column, in psql.
Continue reading Returning data in multiple columns
A colleague recently let me know that anonymization in explain.depesz.com doesn't handle CTE names. For example, in plan:
CTE Scan ON some_name (cost=0.01..0.03 ROWS=1 width=8) (actual TIME=0.027..0.028 ROWS=1 loops=1)
-> RESULT (cost=0.00..0.01 ROWS=1 width=0) (actual TIME=0.023..0.023 ROWS=1 loops=1)
Planning TIME: 0.217 ms
Execution TIME: 0.124 ms
“some_name" was kept even if anonymization was turned on.
Now it's fixed, and new code should properly anonymize all CTE names. Of course the change is live on explain.depesz.com too.
So, couple of days ago, some guy, from Periscope company wrote a blogpost about getting number of distinct elements, per group, faster using subqueries.
This was then submitted to Hacker News and r/Programming on Reddit.
Then, the original authors submitted second blogpost comparing speed between four different DB engines. Which, in turn, was also commented on Reddit.
I found the numbers presented by Periscope (as their improvement) as not that great.
Unfortunately – their blog doesn't allow for comments, so I decided to test it, and write on my own blog, what I can find about it.
Continue reading Getting count of distinct elements, per group, in PostgreSQL.
In this, hopefully 2nd to last, post in the series, I will cover the rest of usually happening operations that you can see in your explain outputs.
Continue reading Explaining the unexplainable – part 4
On 1st of February, Peter Eisentraut committed patch:
Add CREATE RECURSIVE VIEW syntax
This is specified in the SQL standard. The CREATE RECURSIVE VIEW
specification is transformed into a normal CREATE VIEW statement with a
WITH RECURSIVE clause.
reviewed by Abhijit Menon-Sen and Stephen Frost
Continue reading Waiting for 9.3 – Add CREATE RECURSIVE VIEW syntax
In PostgreSQL 8.4 we got CTE – Common Table Expressions. Since then we have this great tool available, but apparently for some people it's still black magic. CuTE, but still magic. I'll try to make it a bit less magical, and more understandable.
Continue reading CuTE overload