Waiting for PostgreSQL 17 – Invent SERIALIZE option for EXPLAIN.

On 3rd of April 2024, Tom Lane committed patch:

Invent SERIALIZE option for EXPLAIN.
 
EXPLAIN (ANALYZE, SERIALIZE) allows collection of statistics about
the volume of data emitted by a query, as well as the time taken
to convert the data to the on-the-wire format.  Previously there
was no way to investigate this without actually sending the data
to the client, in which case network transmission costs might
swamp what you wanted to see.  In particular this feature allows
investigating the costs of de-TOASTing compressed or out-of-line
data during formatting.
 
Stepan Rutz and Matthias van de Meent,
reviewed by Tomas Vondra and myself
 
Discussion: https://postgr.es/m/ca0adb0e-fa4e-c37e-1cd7-91170b18cae1@gmx.de

This will be HUGE.

Let's look at simple example, database for explain.depesz.com. Let's see how long it would take to get all plans:

=$ EXPLAIN (analyze) SELECT * FROM simple;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Seq Scan ON simple  (cost=0.00..210108.80 ROWS=4466880 width=267) (actual TIME=0.016..711.977 ROWS=1614584 loops=1)
 Planning TIME: 0.198 ms
 Execution TIME: 749.749 ms
(3 ROWS)

Pretty OK. But it's not full picture. The thing is that while it did run most of query, it didn't really get the data to memory from toasted columns (and potentially some other places).

Now, with this new SERIALIZE option I can:

=$ EXPLAIN (analyze, serialize) SELECT * FROM simple;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Seq Scan ON simple  (cost=0.00..210108.80 ROWS=4466880 width=267) (actual TIME=0.011..399.249 ROWS=1614584 loops=1)
 Planning TIME: 0.028 ms
 Serialization: TIME=19479.433 ms  output=28074610kB  format=text
 Execution TIME: 19961.576 ms
(4 ROWS)

Please note that we got new line, and much higher total execution cost. Getting data from table was simple to do in ~ 750ms. But getting data from toast, and generating output to send to client – it took almost 20 seconds.

Plus, please note that explain shows that there were rows=1614584 with width=267 – which would be ~ 410MB.

But, serialization showed that full output was output=28074610kB – 26GB! Over 60 times more.

This is great. Very good news to anyone working on query optimization. Thanks a lot to everyone that worked on this.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.