August 28th, 2009 by depesz | Tags: , , , , | 2 comments »
Did it help? If yes - maybe you can help me?

After very long discussions, on 10th of August, Tom Lane committed patch by Robert Haas:

Log Message:
-----------
Extend EXPLAIN to support output in XML or JSON format.
 
There are probably still some adjustments to be made in the details
of the output, but this gets the basic structure in place.
 
Robert Haas

This is a very important news. Every once in a while everybody has to read EXPLAIN output. Since not everything in them is obvious, there have been created tools (
like explain-analyze.info or explain.depesz.com ) to help with the task of understanding what's going on.

The problem was always, that while EXPLAIN output is perfectly (for some values of “perfect") readable for humans, parsing it is not so trivial, and there are always cases when parsing goes wrong.

For this very reason, Robert created his patch.

So, how does it work?

As you perhaps know, EXPLAIN could be invoked in following ways:

  • EXPLAIN query
  • EXPLAIN ANALYZE query
  • EXPLAIN VERBOSE query
  • EXPLAIN ANALYZE VERBOSE query

Problem with this approach is that any change (like adding options) to explain call required extra work (don't ask me – it's way above my understanding). So, to add machine readable formats, first there had to be a way to add new options to EXPLAIN call in simpler way.

Here come generic explain options.

Now, you can call EXPLAIN like this:

EXPLAIN ( option value, option value, ... ) query

Old way (EXPLAIN ANALYZE query) is preserved but newer extensions will be (as I understand) available only using this (option value) method.

For example – previous 4 calls can be directly converted to new way like this:

  • EXPLAIN query -> EXPLAIN query
  • EXPLAIN ANALYZE query -> EXPLAIN ( analyze on ) query
  • EXPLAIN VERBOSE query -> EXPLAIN ( verbose on ) query
  • EXPLAIN ANALYZE VERBOSE query -> EXPLAIN ( analyze on, verbose on ) query

So, what new capabilities are there?

There are 2 (previously not available) options COSTS and FORMAT.

COSTS are simpler. Compare these 4 plans:

# explain select * from pg_class where relname = 'xxx';
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.00..8.27 rows=1 width=185)
   Index Cond: (relname = 'xxx'::name)
(2 rows)
# explain ( costs off ) select * from pg_class where relname = 'xxx';
                       QUERY PLAN
---------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class
   Index Cond: (relname = 'xxx'::name)
(2 rows)
# explain ( analyze on ) select * from pg_class where relname = 'xxx';
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.00..8.27 rows=1 width=185) (actual time=0.024..0.024 rows=0 loops=1)
   Index Cond: (relname = 'xxx'::name)
 Total runtime: 0.102 ms
(3 rows)
# explain ( analyze on, costs off ) select * from pg_class where relname = 'xxx';
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class (actual time=0.023..0.023 rows=0 loops=1)
   Index Cond: (relname = 'xxx'::name)
 Total runtime: 0.103 ms
(3 rows)

As you can see there are no estimated costs attached – main purpose of this option is to provide explain output that is comparable between machines and configs.

And now, to the main part of the patch – FORMAT option.

At the moment there are 2 possible formats: XML and JSON. How do they look?

# explain ( analyze on, format xml ) select * from pg_class where relname = 'xxx';
                        QUERY PLAN
-----------------------------------------------------------
 <explain xmlns="http://www.postgresql.org/2009/explain">
   <Query>
     <Plan>
       <Node-Type>Index Scan</Node-Type>
       <Scan-Direction>Forward</Scan-Direction>
       <Index-Name>pg_class_relname_nsp_index</Index-Name>
       <Relation-Name>pg_class</Relation-Name>
       <Alias>pg_class</Alias>
       <Startup-Cost>0.00</Startup-Cost>
       <Total-Cost>8.27</Total-Cost>
       <Plan-Rows>1</Plan-Rows>
       <Plan-Width>185</Plan-Width>
       <Actual-Startup-Time>0.015</Actual-Startup-Time>
       <Actual-Total-Time>0.015</Actual-Total-Time>
       <Actual-Rows>0</Actual-Rows>
       <Actual-Loops>1</Actual-Loops>
       <Index-Cond>(relname = 'xxx'::name)</Index-Cond>
     </Plan>
     <Triggers>
     </Triggers>
     <Total-Runtime>0.130</Total-Runtime>
   </Query>
 </explain>
(1 row)
# explain ( analyze on, format json ) select * from pg_class where relname = 'xxx';
                    QUERY PLAN
---------------------------------------------------
 [
   {
     "Plan": {
       "Node Type": "Index Scan",
       "Scan Direction": "Forward",
       "Index Name": "pg_class_relname_nsp_index",
       "Relation Name": "pg_class",
       "Alias": "pg_class",
       "Startup Cost": 0.00,
       "Total Cost": 8.27,
       "Plan Rows": 1,
       "Plan Width": 185,
       "Actual Startup Time": 0.023,
       "Actual Total Time": 0.023,
       "Actual Rows": 0,
       "Actual Loops": 1,
       "Index Cond": "(relname = 'xxx'::name)"
     },
     "Triggers": [
     ],
     "Total Runtime": 0.102
   }
 ]
(1 row)

Nice. Finally something that doesn't require (faulty) magic with regexps or writing custom parsers just to get some data out of explain analyze.

Of course there is a drawback – explain analyze of above query has 481 bytes in standard format, 667 in JSON and 932 in XML – which doesn't look like much until you'll see a 2MB long explain output.

But, all in all, it's a great addition.

  1. 2 comments

  2. Aug 29, 2009

    One problem we’ve seen in the past on high performance oltp systems is that the overhead of running explain analyze was significant enough that you couldn’t adequately troubleshoot very fast queries (ie. the query would run faster by itself than it would through explain analyze). Any ideas on what the additional overhead, if any, of these new formats looks like?

  3. Sep 15, 2009

    @Robert:
    no idea, but I don’t think it’s much – after all it’s just text formatting.

Sorry, comments for this post are disabled.