New SQL pretty printer – based on parsing, and not regexps

For a long time I was looking for SQL pretty printer.

Some queries that I had to deal with, over the years, were just insane to read, like this:

SELECT
DECODE(a.category,NULL,b.category,a.category) "category",b.par "Total object Request",b.ps "objects Served",
b.ar "Total sushi Request", a.sushis "sushis Served",
ROUND((DECODE(b.ar,0,0,(b.ar - CAST(DECODE(a.sushis,NULL,0,a.sushis) AS NUMERIC))/b.ar))*100,3) "USR",
a.clk "points", DECODE(b.ps,0,0,ROUND((a.clk/b.ps)*100,3)) "CTR",a.cpc "CPC", a.tc "Cost",
DECODE(b.ps,0,0,CAST((a.tc/b.ps)*1000 AS NUMERIC(8,3))) "effectcost"
FROM
(SELECT
DECODE(b.category,NULL,'N/A',b.category) category, SUM(doughnuts) sushis, SUM(points) clk,
ROUND(CAST(SUM(total_cost) AS NUMERIC),3) tc,
DECODE(SUM(points),0,0,ROUND(CAST((SUM(total_cost)/SUM(points)) AS NUMERIC),3)) cpc
FROM
daily_city_dealer_summary a, category_dealer_map b
WHERE
a.category_dealer_id=b.category_dealer_id  AND   created_day BETWEEN '2010-05-01' AND '2010-05-25'
GROUP BY b.category) a     FULL OUTER JOIN
 
(SELECT
DECODE(a.category,NULL,'N/A', DECODE(a.category,'-','World-Remaining countries',a.category)) category,
SUM(valid_object_request) par, SUM(valid_sushi_request) ar, SUM(object_doughnuts) ps
FROM
traffic_hit a
WHERE
request_date BETWEEN '2010-05-01' AND '2010-05-25'
GROUP BY a.category)  b
ON LOWER(a.category)=LOWER(b.category)
ORDER BY 4 DESC;

For years the best (in my opinion) way to format queries was pg_format program from pgFormatter project.

It is really impressive, but has one important drawback – reformatting is done using applying some rather complex regexps, and custom made tokenizers.

I dreamed of something that would actually parse the query, and based on parse tree rebuild the query using some consistent logic.

There is libpg_query library, and based on it, Ruby GEM.

This code is using Pg parser extracted from sources. And parses arbitrary queries to relatively easy to work with tree.

Based on this, I wrote parser microservice, and then pretty printer in perl.

Today I changed explain.depesz.com and paste.depesz.com to use this new code for pretty printing.

If some data will not parse, or my code doesn't support it (yet), both sites will fall back to using pgFormatter.

Additionally, I added a way to pretty print using paste.depesz.com without even saving data on paste.depesz.com site, using /prettify url.

You can use it like this:

=$ echo 'select a from b' | curl -s -XPOST --data-urlencode "q@-" https://paste.depesz.com/prettify
SELECT
    a
FROM
    b;
-- Formatted by Pg::SQL::PrettyPrinter

You can write yourself simple script, like /usr/local/bin/prettify.sh with content:

#!/usr/bin/env bash
 
curl -s -XPOST --data-urlencode "q@${1:--}" https://paste.depesz.com/prettify

And then you can use it for files too:

=$ prettify.sh z.sql 
SELECT
    123;
-- Formatted by Pg::SQL::PrettyPrinter

All prettified SQLs (via /prettify, or on explain.depesz.com, or on
paste.depesz.com, will contain, in last line, comment that says which prettifying library was used.

So, if your query isn't prettified by Pg::SQL::PrettyPrinter, and you think it should. Or it is, but in incorrect way, please reach to me by filling an issue.

For now only DML queries are handled (select, insert, update, delete), but I will be adding DDL in future.

Hope you'll find it useful.

Also, I'd like to thank Alexandre Felipe (bob) and Andreas ‘ads' Scherbaum for prodding me to get it done. Probably wouldn't have done it without you 🙂

Finally, as a quick example, the query from beginning of post, after formatting with Pg-SQL-PrettyPrinter looks like this:

SELECT
    decode(
        a.category,
        NULL,
        b.category,
        a.category
    ) AS category,
    b.par AS "Total object Request",
    b.ps AS "objects Served",
    b.ar AS "Total sushi Request",
    a.sushis AS "sushis Served",
    round(
        decode(
            b.ar,
            0,
            0,
            ( b.ar - decode( a.sushis, NULL, 0, a.sushis )::pg_catalog.numeric ) / b.ar
        ) * 100,
        3
    ) AS "USR",
    a.clk AS points,
    decode(
        b.ps,
        0,
        0,
        round(
            ( a.clk / b.ps ) * 100,
            3
        )
    ) AS "CTR",
    a.cpc AS "CPC",
    a.tc AS "Cost",
    decode(
        b.ps,
        0,
        0,
        ( a.tc / b.ps ) * 1000::pg_catalog.numeric( 8, 3 )
    ) AS effectcost
FROM
    (
        SELECT
            decode(
                b.category,
                NULL,
                'N/A',
                b.category
            ) AS category,
            SUM( doughnuts ) AS sushis,
            SUM( points ) AS clk,
            round(
                SUM( total_cost )::pg_catalog.numeric,
                3
            ) AS tc,
            decode(
                SUM( points ),
                0,
                0,
                round(
                    SUM( total_cost ) / SUM( points )::pg_catalog.numeric,
                    3
                )
            ) AS cpc
        FROM
            daily_city_dealer_summary AS a,
            category_dealer_map AS b
        WHERE
            a.category_dealer_id = b.category_dealer_id AND
            created_day BETWEEN '2010-05-01' AND '2010-05-25'
        GROUP BY
            b.category
    ) AS a
    FULL JOIN (
        SELECT
            decode(
                a.category,
                NULL,
                'N/A',
                decode(
                    a.category,
                    '-',
                    'World-Remaining countries',
                    a.category
                )
            ) AS category,
            SUM( valid_object_request ) AS par,
            SUM( valid_sushi_request ) AS ar,
            SUM( object_doughnuts ) AS ps
        FROM
            traffic_hit AS a
        WHERE
            request_date BETWEEN '2010-05-01' AND '2010-05-25'
        GROUP BY
            a.category
    ) AS b ON LOWER( a.category ) = LOWER( b.category )
ORDER BY
    4 DESC;
-- Formatted by Pg::SQL::PrettyPrinter

Perhaps not ideal, but it's a change.

4 thoughts on “New SQL pretty printer – based on parsing, and not regexps”

  1. I guess formatting is always a matter of personal preference

    The case statement I like much better with the new formatter, more on one line that belongs together. However, I personally prefer decode on one line (or what belongs logically together) and not to be scattered among this many lines.

    but now I have to search through the pg documentation, I thought decode in this context does not exist in Postgres, only Oracle.

  2. @stefan:

    Well, while it is personal preference, the fact that it rebuilds the query, as opposed to reformatting, means that it will be WAY easier to configure it.

    Also, I’m not sure what you mean about decode – pg doesn’t have decode function as shown in the example, it must have been added by someone that shared original query.

  3. Hello,
    about the formmating of first query, did you used pgFormatter ? What the needed chnages to have the reserved keys in green and the text in red ?

    Thanks in advance.

  4. @Hugh24:

    Sorry, I don’t quite understand.

    pgFormatter adds some whitespace to make the query more readable.

    Colors are added by some plugin to wordpress.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.