May 28th, 2010 by depesz | Tags: , , , | 17 comments »
Did it help? If yes - maybe you can help me?

Today, there was this one person on IRC, which asked question and provided some data. While working on helping him (her?), I noticed some things, that bugged me before in other cases, but this time i decided to write about it – it's kind of rant, and if you (the reader) are the person that I'm basing my example on – please do not feel “punished" – it just so happens, that you exhibited some things that make helping others more difficult than it could be – so: you're not special, although I would really prefer if you were 🙂

First of all – if you're looking for help on IRC – check if somebody talks back to you. Get some IRC client that will show a popup when somebody mentions your name/nick so you will not overlook it.

I not always have a lot of time to help others, and in 90% of cases – I need to ask some clarification questions to be able to suggest solution – waiting for answer makes me loose my time, and greatly decreases your chances of getting help – I simply might run out of time, and when you'll answer my question – I will no longer be available to provide help.

Second. This is the query we were provided with (some anonymization done, again – I'm not singling out this person – it's unfortunately a very common situation):

select
decode(a.category,null,b.category,a.category) "category",b.par "Total object Request",b.ps "objects Served",
b.ar "Total sushi Request", a.sushis "sushis Served",
round((decode(b.ar,0,0,(b.ar - cast(decode(a.sushis,null,0,a.sushis) as numeric))/b.ar))*100,3) "USR",
a.clk "points", decode(b.ps,0,0,round((a.clk/b.ps)*100,3)) "CTR",a.cpc "CPC", a.tc "Cost",
decode(b.ps,0,0,cast((a.tc/b.ps)*1000 as numeric(8,3))) "effectcost"
from
(select
decode(b.category,null,'N/A',b.category) category, sum(doughnuts) sushis, sum(points) clk,
round(cast(sum(total_cost) as numeric),3) tc,
decode(sum(points),0,0,round(cast((sum(total_cost)/sum(points)) as numeric),3)) cpc
from
daily_city_dealer_summary a, category_dealer_map b
where
a.category_dealer_id=b.category_dealer_id  and   created_day between '2010-05-01' and '2010-05-25'
group by b.category) a     full outer join
 
(select
decode(a.category,null,'N/A', decode(a.category,'-','World-Remaining countries',a.category)) category,
sum(valid_object_request) par, sum(valid_sushi_request) ar, sum(object_doughnuts) ps
from
traffic_hit a
where
request_date between '2010-05-01' and '2010-05-25'
group by a.category)  b
on lower(a.category)=lower(b.category)
order by 4 desc;

How do you like it? Well. I asked for reformatted/readable version. This is what I got:

select decode(a.category,null,b.category,a.category) "category",b.par "Total object Request",b.ps "objects Served",
b.ar "Total sushi Request", a.sushis "sushis Served", round((decode(b.ar,0,0,(b.ar - cast(decode(a.sushis,null,0,a.sushis) as numeric))/b.ar))*100,3) "USR",
a.clk "points", decode(b.ps,0,0,round((a.clk/b.ps)*100,3)) "CTR",a.cpc "CPC", a.tc "Cost", decode(b.ps,0,0,cast((a.tc/b.ps)*1000 as numeric(8,3))) "effectcost"
from
    (select decode(b.category,null,'N/A',b.category) category, sum(doughnuts) sushis, sum(points) clk, round(cast(sum(total_cost) as numeric),3) tc,
        decode(sum(points),0,0,round(cast((sum(total_cost)/sum(points)) as numeric),3)) cpc 
    from
    daily_city_dealer_summary a, category_dealer_map b
    where
    a.category_dealer_id=b.category_dealer_id  and   created_day between '2010-05-01' and '2010-05-25'
    group by b.category) a 
full outer join
    (select decode(a.category,null,'N/A', decode(a.category,'-','World-Remaining countries',a.category)) category,
        sum(valid_object_request) par, sum(valid_sushi_request) ar, sum(object_doughnuts) ps
    from
    traffic_hit a
    where
    request_date between '2010-05-01' and '2010-05-25' group by a.category)  b
on lower(a.category)=lower(b.category)
order by 4 desc;

Well, it's better. Not much, but better.

Third. Let's analyze the SQL above, and note some things:

  • developer was using table aliases. good.
  • developer was using only “a" and “b" aliases, which in no way relate to tables, and are reused within the same query. very bad
  • developer used columns without prefixing them with alias: sum(points) – which table it's from: daily_city_dealer_summary or category_dealer_map?

2nd error is fortunately rare – aliases are usually abbreviations of table names, or parts of it – for example “map" for “category_dealer_map" (or “cdm").

But 3rd thing – i.e. not using aliases for all column is so common, it's practically standard. And personally – it really bugs me. This makes the query unreadable without knowing \d of the tables in question, and while I understand that developer who wrote it, knows them by heart, and it's obvious to him that field “x" makes sense only in table “y", and not in table “d", but that's him. And I'm stupid, and I don't know it (especially if names of the tables/columns are written in some esoteric (for me) language).

I know that there are some people that are reading my posts (really, and thank you). Can you please try to make sure that your queries are readable without intimate knowledge of your schema? Of course there are some things that will require knowing the schema anyway, but please – at least try to make reading possible.

When asking for help in case of slow query – there are some cases when we can just take a look at query and know why it's slow. There are some cases when we can look at plan, and tell you what's wrong. But usually, to be able to sensibly help we need both pieces of this information – i.e. explain analyze output and the query and used version of PostgreSQL. Usually it is also helpful to provide \d of all tables that are being used by the query. All. Not 2 out of 3, because the one left out is nearly irrelevant (this happened today as well).

And finally – I understand that some people prefer to use pgadmin, phppgadmin or any other gui for PostgreSQL. I'm fine with it. But please – at least once run psql, and read \h output. In a lot of cases, we need table definitions, and it's easier to write: “show me \d your_table" than: “show me whole definition of your table, including indexes, foreign keys, checks, datatypes, defaults and constraints“. Besides – you can get this data trivially from psql, and not so trivially from GUI programs – which are happy to show you the data, but it's not copy-pasteable.


Based on comment from Dan, here is how I envision “ideal, “help-friendly” version of this sushi-related query". Please remember that indentation is very persnal stuff, so you might prefer other way – this is what matches my tastes:

SELECT
    DECODE(a.category,NULL,b.category,a.category)                                                   "category",
    b.par                                                                                           "Total object Request",
    b.ps                                                                                            "objects Served",
    b.ar                                                                                            "Total sushi Request",
    a.sushis                                                                                        "sushis Served",
    ROUND((DECODE(b.ar,0,0,(b.ar - CAST(DECODE(a.sushis,NULL,0,a.sushis) AS numeric))/b.ar))*100,3) "USR",
    a.clk                                                                                           "points",
    DECODE(b.ps,0,0,ROUND((a.clk/b.ps)*100,3))                                                      "CTR",
    a.cpc                                                                                           "CPC",
    a.tc                                                                                            "Cost",
    DECODE(b.ps,0,0,CAST((a.tc/b.ps)*1000 AS numeric(8,3)))                                         "effectcost"
FROM
    (
        SELECT
            DECODE(b.category,NULL,'N/A',b.category) category,
            SUM(doughnuts) sushis,
            SUM(points) clk,
            ROUND(CAST(SUM(total_cost) AS numeric),3) tc,
            DECODE(SUM(points),0,0,ROUND(CAST((SUM(total_cost)/SUM(points)) AS numeric),3)) cpc
        FROM
            daily_city_dealer_summary a,
            category_dealer_map b
        WHERE
            a.category_dealer_id=b.category_dealer_id
            AND created_day BETWEEN '2010-05-01' AND '2010-05-25'
        GROUP BY
            b.category
    ) a
    full outer join
    (
        SELECT
            DECODE(a.category,NULL,'N/A', DECODE(a.category,'-','World-Remaining countries',a.category)) category,
            SUM(valid_object_request) par,
            SUM(valid_sushi_request) ar,
            SUM(object_doughnuts) ps
        FROM
            traffic_hit a
        WHERE
            request_date BETWEEN '2010-05-01' AND '2010-05-25'
        GROUP BY
            a.category
    )  b
    ON LOWER(a.category)=LOWER(b.category)
ORDER BY 4 DESC;

The only change to original query is modified whitespaces.

Next step would be:

  • addition of missing table aliases for columns (for example: points or doughnuts in first subselect)
  • changing table aliases to something related to table names, and unique within query
  • removal of join condition from first subselect from “WHERE" part, and converting it to normal JOIN

But these are next steps. And just doing the first step is often enough.

  1. 17 comments

  2. # Tuxie
    May 28, 2010

    You really should promote your wonderful explain parser in this post!

    https://explain.depesz.com/

  3. May 28, 2010

    @Tuxie:
    I think I promoted it already enough 🙂 And it’s linked in sidebar.

  4. # Anonymous
    May 28, 2010

    Mmmm.. I think I’ll have sushi.

  5. # Andrew Dunstan
    May 28, 2010

    Maybe you should start a wiki page on how to get help successfully on IRC.

  6. May 28, 2010

    @Andrew:
    I’m not sure if I’m right person to do so. I just wrote it to have somewhere to point people to next time I will see query in 10 lines, with no indentation, and each line 150 character long. Or query which makes me guess where the data comes from.

    Side note: perhaps I should write SQL prettifier?

  7. # sebpa
    May 28, 2010

    this one is pretty cool 😉

    http://sqlformat.appspot.com/

  8. May 28, 2010

    It seems that your points are also valid for posting on the mailing lists. The only difference is that answers can come hours or even days after the initial inquiry.

  9. why is using table aliases good? I personally prefer not to use them (unless the table name really is ridiculously long)

  10. oh and when responding to someone if it’s been more than 1 min please use their nick. I’ve come back to a channel hours later to find someone responded 30min later (not a problem in #pg) and I didn’t see it for a while because they didn’t bother to give me a highlight.

  11. May 28, 2010

    @Caleb:
    Aliases protect you against accidental turning simple typo into correlated subquery.
    Also – they make it *possible* to understand which column comes from where in case where you’re using joins. Of course you could prefix all columns with table name, but it just looks weird to me. But – it’s better than no prefixing at all.

  12. May 28, 2010

    @Caleb:
    Heh – lack of nick made me miss somebody’s reply/question to me a lot of times. It’s good to know I’m no the only one.

  13. May 28, 2010

    Please post what the ideal, “help-friendly” version of this sushi-related query would be.

    If I can stand on my soapbox for a bit about an issue which shows up in much of the programming/database help online, in the Postgresql-community, among many, many others… Without the “right way” shown, a post like this really is just “preaching to the choir” which already knows the “right” way, if they’re not the ones offering the help in the first place. Most everyone reading this blog probably knows exactly how to fix the query (based on your 2 points: avoid “a” “b” table aliases, column aliases, etc) to make providing assistance easier, but new users (in other words, the people still using “a” and “b” table aliases) really need a hands-on example of the solution. So much of the help content (or help pointers — in the sense of providing a link to the relevant section in the official documentation) can prove to be ultimately only superficial help for new users.

    Unfortunately, creating this “hands-on example” takes much more time than just going through some rules-of-thumb (or links to the official documentation); there’s something to be said for the “give a man a fish” vs. “teach a man to fish” philosophy (keeping the sushi theme), but giving someone a fishing rod, a can of worms, and some sunblock isn’t really teaching him how to fish, either. (in fact, stretching this metaphor to ridiculous proportions, maybe he was using the same lowly creek-fishing rod to catch sharks because he was left to his own devices — like continuing to use “a” and “b” aliases because that’s a common example for showing the use of aliases).

    Before I step off the soapbox, I’ll also mention that programming-related documentation which relies on the “foo” and “bar” based example system is no good, and I’m sure it’s used only because it’s easy to write and does not require any critical thinking or, at least, real-world understanding of the underlying functionality or its understanding by future users. So, all that said, I do greatly appreciate that Depesz’s consistently interesting “Waiting for [new version]…” entries always have thoughtful, real examples. His entries are not what amounts to a mindless recap of the syntax, but with “foo,” “bar,” and “widget” in place of the variables or parameters.

    This is absolutely not Postgres-community bashing. I’ve followed the community (mainly through entries aggregated by the “Planet PostgreSQL” RSS feed) for the past 3 years, and I’m not sure I’ve ever seen the ugly, unfortunately typical to free-software-communities attitude of “it’s open source, so if you don’t like [the omission/functionality/usefulness of a particular feature], you’re welcome to submit a patch.” Discussions in comments are always constructive (I’m thinking of the varchar vs. text debates last summer). I don’t follow the mailing list, so maybe that’s where it gets ugly :)…

    I’ve received help on IRC (and I think I wasn’t used as a bad example afterwards!). I just mean to say I hope that the community keeps in mind the users who are a step above the overly (though maybe necessarily) simple examples in the official documentation, but aren’t quite well versed enough to know not to use “a” or “b” aliases when the usage isn’t a “foo/bar” trivial case. (I’m not trying to rationalize laziness by those requesting free help, so I’m assuming the “sushi” person’s difficult-to-follow question can be chalked up to naiveté, not ignorance.)

  14. May 28, 2010

    @Dan:
    updated – is that better now?

  15. @Depesz I generally just use full table names unless it’s a 1 off. auto-completion make this easy enough in 1 offs too.

  16. May 29, 2010

    @Caleb:
    This is perfectly acceptable. The only reason why I don’t use it is because aliases are shorter and easier to read.

    For example: when using table names like ‘category_custom_fields_definitions’ and ‘category_custom_fields_values’ – you have to “read” first 3 words which don’t give you any information. That’s why I alias them to cfd/cfv or defs/vals.

  17. May 29, 2010

    @SebPa:
    Tried this site on sql from this post. Doesn’t look much better (at least for me).

  18. # Dan
    May 30, 2010

    @depesz Perfect. Thanks a lot.

Leave a comment