January 4th, 2009 by depesz | Tags: , , , , , | 4 comments »
Did it help? If yes - maybe you can help me?

This post is basically just an reply to Josh Berkus blog post. Additionally, it refers to “SQL Coding Standards To Each His Own" by Leo Hsu and Regina Obe.

Well, I've read what Josh wrote, and I though to myself – oh my, I must be doing something wrong, as I would never use long aliases. Then I read what Leo and Regina wrote, and I regained some confidence in my choice.

But, to say so, I could simply add a comment on Josh post.

So, I'd like to add something from myself. The topic of writing maintainable queries is very close to me, as I worked with a lot of strangely written queries, did the same as Josh – spending 30 minutes to rewrite the query just to understand what it really does. I have seen queries written by people, ORMs (Josh should like them I guess, but I hate the queries written by Hibernate and Django).

During the time, I decided that my priorities for writing maintainable queries:

  1. Avoid useless typing.
  2. Use aliases for tables/views. Always. And make them sensible aliases.
  3. Indent code in some way.
  4. Avoid quotations (yes, this is why I hate Django)
  5. Use join syntax

List is not long, but that's because SQL queries are usually relatively simple as they do one thing at a time.

As for “Avoid useless typing" – long, repeated names make it (for me) more difficult to read, repeated expressions make it easier to make mistake when maintaining (changing “+1" to “+2" in 2 out of 3 places in query).

This goes for repetition of table names like:

SELECT
    search_query.id,
    search_query.criteria,
    search_query.url
FROM
    search_query
WHERE
    (search_query.url = '...')

As for aliases – imagine this situation – you rewrote above query to avoid repetition to:

SELECT
    id,
    criteria,
    url
FROM
    search_query
WHERE
    url = '...'

But then you have to join some other table (which luckily doesn't contain any columns named the same way), add a where, and add new column to be returned:

SELECT
    id,
    criteria,
    url,
    some_column
FROM
    search_query
    join some_table on id = sqid
WHERE
    url = '...'
    and some_option = '...'
ORDER BY
    ordering

Now, tell me – which table does “some_column" come from? And “criteria"? Of course – “criteria" is simple – we know it came from “search_query", but we know it only because we just modified the query.
If some poor guy will come in my place in a month, and he will see the query he will have to manually check which table given column belongs to.

This is especially important if you'd ever would like to ask for help on irc in case of “slow query" – as the situation is much different depending on which of columns (id, sqid, url, some_option and ordering) are in the same table.

As for sensible aliases – make them short, but meaninful. In this case – I would probably go with “q" (like query) and “st" for “some_table", but your preference might be different.

So, the query would look:

SELECT
    q.id,
    q.criteria,
    q.url,
    st.some_column
FROM
    search_query as q
    join some_table as st on q.id = st.sqid
WHERE
    q.url = '...'
    and st.some_option = '...'
ORDER BY
    st.ordering

Remember that by “Always use aliases" I mean also in column names – i.e. even if “url" is unique in your database – add “q.". It will not cost you much, but will give instant benefit to whoever will end up reading the query later.

“Indenting the code" is very vague, but generally – any way will do as long as you will not generate queries like “4 kb in single line" – which is, sadly, what ORMs tend to do. My preferred way of indenting is shown above – it has some drawbacks, but it works nicely for me (side note: in my long-term todo I have plans for writing SQL beautifier, but maybe you know one that works?)

Last point from my shortlist – avoid quotation. This is a safeguard against cases when you'll accidentally enter something like:

create table "Unintentionalcaps" ...

As a side benefit – whenever I see quoted names in queries, I tend to assume that they are quoted for a purpose (i.e. they contain some strange characters), and I have to type them the same way. Which adds keystrokes, and thus is not welcome. And if I'll not add them – the code will look ugly with mix of quoted and unquoted names of the same objects.

The “join syntax" point is again inspired by Django. It has nasty habit of using “WHERE" part of the query to enter table join criteria. Like this:

SELECT
    id,
    criteria,
    url,
    some_column
FROM
    search_query,
    some_table
WHERE
    url = '...'
    and some_option = '...'
    and id = sqid
ORDER BY
    ordering

Which has at the very least 2 issues:

  • it makes it unclear which conditions are used for join and which are used for data filtering
  • if you'll ever would like to change type of join (to outer join for example) you have to modify both FROM and WHERE parts of the query. Which makes it possible that you'll forget WHERE, and end up with outer join working like inner join.

And that would be all.

  1. 4 comments

  2. Jan 5, 2009

    Before Josh’s post I would have agreed 100% with you. Now, I realized, that using “meaningful” aliases can make a query much more readable.

    But “meaningful” doesn’t mean ultra long like the queries produced by Business Objects (using tablenames instead of aliases). I prefer three to a maximum of eight letter aliases.

    From my viewpoint, the alias should provide information about, how the table is used in the query and if it still fits, what it contains.

    A query using the new window functions (hint) could look like:

    SELECT (SELECT COUNT (*)
    FROM rico.subaccounts sub
    WHERE sub.account_id = main.account_id) sub_count,
    (SELECT DISTINCT FIRST_VALUE (sub_last.account_type) OVER (ORDER BY sub_last.creation_time DESC)
    FROM rico.subaccounts sub_last
    WHERE sub_last.account_id = main.account_id)) last_account_type
    FROM rico.accounts main)

    Jan

  3. # gj
    Jan 5, 2009

    not to mention that using “where” instead of join can make some queries actually slower, especially if subselects are involved.

  4. Jan 6, 2009

    I do agree with you, as the most rules I was using for a long time.
    I think that subqueries shall have good (not long) names, but for my tables, single letters or even a abbreviation will do the job.

    I would include in your rules “6. Capitalize reserved words, and only them”. For reserved words, I would say SELECT, FROM, WHERE, AS, AND, OR… this helps a lot as it breaks the view in a logical way (UPPER/lower cases).

    For SQL beautifier, have you tried Aqua Data Studio? They did have a free student version, I don’t know now, but this was a lot useful for me, as prepared statements in Java, although in code you use a StringBuilder to use some lines, the code gets to the server as only one (even debugging, getting the value from variable).

    Thanks for the advices.

  5. # alex
    Feb 2, 2009

    @Fernando: I do agree too. Capitalization of reserved words and function names, is an excellent way to highlight my own identifiers, and it helps to readability of queries.

Leave a comment