my thoughts on getting random row

this topic has been written about by many smart people – from the recent past, by greg sabino mullane and josh berkus.

they show 4 different approaches:

  1. order by random()
  2. >= random() limit 1
  3. random column
  4. random aggregate

all these approaches have their benefits and drawbacks, but i'd like to show another one (polish readers saw the approach already in january 2007, but this time i will make the code more robust).

Continue reading my thoughts on getting random row

objects in categories – counters with triggers

so there you go, you have some “categories" and some objects. for simplicity let's assume one object can be in only one category.

if this is too theoretical for you – let's assume these are “mails in folders", “photos in galleries", “posts in categories" or “auctions in categories on ebay".

everything clear? now, let's assume you want to know how many “objects" are in given “category".

most basic way to do it is:

select count(*) from objects where category = some_category;

but this method is far from optimal. now, we'll learn how to do it better.

one warning for those of you who read the rss feed – if you say “yeah, i know the code, it's simple" – ask yourself – is your code deadlock-proof?

Continue reading objects in categories – counters with triggers

better results paging in postgresql 8.2

some time ago merlin moncure wrote about one of new features of postgresql 8.2 – row-wise comparison.

i also read about it, but at first didn't find anything very useful about it – after all it doesn't give you any new functionality. any kind “(a, b, c) > (…)" can be written with standard column-based operators. so it's basically just a syntactic-sugar.

that's true. but merlin pointed me to not-so-obvious benefit – basically it allows much better paging of results. how? let's see.

Continue reading better results paging in postgresql 8.2

analyze.pgsql.logs

Bugfixes:

  1. set locale to C for sorting – otherwise sort results will not be really sorted (let's everybody “thank" locale subsystem)
  2. fix a bug that prevented *last* sql-type to be printed

Changes:

  1. add rule for normalizing so: SELECT * FROM table WHERE field = -12 will be normalized to SELECT * FROM table WHERE field = ? instead of “… field = – ?"
  2. remove extra spaces from normalized version – they did happen sometimes 🙁

SVN repo at: http://svn.depesz.com/svn/analyze.pgsql.logs/trunk

effective finding queries to optimize

let's imagine simple situation – you have postgresql server. configuration was fine-tuned, hardware is ok. yet the system is not really as fast as it should.

most common problem – slow queries.

second most common problem – fast queries, but too many of them. for example – i once saw a system which did something like this:

  • select id from table;
  • for every id do:
  • select * from table where id = ?

reason? very “interesting" orm.

now i'll show you how i deal with these kind of situations 🙂

Continue reading effective finding queries to optimize

Log Buffer #57: a Carnival of the Vanities for DBAs

A couple of days ago I got email from Dave Edwards of Pythian. Dave invited me to be guest editor of upcoming Log Buffer edition.

After some hesitation (just take a look at list of guys which were editors of previous editions, and you'll understand my shyness) I decided to give it a try. so, here it goes – 57th edition, written by depesz 🙂

First of all, we'll start with some posts about my favorite database – PostgreSQL.

Lewis Cunningham on his An Expert's Guide to Oracle Technology blog writes about new product for EnterpriseDB (commercial DB based on PostgreSQL) – GridSQL, which looks like Oracle's RAC. Check also his writeup on second blog: EnterpriseDB News.

What's interesting is that EnterpriseDB and their PR agency got lately quite controversial, which was mentioned by Joshua Drake on his blog (he also links to mailing list archives with discussions about the situation), and by Peter Eisentraut on his personal blog. this entry is very important as it goes into details about EnterpriseDB's “PostgreSQL distribution for Linux".

zillablog – blog by Robert Treat, has very interesting entry about conforming to standards, and using them as a excuse not to add new functionality.

And as a final note from our (PostgreSQL) area – Dave Page wrote new entry on his blog about setting up SSL access on Windows – apparently there were some issues with pre-built binaries of OpenSSL.

Now, let's enter the territory of “the other" opensource database – MySQL:

Biggest news of the week was change in policies and availability of MySQL products – community server and enterprise server. it all started with Kaj Arnö's post on his “MySQL AB VP Community Whereabouts in a Nutshell".

It generated a lot of followups, including Mike Kruckenberg on his blog, Greg's Postgres Stuff by Greg Sabino Mullane, Lukas Kahwe Smith's Poo-Tee-Weet and Peter Zaitsev on Mysql Performance Blog.

Of course Kaj Arnö had to respond, so there went another post on his blog.

Journal of Patrick Galbraith informs us about his new addition to mysqldump – options specifically designed to simplify dumping replicated databases.

Charlie Cahoon's MySQL – Summer of Code tells us how to use global variables in MySQL Proxy. To be honest I haven't heard about it (MySQL Proxy) before, but now, after reading his post I read more about it, and the software looks great. Something like pgpool (for those of you who know PostgreSQL), but with more functionality.

On the the possible functionalities of Mysql Proxy is using this (together with Charlie's code) to transform it into automatic regression tests generator. Details are shown on Giuseppe Maxia's data charmer – it looks really cool.

Johan Andersson and Jimmy Guerrero both tell us about web seminar (and white papers) about MySQL Cluster. Unfortunately if you're in America – it's too late, your webinar was on 8th, but you can still catch webinar directed at EMEA users on 15th.

While we're at learning – Jay Pipes presents materials from his workshop about tuning mysql queries – a must read for every MySQL DBA.

On his diary, Erik Wetterberg, wrote about generating XML content directly from MySQL. Check also comment to this post, as it contains link to valuable resource .

Sergey Zhuravlev on Xaprb wrote about new extension for MySQL replication – slave delay. Interesting idea with some obvious, and some not-so-obvious uses.

At the end of MySQL section, I couldn't help but write about two posts highlighting 2 interesting bugs in MySQL code.

First is Vadim Tkachenko on MySQL performance blog which writes about issues with query cache while dealing with column-based privileges.

The other post highlights what looks to me as quite serious bug. It's Parvesh Garg's post on Optim MySQL, and it's about InnoDB engine repeating (in some situations) auto_increment values.

Oracle is not the database I use, but 2 posts got my attention:

Mark Brunelli in Eye on Oracle: a searchoracle.com blog let us know that white papers for newest Oracle (11g) are available for download.

Second post made me sad that we don't have such a thing in PostgreSQL. It's on Coskan's Approach to Oracle by Coskan Gundogar and tells about permanent (guaranteed) restore points. Great feature.

At the end one more post – not technically database related, but it deals with the human side of databases and (DB) conferences: Mark Atwood's Journal entry about what to do when you're going to speak at a conference.

That would be all in this edition of Log Buffer. I hope you liked my choice of posts, and if not, don't worry, in a week there will be another Log Buffer, this time hosted by Jay Pipes