explain.depesz.com.

A long time ago I wrote a small program to filter EXPLAIN ANALYZE output and add a summary of the time.

A bit later (I guess; I don't recall the exact timeline, it could have been earlier) Michael Glaesemann started explain-analyze.info – a cool tool for checking what might be wrong with a given plan.

I'm not really happy with the emphasis Michael put on bad rowcount estimates, so I decided to write my own tool. Enter explain.depesz.com.

The basic idea is: paste your explain analyze plan, and see the output. You can click on the column headers to tell it which parameter is the most important for you – exclusive node time, inclusive node time, or rowcount mis-estimate.
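To make the difference between exclusive and inclusive time concrete, here is a minimal sketch in Python. It is not the site's actual code: the `PlanNode` class and its field names are made up for illustration, and it assumes that each node's reported time already includes its children and does not need to be multiplied by a loop count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical stand-in for one node of an EXPLAIN ANALYZE plan."""
    name: str
    inclusive_ms: float  # time for this node plus everything below it
    children: List["PlanNode"] = field(default_factory=list)

    def exclusive_ms(self) -> float:
        # Time spent in this node alone: subtract what its direct children report.
        return self.inclusive_ms - sum(c.inclusive_ms for c in self.children)


# Made-up two-node plan: a Sort on top of a Seq Scan.
scan = PlanNode("Seq Scan", 9.5)
sort = PlanNode("Sort", 12.0, [scan])

for node in (sort, scan):
    print(f"{node.name}: inclusive={node.inclusive_ms} ms, "
          f"exclusive={node.exclusive_ms()} ms")
```

Sorting by exclusive time points at the node that is slow by itself; sorting by inclusive time points at the slow subtree.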

It is definitely not perfect. I know of at least one bug right now, and will fix it in the not-too-distant future.

But, as for now – you can test it, play with it, or simply use it. If you'd like to change/fix something – sources are freely available. Just be warned – it's Perl ;-P

Hunting “idle in transactions”

If you have ever encountered “idle in transaction” connections, you most likely hate them. I know I personally hate them. They interfere with most of the “cool toys” like replication, vacuum, and DDL queries.

So, when I saw them on a database I was looking at, I decided to act.

Easier said than done. How to fix the problem?


failing ls?

i have this program, which forks off worker processes, and then runs various tasks in them.

one of the tasks is to execute some command via system(), but since we need to get stdout and stderr (separately), we used the IPC::Run module.

simple example of such code would be:

#!/usr/bin/perl
use strict;
use Time::HiRes qw( usleep );
use IPC::Run qw( run );
use POSIX ":sys_wait_h";

sub REAPER {
    my $child;
    while (($child = waitpid(-1,WNOHANG)) > 0) {    # collect every child that has already exited
    }
    $SIG{CHLD} = \&REAPER;    # re-install the handler
}
$SIG{CHLD} = \&REAPER;

for (1..100) {
    my $x = fork;
    die "cannot fork?!: $!\n" unless defined $x;
    if ($x) {    # parent: short pause, then fork the next worker
        usleep(10000);
        next;
    }
    my @cmd = qw(ls -lad .);    # child: run the sample command
    my ($in, $out, $err);
    my $status = run \@cmd, \$in, \$out, \$err or die "ls: $?";
    printf ("%u\n", $status);
    exit;
}

what it does:

  • defines a REAPER function, and sets it as the SIGCHLD handler – for details, please check perldoc perlipc. this is basically to avoid creating zombie processes when a long-running parent process forks relatively short-lived child processes.
  • forks off a new process
  • after forking, the master sleeps for 0.01 second (so as not to put too much pressure on the testing system)
  • the child process runs a sample command (ls -lad .) via IPC::Run, with empty stdin, and catching stdout and stderr.
  • the child then exits
  • the whole fork-and-ls thing is repeated 100 times to show that its effect is not random.

what's wrong? here is output from it on my machine:

=> perl test.pl
ls: -1 at test.pl line 24.
...
...
...

which basically means ls failed – which is far from true, as this ls succeeds. a simple check:

=> perl -e 'use IPC::Run qw(run);my @cmd = qw(ls -lad .);my ($in, $out, $err);my $status = run \@cmd, \$in, \$out, \$err or die "ls: $?";printf ("%u\n", $status);'
1

now, the riddle is: why does it fail? (yes, i now know the answer, but it took me some time).

ed2k checksumming

i needed a way to generate ed2k urls based on existing files on my hard drive.

ed2k link looks like this:

ed2k://|file|FILENAME|FILESIZE|CHECKSUM|/

filename and filesize are of course known, but what about the checksum? i tried to find some ready-made program to calculate it, but failed. it might be because i spent something like 3 minutes on it, but anyway – i didn't find one. so i tried to find a description of the algorithm.

luckily there is a nice description of the algorithms. algorithms, plural, as there apparently are two separate ones, not fully compatible with each other.

based on the information i was able to write a short perl script which does the job:

=> cat ed2ksum.pl
#!/usr/bin/perl -l
use Digest::MD4 qw(md4 md4_hex);open$f,pop or die$!;$c.=md4$b while sysread$f,$b,9728000;print uc md4_hex$c

yes, it is unreadable. but it works. the first version was longer (about 15 lines), but then i decided to try to make it shorter. and shorter. and then even shorter. most probably it is not the shortest possible way, but i'm satisfied with it.

how does it work? simply:

=> ./ed2ksum.pl Slony-I-concept.pdf
E8715CD212CD75E0EE4B6C526D5BF36A
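for anyone who would rather read the steps than decode the one-liner, here is a longer sketch of the same idea in python: read the file in 9728000-byte chunks, md4 each chunk, then md4 the concatenated chunk digests. the function names are made up for this example, and md4 is written out by hand (following rfc 1320) only so the sketch does not depend on any extra module; it is slow on big files. like the one-liner, it always hashes the list of chunk digests, even when there is just one chunk. as far as i know, some ed2k descriptions treat files smaller than one chunk differently (plain md4 of the content), so check that against your client.

```python
import io
import struct

CHUNK_SIZE = 9728000  # same chunk size as in the perl one-liner


def md4(data: bytes) -> bytes:
    """Pure-python MD4 (RFC 1320); returns the 16-byte digest."""
    mask = 0xFFFFFFFF

    def rotl(x, n):
        return ((x << n) | (x >> (32 - n))) & mask

    # padding: 0x80, zeros up to 56 mod 64, then the bit length (64-bit little-endian)
    msg = data + b"\x80" + b"\x00" * ((55 - len(data)) % 64)
    msg += struct.pack("<Q", (len(data) * 8) & 0xFFFFFFFFFFFFFFFF)

    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    for off in range(0, len(msg), 64):
        x = struct.unpack("<16I", msg[off:off + 64])
        a, b, c, d = state
        for i in range(16):  # round 1
            f = (b & c) | (~b & d)
            a, b, c, d = d, rotl((a + f + x[i]) & mask, (3, 7, 11, 19)[i % 4]), b, c
        for i in range(16):  # round 2
            k = (i % 4) * 4 + i // 4
            g = (b & c) | (b & d) | (c & d)
            a, b, c, d = d, rotl((a + g + x[k] + 0x5A827999) & mask, (3, 5, 9, 13)[i % 4]), b, c
        order = (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15)
        for i, k in enumerate(order):  # round 3
            h = b ^ c ^ d
            a, b, c, d = d, rotl((a + h + x[k] + 0x6ED9EBA1) & mask, (3, 9, 11, 15)[i % 4]), b, c
        state = [(s + v) & mask for s, v in zip(state, (a, b, c, d))]
    return struct.pack("<4I", *state)


def ed2k_hash(stream, chunk_size: int = CHUNK_SIZE) -> str:
    """md4 of the concatenated per-chunk md4 digests, as uppercase hex."""
    digests = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digests += md4(chunk)
    return md4(digests).hex().upper()


# tiny in-memory demo with an artificially small chunk size (three chunks)
print(ed2k_hash(io.BytesIO(b"abcde"), chunk_size=2))
```

for a real file you would call it as ed2k_hash(open(path, "rb")) and paste the result into the CHECKSUM part of the link.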

hope you'll find it useful.

finding optimum tables placement in 2-tablespace situation

just recently we got another array for our main production database. this means – we will be able to add a new tablespace, thus making everything go faster.

in theory – it's nice. but which tables should move to the new one?

the basic assumption is simple – an index should not be in the same tablespace as the table it belongs to. that's easy. but – should we really put all tables in one tablespace, and all indexes in another?

we decided that the important things that should be “boosted" are seeks and writes. sequential reads are (in our situation) more or less irrelevant.

read on to check how we split the load.


what fields are usually changed when update’ing?

we had a situation with a lot of tables and a lot of update activity. so, we thought about splitting the most updated tables into parts that are usually stable, and parts (columns) which change often.

but how to know what changes? unfortunately the orm that was used issued updates like this:

UPDATE some_table SET field1='...', field2='...', field3='...' WHERE id = 123;

basically it always updated all fields. (don't even start to comment that orms are by definition broken).

so, i had to find a nice way to find out what was really updated.


speeding up like ‘%xxx%’

as most of you know, postgresql can easily speed up searches using:

field like 'something%'

and (less easily):

field like '%something'
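why are these two cases the indexable ones? a rough sketch in python, with a sorted list standing in for a btree index (this is an illustration of the idea, not postgresql code; the function names and sample values are made up). all values sharing a prefix sit next to each other in sorted order, so one binary search finds the start of the range. for a suffix, the same trick works on reversed strings, which is roughly what an index on the reversed value would give you.

```python
import bisect


def prefix_search(sorted_values, prefix):
    """All values starting with prefix, found via binary search on a sorted list."""
    i = bisect.bisect_left(sorted_values, prefix)
    matches = []
    # matching values are contiguous, so stop at the first non-match
    while i < len(sorted_values) and sorted_values[i].startswith(prefix):
        matches.append(sorted_values[i])
        i += 1
    return matches


def suffix_search(sorted_reversed_values, suffix):
    """Same idea for a suffix: search reversed strings for the reversed suffix."""
    hits = prefix_search(sorted_reversed_values, suffix[::-1])
    return [h[::-1] for h in hits]


values = ["something", "someone", "nothing", "anything", "thing"]
forward_index = sorted(values)                     # stands in for an index on field
reversed_index = sorted(v[::-1] for v in values)   # stands in for an index on the reversed field

print(prefix_search(forward_index, "some"))    # like 'some%'
print(suffix_search(reversed_index, "thing"))  # like '%thing'
```

neither sorted list helps when the pattern is open on both ends, because the matches are no longer next to each other.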

but how about:

field like '%something%'

the general idea is to use some kind of full text search/indexing – tsearch, lucene, sphinx, you name it.

but sometimes you can't install fts/fti, or it doesn't really solve your problem. is there any help? let's find out.


how many transactions per second?

i wanted to know how many transactions per second my machine is processing.

how to do so? a simple select from pg_stat_database will do the job (actually 2 selects 🙂)

but since i have to write it anyway, perhaps i can/should make it so it will print the current value continuously?

and, while i'm at it, some kind of graph wouldn't be bad 🙂
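the "2 selects" idea boils down to simple arithmetic, sketched here in python. it assumes that both commits and rollbacks count as transactions and that the counters are summed over all databases; the query text, the function name and the sample numbers are my own illustration, not necessarily what the final script does.

```python
# assumption: xact_commit + xact_rollback, summed over all databases,
# is the "number of transactions so far"
TPS_QUERY = "SELECT sum(xact_commit + xact_rollback) FROM pg_stat_database"


def tps(first_count, second_count, seconds_between):
    """Transactions per second between two samples of the counter."""
    if seconds_between <= 0:
        raise ValueError("samples must be taken at different times")
    return (second_count - first_count) / seconds_between


# made-up samples: run TPS_QUERY once, wait 10 seconds, run it again
print(tps(120500, 121700, 10.0))
```

printing the value continuously is then just a loop: keep the previous sample, sleep, select again, and print the difference divided by the sleep time.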
