Posts Tagged ‘perl’

failing ls ?

2008-02-07 22:58:00 CET | 1 Comment

i have this program, which forks off worker processes and then runs various tasks in them.

one of the tasks is to execute some command via system(), but since we need to get stdout and stderr (separately), we used the IPC::Run module.

a simple example of such code would be:

#!/usr/bin/perl
use strict;
use Time::HiRes qw( usleep );
use IPC::Run qw( run );
use POSIX ":sys_wait_h";
sub REAPER {
    my $child;
    while (($child = waitpid(-1,WNOHANG)) > 0) {
    }
    $SIG{CHLD} = \&REAPER;
}
$SIG{CHLD} = \&REAPER;
for (1..100) {
    my $x = fork;
    die "cannot fork?!: $!\n" unless defined $x;
    if ($x) {
        usleep(10000);
        next;
    }
    my @cmd = qw(ls -lad .);
    my ($in, $out, $err);
    my $status = run \@cmd, \$in, \$out, \$err or die "ls: $?";
    printf ("%u\n", $status);
    exit;
}

what it does:

  • defines a REAPER function, and sets it as the sigchld handler - for details, please check perldoc perlipc. this is basically to avoid creating zombie processes when a long-running parent process forks relatively short-lived child processes.
  • forks off a new process
  • after forking, the master sleeps for 0.01 second (so as not to put too much pressure on the test system)
  • the child process runs a sample command (ls -lad .) via IPC::Run, with empty stdin, and catching stdout and stderr.
  • the child then exits
  • the whole fork/ls thing is repeated 100 times to show that its effect is not random.

what’s wrong? here is output from it on my machine:

=> perl test.pl
ls: -1 at test.pl line 24.
...
...
...

which basically means that ls failed - which is far from true, as this ls succeeds. a simple check:

=> perl -e 'use IPC::Run qw(run);my @cmd = qw(ls -lad .);my ($in, $out, $err);my $status = run \@cmd, \$in, \$out, \$err or die "ls: $?";printf ("%u\n", $status);'
1

now, the riddle is: why does it fail? (yes, i know the answer now, but it took me some time).

ed2k checksumming

2008-01-13 20:29:58 CET | 5 Comments

i needed a way to generate ed2k urls based on existing files on my hard drive.

an ed2k link looks like this:

ed2k://|file|FILENAME|FILESIZE|CHECKSUM|/

filename and filesize are of course known, but what about the checksum? i tried to find some ready-made program to calculate it, but failed. it might be because i spent something like 3 minutes on it, but anyway - i didn’t find one. so i tried to find a description of the algorithm.

luckily there is a nice description of the algorithms. algorithms, as there apparently are two separate ones, not fully compatible with each other.

based on this information i was able to write a short perl script which does the job:

=> cat ed2ksum.pl
#!/usr/bin/perl -l
use Digest::MD4 qw(md4 md4_hex);open$f,pop or die$!;$c.=md4$b while sysread$f,$b,9728000;print uc md4_hex$c

yes, it is unreadable. but it works. the first version was longer (about 15 lines), but then i decided to try to make it shorter. and shorter. and then even shorter. most probably it is not the shortest possible way, but i’m satisfied with it.

how does it work? simply:

=> ./ed2ksum.pl Slony-I-concept.pdf
E8715CD212CD75E0EE4B6C526D5BF36A

hope you’ll find it useful.
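for those who prefer to read the code, here is a longer sketch of what the one-liner does: digest every 9728000-byte chunk, concatenate the raw digests, and digest the result. the names chunked_checksum and ed2k_link are made up for this example, and passing the digest function in as a code ref is my own choice (the one-liner simply calls md4). it mirrors the one-liner as written, so it follows only one of the two algorithm variants, and it puts the file name into the link without any escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw( basename );

# chunk size copied from the one-liner above
my $CHUNK_SIZE = 9_728_000;

# digest every chunk, concatenate the raw digests, digest the result.
# $digest is any code ref returning a raw (binary) digest; passing it in
# is a choice made for this sketch, the one-liner simply calls md4.
sub chunked_checksum {
    my ( $path, $digest, $chunk_size ) = @_;
    $chunk_size ||= $CHUNK_SIZE;
    open my $fh, '<:raw', $path or die "cannot open $path: $!\n";
    my ( $buffer, $digests ) = ( '', '' );
    while ( sysread $fh, $buffer, $chunk_size ) {
        $digests .= $digest->($buffer);
    }
    close $fh;
    return uc unpack 'H*', $digest->($digests);
}

# fill in the link template; the file name is used as-is (no escaping)
sub ed2k_link {
    my ( $name, $size, $checksum ) = @_;
    return sprintf 'ed2k://|file|%s|%d|%s|/', $name, $size, $checksum;
}

# runs only when a file name is given on the command line
if (@ARGV) {
    require Digest::MD4;
    my $path = shift @ARGV;
    my $sum  = chunked_checksum( $path, \&Digest::MD4::md4 );
    print ed2k_link( basename($path), -s $path, $sum ), "\n";
}
```

run it as ./ed2klink.pl FILE (the script name is arbitrary) to get the whole link instead of just the checksum.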

finding optimum tables placement in 2-tablespace situation

2007-09-30 20:53:20 CEST | 3 Comments

just recently we got another array for our main production database. this means we will be able to add a new tablespace, thus making everything go faster.

in theory - it’s nice. but which tables to move to the other one?

the basic assumption is simple - an index should not be on the same tablespace as its table. that’s easy. but - should we really put all tables on one tablespace, and all indexes on another?

we decided that the important things that should be “boosted” are seeks and writes. sequential reads are (in our situation) more or less irrelevant.

read on to check how we split the load.

- MORE -

what fields are usually changed when update’ing?

2007-09-21 23:40:34 CEST | 6 Comments

there was this situation, that we had a lot of tables and a lot of update activity. so, we thought about splitting the most updated tables to parts that are usually stable, and parts (columns) which change often.

but how to know what changes? unfortunately the orm that was used issued updates like this:

update table set field1='..', field2='...', field3='...' where id = 123;

basically it always updated all fields. (don’t even start to comment that orms are by definition broken).

so, i had to find a nice way to find out what was really updated.

- MORE -

speeding up like ‘%xxx%’

2007-09-15 21:42:21 CEST | 11 Comments

as most of you know, postgresql can easily speed up searches using:

field like 'something%'

and (less easily):

field like '%something'

but how about:

field like '%something%'

general idea is to use some kind of full text search/indexing - tsearch, lucene, sphinx, you name it.

but sometimes you can’t install fts/fti, or it doesn’t really solve your problem. is there any help? let’s find out.

- MORE -

how many transactions per second?

2007-09-04 13:53:19 CEST | No Comments

i wanted to know how many transactions per second my machine is processing.

how to do so? a simple select from pg_stat_database will do the job (actually 2 selects :)

but since i have to write it anyway, perhaps i can/should make it so it will print the current value continuously?

and, while i’m at it, some kind of graph wouldn’t be bad :)
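the "2 selects" idea can be sketched like this: read the transaction counter, wait, read it again, and divide the difference by the elapsed time. this is only a minimal sketch, not the script from the full post: the tps function name, the 5 second interval and the connection details are assumptions, and the database part needs DBI with DBD::Pg installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# commits plus rollbacks = finished transactions in the current database
my $SQL = 'select xact_commit + xact_rollback from pg_stat_database '
        . 'where datname = current_database()';

# transactions per second between two readings of the counter
sub tps {
    my ( $first, $second, $seconds ) = @_;
    die "interval must be positive\n" unless $seconds > 0;
    return ( $second - $first ) / $seconds;
}

# runs only when a database name is given; needs DBI and DBD::Pg
if (@ARGV) {
    require DBI;
    my $interval = 5;    # seconds between the two selects, chosen arbitrarily
    my $dbh = DBI->connect( "dbi:Pg:dbname=$ARGV[0]", '', '', { RaiseError => 1 } );
    my ($first) = $dbh->selectrow_array($SQL);
    sleep $interval;
    my ($second) = $dbh->selectrow_array($SQL);
    printf "%.2f tps\n", tps( $first, $second, $interval );
    $dbh->disconnect;
}
```

note that the selects themselves are transactions too, so the result is slightly inflated on an otherwise idle database.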

- MORE -

postgresql tips & tricks

2007-08-31 11:19:18 CEST | No Comments

faber4 on irc asked about how to get ascii-based sorting, while his postgresql was initdb’ed with utf-8 based locale (en_US.UTF-8 to be exact).

what can we do about it?

- MORE -

effective finding queries to optimize

2007-08-24 15:57:36 CEST | 5 Comments

let’s imagine a simple situation - you have a postgresql server. the configuration was fine-tuned, the hardware is ok. yet the system is not really as fast as it should be.

most common problem - slow queries.

second most common problem - fast queries, but too many of them. for example - i once saw a system which did something like this:

  • select id from table;
  • for every id do:
      • select * from table where id = ?

reason? very “interesting” orm.

now i’ll show you how i deal with this kind of situation :)

- MORE -

indexable “field like ‘%something’”

2007-07-30 15:49:44 CEST | 6 Comments

for a long time everybody knew that you can’t use an index for “LIKE” operations.

then came text_pattern_ops, so we could use indexes for prefix searches:

# \d depesz_test
                         Table "public.depesz_test"
 Column |  Type   |                        Modifiers
--------+---------+----------------------------------------------------------
 id     | integer | not null default nextval('depesz_test_id_seq'::regclass)
 email  | text    | not null
Indexes:
    "depesz_test_pkey" PRIMARY KEY, btree (id)
    "x" UNIQUE, btree (email text_pattern_ops)

# explain analyze select count(*) from depesz_test where email like 'dep%';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=96.71..96.72 rows=1 width=0) (actual time=0.983..0.985 rows=1 loops=1)
   ->  Bitmap Heap Scan on depesz_test  (cost=4.68..96.65 rows=24 width=0) (actual time=0.184..0.641 rows=155 loops=1)
         Filter: (email ~~ 'dep%'::text)
         ->  Bitmap Index Scan on x  (cost=0.00..4.67 rows=24 width=0) (actual time=0.158..0.158 rows=155 loops=1)
               Index Cond: ((email ~>=~ 'dep'::text) AND (email ~<~ 'deq'::text))
 Total runtime: 1.067 ms
(6 rows)

but what if i’d like to search for ‘%something’? not a prefix, but a suffix. in my example - what can i do to use indexes when searching for people from a given domain?
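one common way to think about it (an assumption on my side, not necessarily what the rest of this post does): a suffix of a string is a prefix of the reversed string, so if the reversed value is what gets indexed, ‘%something’ turns into a prefix search. the plain-perl sketch below shows only that transformation, without any database. the function names and the example.com addresses are made up, and patterns containing _ or escaped characters are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# turn a suffix pattern like '%@example.com' into a prefix pattern
# over reversed text: 'moc.elpmaxe@%'
sub reversed_pattern {
    my ($pattern) = @_;
    return scalar reverse $pattern;
}

# plain-perl stand-in for "field like 'prefix%'"
sub has_prefix {
    my ( $text, $pattern ) = @_;
    ( my $prefix = $pattern ) =~ s/%\z//;
    return index( $text, $prefix ) == 0;
}

my @emails  = qw( depesz@example.com someone@example.org other@example.com );
my $pattern = reversed_pattern('%@example.com');

# compare the reversed value against the reversed pattern
my @matching = grep { has_prefix( scalar reverse($_), $pattern ) } @emails;
print "$_\n" for @matching;
```

in the database the same idea would need a function that reverses text and an index on its result.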

- MORE -

print “read manual” while each %bug;

2007-06-14 21:16:05 CEST | 2 Comments

well.

a very “nice” bug hit me today.

in a piece of software that i wrote, am still writing, and maintain, i have this bit of code:

for my $object ( @objects ) {
    next unless $self->validate_object( $object );
    $self->save_object_to_database( $object );
}

this code takes objects from the given list (actually they are not objects, just structures: hashes of hashes), then validates their content and, if validation succeeded, writes them to the database.

trivial.

one of the validation steps is replacing certain values with dictionary values.

let’s say the “object” has an element “region” with the value “WARSZAWA”. i check in the appropriate dictionary whether the value WARSZAWA is allowed. if not - validation failed. if yes, i replace the string “WARSZAWA” with the numeric identifier from the database, e.g. 15.

the code that does it:

sub validate_object {
    my $self = shift;
    my $object = shift;
    while (my ($param, $dictionary) = each %{ $self->validation_rules }) {
        next unless defined $object->{ $param };
        my $object_value = $object->{ $param };
        if ( $self->dictionaries->{ $dictionary }->{ $object_value } ) {
            $object->{ $param } = $self->dictionaries->{ $dictionary }->{ $object_value };
            next;
        }
        $self->log("error at validation ...");
        return;
    }
    return 1;
}

it may not be the most readable code, so step by step:

the $self->validation_rules hash holds key/value pairs, where the key is the name of a key (element) in the object, and the value is the name of the dictionary that should be used to validate that element.

for example, this hash can contain:

"region" => 'REGION_LIST'

there are usually about 10 rules.

dictionaries are returned by the $self->dictionaries() method. the returned structure is a hashref with the dictionary name as key and, as value, a hashref of pairs: text value => numeric identifier.

for example:

{
    'REGION_LIST' => { 'WARSZAWA' => 1, 'KRAKÓW' => 2, ...},
    'CATEGORIES_LIST' => { 'MOTO' => 15, 'AGD' => 21, ... },
    ...
}

simple.

do you see the bug in the validate_object() function?

no?

here is a hint. the symptom that reached me was that save_object_to_database() returned an sql error saying that the value ‘WARSZAWA’ is not valid for a numeric field.

still don’t know?

it is each(). it remembers, inside the hash, the element it returned last, so that the next call can return the following one.

so what happens when one of the objects fails validation?

let’s assume we have 10 rules, from 1 to 10. for object “a” rules 1, 2, 3 and 4 validated fine, and rule 5 raised an error. it was logged ($self->log), and validate_object ended with an empty return. so the main loop fetched the next object - “b”.

when validating object “b” we call each(), and which rule does it return? 6! then 7, 8, 9, 10, and that is where it ends. so rules 1-5 are not checked at all. and even if the object is correct, the values of the relevant fields are not replaced with ids. hence the error on insert.
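the effect is easy to reproduce outside of the application. the sketch below is a stripped-down illustration, not the original code: walk_rules and the rule names are made up, and leaving the loop after a fixed number of rules stands in for a failed validation. since hash order is not predictable, it only counts how many rules each call gets to see.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ten made-up rules, standing in for $self->validation_rules
my %rules = map { ( "rule$_" => "DICT$_" ) } 1 .. 10;

# walk the rules with each(), but give up after $limit of them,
# the way validate_object() gives up on the first failed rule
sub walk_rules {
    my ($limit) = @_;
    my @seen;
    while ( my ( $param, $dictionary ) = each %rules ) {
        push @seen, $param;
        return @seen if @seen >= $limit;
    }
    return @seen;
}

my @first  = walk_rules(5);     # leaves the loop early: iterator stays mid-hash
my @second = walk_rules(10);    # resumes where the first call stopped
my @third  = walk_rules(10);    # iterator was exhausted and reset: full pass

printf "first: %d, second: %d, third: %d\n",
    scalar @first, scalar @second, scalar @third;
```

the second call never sees the five rules the first call already consumed, which is exactly what happened to object “b”.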

why am i writing about this?

two reasons.

first: it may be useful to someone.

second: maybe it will make me remember it. and next time, instead of playing with each(), i will simply use keys.
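for completeness, a sketch of the keys-based version. to keep it self-contained it is simplified to a plain function with file-level hashes instead of the methods of the original class, so those details are assumptions. it also checks the dictionary lookup with defined instead of plain truthiness, which is a small deviation from the original. keys builds the complete list of rule names on every call, so an early return cannot leave a half-used iterator behind.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# simplified stand-ins for $self->validation_rules and $self->dictionaries
my %validation_rules = (
    region   => 'REGION_LIST',
    category => 'CATEGORIES_LIST',
);
my %dictionaries = (
    REGION_LIST     => { WARSZAWA => 1 },
    CATEGORIES_LIST => { MOTO => 15, AGD => 21 },
);

sub validate_object {
    my ($object) = @_;

    # keys returns all rule names, every time, regardless of earlier returns
    for my $param ( keys %validation_rules ) {
        next unless defined $object->{$param};
        my $dictionary = $validation_rules{$param};
        my $id         = $dictionaries{$dictionary}{ $object->{$param} };
        return unless defined $id;    # unknown value: validation failed
        $object->{$param} = $id;
    }
    return 1;
}

# same shape as the main loop: a failing object followed by a good one
my @objects = (
    { region => 'WARSZAWA', category => 'NO_SUCH_CATEGORY' },
    { region => 'WARSZAWA', category => 'MOTO' },
);
for my $object (@objects) {
    next unless validate_object($object);
    print "would save: region=$object->{region} category=$object->{category}\n";
}
```

the second object is fully converted to ids no matter at which rule the first one failed.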