Deserialization of hstore data structure in Perl
You might be familiar with hstore datatype in PostgreSQL - if you’re not - check it out, as it is really cool.
Basically it’s hash in database, which you can search in.
This is how it looks:
# \d x
Table "public.x"
Column | Type | Modifiers
--------+--------+-----------
i | hstore |
# select * from x;
i
-----------------------------------------------------------
"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów"
(1 row)
You can of course get individual parts out of it:
# select i->'lat' from x;
?column?
-----------
52.195593
(1 row)
The thing is - when I load it to perl, I get single field with stringified hstore as scalar variable:
=> perl -MDBI -MData::Dumper -e 'print Dumper(DBI->connect("dbi:Pg:dbname=depesz")->selectall_arrayref("select i from x"))'
$VAR1 = [
[
'"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów"'
]
];
Since it’s hash, it should be possible to convert this stringified value to simple hash. Basically - you can of course parse it character-by-character, but there should be an easier way. Right? Right.
=> perl -MData::Dumper -e 'my $str = q{"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów"}; my $q = eval "{$str}"; print Dumper($q)'
$VAR1 = {
'lat' => '52.195593',
'name' => 'Piastów',
'lng' => '20.837449'
};
That’s great. Isn’t it? Well. Yes. The thing is - what happens if your hstore contains “strange” vaues - like emails:
# select i || tconvert('email', 'depesz@depesz.com') from x;
?column?
-----------------------------------------------------------------------------------------
"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów", "email"=>"depesz@depesz.com"
(1 row)
Let’s test the perl eval code:
=> perl -MData::Dumper -e 'my $str = q{"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów", "email"=>"depesz@depesz.com"}; my $q = eval "{$str}"; print Dumper($q)'
$VAR1 = {
'email' => 'depesz.com',
'lat' => '52.195593',
'name' => 'Piastów',
'lng' => '20.837449'
};
What? it removed my username? Well, actually it removed “@depesz” - because it tried to eval it, and it doesn’t exist. The thing is, that when you turn on strict (which you should always do) - it generates other result:
=> perl -MData::Dumper -e 'use strict; my $str = q{"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów", "email"=>"depesz@depesz.com"}; my $q = eval "{$str}"; print Dumper($q)'
$VAR1 = undef;
Whoa. Why did that happen? Let’s add “-w”:
=> perl -MData::Dumper -we 'use strict; my $str = q{"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów", "email"=>"depesz@depesz.com"}; my $q = eval "{$str}"; print Dumper($q)'
Possible unintended interpolation of @depesz in string at (eval 1) line 1.
$VAR1 = undef;
The reason why I write about it? To remind me. I just recenly made this mistake, and suffered for very interesting 2 days of debuging just to find out that I deserialized hstore in wrong way.
What is correct way then?
Well, Theodor Sigaev wrote in his presentation (slide 18) to use this code:
<code>$settings =~ s/([$@%])/\\$1/g;
my $hsettings = eval( “{$settings}” );</code>
Which basically works, but You have to prepend sigils ($, @ and %) with \:
<code>$settings =~ s/([\$\@\%])/\\$1/g;
my $hsettings = eval( “{$settings}” );</code>
After this change deserializtion of hstore works perfectly:
=> perl -MData::Dumper -we 'use strict; my $str = q{"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów", "email"=>"depesz@depesz.com"}; $str =~ s/([\$\@\%])/\\$1/g; my $q = eval “{$str}”; print Dumper($q)’
$VAR1 = {
‘email’ => ‘depesz@depesz.com’,
‘lat’ => ‘52.195593′,
‘name’ => ‘Piastów’,
‘lng’ => ‘20.837449′
};
Of course - I would catch it faster if I did check eval errors. Which is another way of saying - code defensively - even if you’re sure that nothing can go wrong.