Deserialization of hstore data structure in Perl

You might be familiar with hstore datatype in PostgreSQL – if you're not – check it out, as it is really cool.

Basically it's hash in database, which you can search in.

This is how it looks:

# \d x
      TABLE "public.x"
 COLUMN |  TYPE  | Modifiers
--------+--------+-----------
 i      | hstore |
 
# SELECT * FROM x;
                             i
-----------------------------------------------------------
 "lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów"
(1 ROW)

You can of course get individual parts out of it:

# SELECT i->'lat' FROM x;
 ?COLUMN?
-----------
 52.195593
(1 ROW)

The thing is – when I load it to perl, I get single field with stringified hstore as scalar variable:

=> perl -MDBI -MData::Dumper -e 'print Dumper(DBI->connect("dbi:Pg:dbname=depesz")->selectall_arrayref("select i from x"))'
$VAR1 = [
          [
            '"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów"'
          ]
        ];

Since it's hash, it should be possible to convert this stringified value to simple hash. Basically – you can of course parse it character-by-character, but there should be an easier way. Right? Right.

=> perl -MData::Dumper -e 'my $str = q{"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów"}; my $q = eval "{$str}"; print Dumper($q)'
$VAR1 = {
          'lat' => '52.195593',
          'name' => 'Piastów',
          'lng' => '20.837449'
        };

That's great. Isn't it? Well. Yes. The thing is – what happens if your hstore contains “strange" vaues – like emails:

# SELECT i || tconvert('email', 'depesz@depesz.com') FROM x;
                                        ?COLUMN?
-----------------------------------------------------------------------------------------
 "lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów", "email"=>"depesz@depesz.com"
(1 ROW)

Let's test the perl eval code:

=> perl -MData::Dumper -e 'my $str = q{"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów", "email"=>"depesz@depesz.com"}; my $q = eval "{$str}"; print Dumper($q)'
$VAR1 = {
          'email' => 'depesz.com',
          'lat' => '52.195593',
          'name' => 'Piastów',
          'lng' => '20.837449'
        };

What? it removed my username? Well, actually it removed “@depesz" – because it tried to eval it, and it doesn't exist. The thing is, that when you turn on strict (which you should always do) – it generates other result:

=> perl -MData::Dumper -e 'use strict; my $str = q{"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów", "email"=>"depesz@depesz.com"}; my $q = eval "{$str}"; print Dumper($q)'
$VAR1 = undef;

Whoa. Why did that happen? Let's add “-w":

=> perl -MData::Dumper -we 'use strict; my $str = q{"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów", "email"=>"depesz@depesz.com"}; my $q = eval "{$str}"; print Dumper($q)'
Possible unintended interpolation of @depesz in string at (eval 1) line 1.
$VAR1 = undef;

The reason why I write about it? To remind me. I just recenly made this mistake, and suffered for very interesting 2 days of debuging just to find out that I deserialized hstore in wrong way.

What is correct way then?

Well, Theodor Sigaev wrote in his presentation (slide 18) to use this code:

<code>$settings =~ s/([$@%])/\\$1/g;
my $hsettings = eval( "{$settings}" );</code>

Which basically works, but You have to prepend sigils ($, @ and %) with \:

<code>$settings =~ s/([\$\@\%])/\\$1/g;
my $hsettings = eval( "{$settings}" );</code>

After this change deserializtion of hstore works perfectly:

=> perl -MData::Dumper -we 'use strict; my $str = q{"lat"=>"52.195593", "lng"=>"20.837449", "name"=>"Piastów", "email"=>"depesz@depesz.com"}; $str =~ s/([\$\@\%])/\\$1/g; my $q = eval "{$str}"; print Dumper($q)'
$VAR1 = {
          'email' => 'depesz@depesz.com',
          'lat' => '52.195593',
          'name' => 'Piastów',
          'lng' => '20.837449'
        };

Of course – I would catch it faster if I did check eval errors. Which is another way of saying – code defensively – even if you're sure that nothing can go wrong.

2 thoughts on “Deserialization of hstore data structure in Perl”

  1. Hey,

    I noticed you have a few blog entries related to the hstore data type in PostgreSQL. Are you using this data type in a production environment? Have you run into any gotchas with this data type?

    Thanks,
    Dan

  2. @Dan:
    Yes, I use it. As for gotchas – not really. I just saw one case when the type was abused – values from hstore were used for searching and ordering in quite busy website database, and this lead to performance issues. But as long as you’re using it sensibly, everything works fine.

Comments are closed.