text to hstore migration
in postgresql 8.2, in contrib, is great new datatype called hstore.
if you’re not familiar with it - check the docs.
in short - this is indexable associative-array (hash) in one field.
i was given a task to convert some text field to hstore field.
text is very simple, contains set of “words”, separated by spaces, where each “word” contains:
- textual identifier, containing only lower case letters
- “:” - separator
- numerical (0 or positive integer) value
example value:
abc:123 def:1 bg:3 foo:99
luckily, the format is really simple.
now, i have to convert it to hstore column. adding a column is a no-brainer, but converting is apparently not so simple.
first, i tried making “replace” calls to change format of data to format acceptable for hstore:
# select replace(replace('abc:123 def:1 bg:3 foo:99', ' ', ', '), ':', '=>');
replace
----------------------------------
abc=>123, def=>1, bg=>3, foo=>99
(1 row)
but, while this works:
# select 'abc=>123, def=>1, bg=>3, foo=>99'::hstore;
hstore
--------------------------------------------------
"bg"=>"3", "abc"=>"123", "def"=>"1", "foo"=>"99"
(1 row)
this doesn’t:
# select replace(replace('abc:123 def:1 bg:3 foo:99', ' ', ', '), ':', '=>')::hstore;
ERROR: cannot cast type text to hstore
LINE 1: ...'abc:123 def:1 bg:3 foo:99', ' ', ', '), ':', '=>')::hstore;
oops. got to find better way. luckily, it’s not very hard:
text2hstore converting functions seems to be quite simple:
CREATE OR REPLACE FUNCTION text_to_hstore(input TEXT) RETURNS hstore as $BODY$
DECLARE
input_parts TEXT[];
output hstore := ”;
item TEXT[];
i int;
BEGIN
IF input IS NULL OR input = ” THEN
RETURN output;
END IF;
IF NOT input ~ ‘^[a-z]+:[0-9]( [a-z]+:[0-9]+)*$’ THEN
raise exception ‘input doesn”t match validating regexp [%]‘, input;
END IF;
input_parts := string_to_array(input, ‘ ‘);
for i in array_lower( input_parts, 1 ) .. array_upper( input_parts, 1 ) loop
item := string_to_array( input_parts[i], ‘:’ );
output := output || tconvert( item[1], item[2] );
END loop;
RETURN output;
END;
$BODY$ language plpgsql;
what it does is:
- if given string (input) is null, or is empty string - return empty hstore
- if given string doesn’t match validation regexp - raise exception (i want to be sure the conversion is ok)
- split input string to array of strings, where each item in array is “word” (code:value)
- iterate over array of strings, for each item do:
- split item into code and value (stored as item array)
- add new key to output hstore
- after finishing all “words”, return generated hstore.
so, how fast does it work?
i did test on some random test table, with following dataset:
# select count(*), count(text_field), min(length(text_field)), max(length(text_field)), sum(length(text_field)) from test ;
count | count | min | max | sum
---------+---------+-----+-----+----------
1502922 | 1349308 | 0 | 238 | 11262277
(1 row)
creating new table with hstore value took:
# create table t2 as select text_to_hstore(text_field) from test ;
SELECT
Time: 47898.277 ms
not bad. of course the function is very task-specific, but perhaps somebody will use it for similar case.
December 9th, 2007 at 13:39
Depesz, most of the types in postgresql have 2 functions for representation. you can usually use one of those (for example the representation is used for psql to display data struct). So instead of using a custom func you could use:
test=# select hstore_in(’hello=>depesz’);
hstore_in
——————
“hello
“=>”depesz”
(1 row)
December 9th, 2007 at 13:40
and the hstore_out of course does again convert the hstore type to the text presentation.
December 9th, 2007 at 14:36
it doesn’t really do *text* to hstore conversion:
# select hstore_in(’a=>b’::text);
ERROR: function hstore_in(text) does not exist
LINE 1: select hstore_in(’a=>b’::text);
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
it works only with un-typed values.