August 31st, 2007 by depesz | Tags: , , , , | 3 comments »
Did it help? If yes - maybe you can help me?

faber4 on irc asked about how to get ascii-based sorting, while his postgresql was initdb'ed with utf-8 based locale (en_US.UTF-8 to be exact).

what can we do about it?

first, let's check if my environment is ok:

# show lc_collate;
lc_collate
-------------
pl_PL.UTF-8
(1 row)

looks file. pl_PL is not en_US, but it's utf, so no ascii-based sorting should be done.

now, let's create some test data:

# create table test (x text);
CREATE TABLE
# copy test from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> aaa
>> ab
>> 1ab
>> ab
>> \.

ok, so let's do the test:

# select * from test order by x;
x
-----
1ab
aaa
ab
ab
(4 rows)CREATE OR REPLACE FUNCTION ascii_recode(TEXT) RETURNS TEXT AS $BODY$
my $input = shift;
$input =~ s/(.)/sprintf("%02x", ord $1)/ges;
$input =~ y/0-9a-f/a-p/;
return $input;
$BODY$ LANGUAGE plperl immutable;

now, let's test it:

# select ascii_recode('i love postgresql');
ascii_recode
------------------------------------
gjcagmgphggfcahagphdheghhcgfhdhbgm
(1 row)

looks good to me. how does it sort?

# select * from test order by ascii_recode(x);
x
-----
ab
1ab
aaa
ab
(4 rows)

nice. it works.

what's important – this function is “immutable". which means we can make index on it to use it for sorting larger datasets.

of course – it is not perfect. it will have issues when dealing with multi-byte characters in recoded-strings. but – it's a trick. use with caution 🙂

  1. 3 comments

  2. # spoksss
    Jul 29, 2010

    select * from test order by x using ~<~

  3. Aug 3, 2010

    @Spoksss:
    Nice. I can’t find it anywhere in docs, though.

  4. # dedi
    Sep 17, 2011

    thx for your hint depesz. useful.

    in 8.1/8.3 postgresql version, may be possible to do this:

    select * from test order by ascii(substr(test,1,1));

    cmiiw.

Leave a comment