Waiting for PostgreSQL 14 – Add unistr function

On 29th of March 2021, Peter Eisentraut committed patch:

Add unistr function
 
This allows decoding a string with Unicode escape sequences.  It is
similar to Unicode escape strings, but offers some more flexibility.
 
Author: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Asif Rehman <asifr.rehman@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRA5GnKT+gDVwbVRH2ep451H_myBt+NTz8RkYUARE9+qOQ@mail.gmail.com

For a very long time we have had the ability to write Unicode literals without typing the Unicode characters themselves.

For example – instead of writing:

=$ select 'żółw';

I could write:

=$ select U&'\017C\00F3\0142\0077';

Or even:

=$ select U&'\017C\00F3\0142w';

And even if I needed higher code points, like emoji, I could:

$ select U&'\+01F603';
 ?column? 
──────────
 😃
(1 row)

This is great. But now we can use the unistr function, which provides even more flexibility and supports up to 8 hex digits in the code point number.

Specifically, unistr handles (X is a single hexadecimal digit):

\XXXX (just like U&'…')
\+XXXXXX (just like U&'…')
\uXXXX
\UXXXXXXXX

Both new additions (\u and \U) seem to be the same as in Python (and possibly other languages), so they shouldn't be hard to grasp.
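To see the four forms side by side, here is a small sketch that feeds the same characters to unistr in the old and new notations. It assumes PostgreSQL 14 or newer, a UTF8 database, and standard_conforming_strings left at its default (on), so that backslashes in a plain string literal reach the function untouched. The column aliases are just labels I picked.

```sql
-- Each query should return the same text in both columns.

-- 4-digit forms: \XXXX and \uXXXX
select unistr('\017C\00F3\0142w')    as old_style,
       unistr('\u017C\u00F3\u0142w') as new_style;   -- both: żółw

-- longer forms: \+XXXXXX (6 digits) and \UXXXXXXXX (8 digits)
select unistr('\+01F603')   as old_style,
       unistr('\U0001F603') as new_style;            -- both: 😃
```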

While I don't foresee a need for 32-bit (8 hex digits) code points right now, it's good to know I can take a Python string and pass it through unistr to decode:

$ select unistr('I \U0001f60d PostgreSQL\u203C');
     unistr
─────────────────
 I 😍 PostgreSQL‼
(1 row)
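Since unistr is an ordinary function taking text, it should also work on values that are not literals, for example escaped strings stored in a table. A minimal sketch of that idea follows. The table escaped_notes and its column body are hypothetical names made up for illustration, and the same assumptions apply as elsewhere in this post (PostgreSQL 14+, UTF8 database, default standard_conforming_strings).

```sql
-- "escaped_notes" and "body" are hypothetical, for illustration only.
create temporary table escaped_notes (body text);

-- Store the escape sequences as plain text, backslashes included.
insert into escaped_notes (body) values
    ('\u017C\u00F3\u0142w'),
    ('I \U0001f60d PostgreSQL\u203C');

-- unistr() decodes whatever text each row holds.
select body, unistr(body) as decoded
from escaped_notes;
```

This is something U&'…' cannot do, because that syntax is only available when writing a literal.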

Sweet, thanks to all involved 🙂

3 thoughts on “Waiting for PostgreSQL 14 – Add unistr function”

unistr offers almost the same possibilities as a Unicode string literal in Postgres. But unescaping of a string literal is static: it works only for string literal constants. With unistr it is dynamic: you can use it on any data.

Hi; I want to compile the postgres source code from github. how can I do this? is there any article about it?

@tnglm:

https://www.depesz.com/2019/05/15/how-to-play-with-upcoming-unreleased-postgresql/
