Waiting for PostgreSQL 14 – Add unistr function

On 29th of March 2021, Peter Eisentraut committed patch:

Add unistr function
 
This allows decoding a string with Unicode escape sequences.  It is
similar to Unicode escape strings, but offers some more flexibility.
 
Author: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Asif Rehman <asifr.rehman@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRA5GnKT+gDVwbVRH2ep451H_myBt+NTz8RkYUARE9+qOQ@mail.gmail.com

For a very long time we have been able to write Unicode literals without typing the Unicode characters themselves.

For example – instead of writing:

=$ SELECT 'żółw';

I could write:

=$ SELECT U&'\017C\00F3\0142\0077';

Or even:

=$ SELECT U&'\017C\00F3\0142w';

And even if I needed higher code points, like emoji, I could:

=$ SELECT U&'\+01F603';
 ?column? 
──────────
 😃
(1 row)

This is great. But now we can also use the unistr function, which provides even more flexibility and accepts code points written with up to 8 hex digits.

Specifically, unistr handles the following forms (X is a single hexadecimal digit):

  • \XXXX (just like U&'…')
  • \+XXXXXX (just like U&'…')
  • \uXXXX
  • \UXXXXXXXX

Both new additions (\u and \U) look the same as in Python (and possibly other languages), so they shouldn't be hard to grasp.
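
To make the list above concrete, here is a small sketch (not taken from the commit) that writes the same characters in each notation. It assumes a UTF8 database and the default standard_conforming_strings = on, so that backslashes in ordinary quoted strings reach unistr untouched.

```sql
-- The same code point, U+1F603, in the two long forms.
SELECT unistr('\+01F603')   AS plus_form,
       unistr('\U0001F603') AS big_u_form;

-- The 4-digit forms, compared with each other and with a U&'...' literal.
SELECT unistr('\017C\00F3\0142w')    AS backslash_form,
       unistr('\u017C\u00F3\u0142w') AS small_u_form,
       unistr('\017C\00F3\0142w') = U&'\017C\00F3\0142w' AS same_as_literal;
```

Both columns of the first query should show the same emoji, and the last column of the second query is expected to be true, since both notations decode to the same text.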

While I don't foresee a need for 32-bit (8 hex digit) code points right now, it's good to know I can take a Python string and pass it through unistr to decode it:

=$ SELECT unistr('I \U0001f60d PostgreSQL\u203C');
     unistr
─────────────────
 I 😍 PostgreSQL‼
(1 row)
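
Since unistr is a regular function rather than literal syntax, it can also be applied to values that are only known at run time, such as a column. A minimal sketch, assuming a UTF8 database; the table escaped_notes and its columns are hypothetical names made up for this illustration:

```sql
-- Hypothetical temp table with escape sequences stored as plain text.
CREATE TEMP TABLE escaped_notes (id int, raw_text text);

INSERT INTO escaped_notes (id, raw_text) VALUES
    (1, 'I \U0001f60d PostgreSQL\u203C'),
    (2, '\u017C\u00F3\u0142w');

-- U&'...' only works for constants written in the query itself;
-- unistr decodes whatever the column holds.
SELECT id, raw_text, unistr(raw_text) AS decoded
FROM escaped_notes
ORDER BY id;
```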

Sweet, thanks to all involved 🙂

3 thoughts on “Waiting for PostgreSQL 14 – Add unistr function”

  1. unistr offers almost the same possibilities as a string literal in Postgres. But the unescaping of a string literal is static: it works only for string literal constants. With unistr it is dynamic: you can use it on any data.

  2. Hi, I want to compile the Postgres source code from GitHub. How can I do this? Is there any article about it?
