On 29th of March 2021, Peter Eisentraut committed a patch:
Add unistr function

This allows decoding a string with Unicode escape sequences. It is similar to Unicode escape strings, but offers some more flexibility.

Author: Pavel Stehule <email@example.com>
Reviewed-by: Asif Rehman <firstname.lastname@example.org>
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRA5GnKT+gDVwbVRH2ep451H_myBt+NTz8RkYUARE9+qOQ@mail.gmail.com
For a very long time we have had the ability to use Unicode literals without typing the Unicode characters themselves.
For example, instead of writing:
=$ SELECT 'żółw';
I could write either of these (escaping every character, or only the non-ASCII ones):
=$ SELECT U&'\017C\00F3\0142\0077';
=$ SELECT U&'\017C\00F3\0142w';
And if I needed higher code points, like emoji, I could write:
$ SELECT U&'\+01F603';
 ?COLUMN?
──────────
 😃
(1 ROW)
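Generating such literals from code is mechanical. Below is a minimal Python sketch, assuming the default backslash escape character; the function name to_u_amp_literal and its escape_ascii flag are hypothetical, not part of any library. It uses the 4-digit form for code points up to U+FFFF and the \+ 6-digit form above that.

```python
def to_u_amp_literal(text: str, escape_ascii: bool = False) -> str:
    """Build a PostgreSQL U&'...' literal using the default backslash escape."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            parts.append("\\\\")          # a literal backslash is written as two
        elif ch == "'":
            parts.append("''")            # single quotes are doubled, as in any SQL literal
        elif cp < 0x80 and not escape_ascii:
            parts.append(ch)              # plain ASCII can stay as-is
        elif cp <= 0xFFFF:
            parts.append(f"\\{cp:04X}")   # \XXXX: 4 hex digits
        else:
            parts.append(f"\\+{cp:06X}")  # \+XXXXXX: 6 hex digits, needed above U+FFFF
    return "U&'" + "".join(parts) + "'"
```

With escape_ascii=True, 'żółw' comes out exactly as the fully escaped literal from the first example, and '😃' becomes U&'\+01F603'.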
This is great. But now we can also use the unistr() function, which provides even more flexibility and supports code points written with up to 8 hex digits.
Specifically, unistr handles the following escapes (X is a single hexadecimal digit):
- \XXXX (just like in U&'…' strings)
- \+XXXXXX (just like in U&'…' strings)
- \uXXXX
- \UXXXXXXXX
Both new additions (\u and \U) appear to work the same way as in Python (and possibly other languages), so they shouldn't be hard to grasp.
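To make the rules concrete, here is a simplified Python model of the escape handling described in the PostgreSQL documentation for unistr(): the four forms \XXXX, \+XXXXXX, \uXXXX and \UXXXXXXXX, plus two backslashes for a literal backslash. It is not PostgreSQL's implementation; unistr_sketch and _take_hex are hypothetical names, and rejecting surrogate code points outright is this sketch's own simplification, which may differ from what the server does.

```python
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def _take_hex(s: str, start: int, count: int) -> int:
    """Read exactly `count` hex digits starting at s[start], or raise ValueError."""
    chunk = s[start:start + count]
    if len(chunk) != count or not all(c in HEX_DIGITS for c in chunk):
        raise ValueError(f"invalid Unicode escape near position {start}")
    return int(chunk, 16)


def unistr_sketch(s: str) -> str:
    """Decode unistr()-style escapes in s (simplified model, not PostgreSQL's code)."""
    out = []
    i = 0
    while i < len(s):
        if s[i] != "\\":
            out.append(s[i])  # everything except a backslash is taken literally
            i += 1
            continue
        nxt = s[i + 1] if i + 1 < len(s) else ""
        if nxt == "\\":  # two backslashes -> one literal backslash
            out.append("\\")
            i += 2
            continue
        if nxt == "+":  # \+XXXXXX: 6 hex digits
            cp, end = _take_hex(s, i + 2, 6), i + 8
        elif nxt == "u":  # \uXXXX: 4 hex digits
            cp, end = _take_hex(s, i + 2, 4), i + 6
        elif nxt == "U":  # \UXXXXXXXX: 8 hex digits
            cp, end = _take_hex(s, i + 2, 8), i + 10
        else:  # \XXXX: 4 hex digits, no prefix letter
            cp, end = _take_hex(s, i + 1, 4), i + 5
        # Reject values outside the Unicode range and, as a simplification
        # in this sketch, surrogate code points instead of pairing them.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid Unicode code point U+{cp:X}")
        out.append(chr(cp))
        i = end
    return "".join(out)
```

For instance, unistr_sketch(r"\017C\00F3\0142w") returns 'żółw', and a truncated escape such as r"\u12" raises ValueError instead of being passed through.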
While I don't currently foresee a need for 32-bit (8 hex digit) code points, it's good to know that I can take a Python-style escaped string and pass it through unistr to decode it:
$ SELECT unistr('I \U0001f60d PostgreSQL\u203C');
      unistr
─────────────────
 I 😍 PostgreSQL‼
(1 ROW)
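Going the other way, from a Python str to a unistr() call, can be sketched like this. It assumes standard_conforming_strings is on (the default since PostgreSQL 9.1), so backslashes inside the SQL literal reach unistr() unchanged; to_unistr_sql is a hypothetical helper name. Non-ASCII characters become \uXXXX or \UXXXXXXXX, mirroring the Python escape style.

```python
def to_unistr_sql(text: str) -> str:
    """Return a SQL expression unistr('...') that should decode back to `text`."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            parts.append("\\\\")           # unistr() wants two backslashes for one
        elif ch == "'":
            parts.append("''")             # double single quotes for the SQL literal
        elif cp < 0x80:
            parts.append(ch)               # ASCII passes through unchanged
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")   # \uXXXX for the Basic Multilingual Plane
        else:
            parts.append(f"\\U{cp:08X}")   # \UXXXXXXXX above U+FFFF
    return "unistr('" + "".join(parts) + "')"
```

to_unistr_sql('I 😍 PostgreSQL‼') yields unistr('I \U0001F60D PostgreSQL\u203C'), the same call as above apart from hex letter case, which unistr accepts either way. When sending data from an application, a bind parameter is usually simpler than building SQL text by hand.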
Sweet, thanks to all involved 🙂