Waiting for PostgreSQL 15 – SQL/JSON constructors

On 27th of March 2022, Andrew Dunstan committed patch:

SQL/JSON constructors
 
This patch introduces the SQL/JSON standard constructors for JSON:
 
JSON()
JSON_ARRAY()
JSON_ARRAYAGG()
JSON_OBJECT()
JSON_OBJECTAGG()
 
For the most part these functions provide facilities that mimic
existing json/jsonb functions. However, they also offer some useful
additional functionality. In addition to text input, the JSON() function
accepts bytea input, which it will decode and constuct a json value from.
The other functions provide useful options for handling duplicate keys
and null values.
 
This series of patches will be followed by a consolidated documentation
patch.
 
Nikita Glukhov
 
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
 
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru

A moment before Andrew Dunstan committed another patch that is the basis for this (and next) changes.

So. The idea is that SQL/JSON is standard related to handling json values in SQL databases.

We (PostgreSQL) had json/jsonb support for quite some time now (it was added in Pg 9.2, in 2012!). But now we will get more support for standardized ways to acces/create/handle these. Let's see what we can do now.

As far as I can see all the listed JSON* “functions" generate json values from some input.

I wrote “functions" and not functions, because these are not functions – psql's \df JSON* doesn't list them. What's more, I don't see any documentation (yet?), so all I can figure out is based on regression tests.

Let's see some examples:

=$ SELECT json();
ERROR:  syntax error at OR near ")"
LINE 1: SELECT json();
                    ^
=$ SELECT json(NULL);
 json 
------
 
(1 ROW)
 
=$ SELECT json(1);
ERROR:  cannot CAST TYPE INTEGER TO json
LINE 1: SELECT json(1);
                    ^
=$ SELECT json(1.1);
ERROR:  cannot CAST TYPE NUMERIC TO json
LINE 1: SELECT json(1.1);
                    ^
=$ SELECT json('depesz');
ERROR:  invalid INPUT syntax FOR TYPE json
DETAIL:  Token "depesz" IS invalid.
CONTEXT:  JSON DATA, line 1: depesz
 
=$ SELECT json('1');
 json 
------
 1
(1 ROW)
 
=$ SELECT json('"depesz"');
   json   
----------
 "depesz"
(1 ROW)
 
=$ SELECT json('[1,2,3]');
  json   
---------
 [1,2,3]
(1 ROW)

OK. So I can generally give json() “function" literal value that looks like json, and in return I will get value that is typed to json.

The interesting part is that I can use bytea values too:

=$ SELECT json('\x2264657065737a22'::bytea);
   json   
──────────
 "depesz"
(1 ROW)

At this moment, I can't say it's something spectacular. There are options in there too:

  • FORMAT JSON – in no test I found other value than JSON, so I have no idea what this is for.
  • ENCODING X – which seems to be useful for providing bytea values in different (from servers) encoding, but I tried this for ISO-8859-2 data, and failed, no value like iso8859-2, iso88592, or anything like this was accepted, and tests only mention utf* variants

But there is interesting thing that can be used for validation: WITH UNIQUE KEYS or WITHOUT UNIQUE KEYS. This is how it works:

=$ SELECT JSON( '{"a": 12, "b": 23}' WITH UNIQUE KEYS );
        json        
--------------------
 {"a": 12, "b": 23}
(1 ROW)
 
=$ SELECT JSON( '{"a": 12, "b": 23}' WITHOUT UNIQUE KEYS );
        json        
--------------------
 {"a": 12, "b": 23}
(1 ROW)
 
=$ SELECT JSON( '{"a": 12, "a": 23}' WITH UNIQUE KEYS );
ERROR:  duplicate JSON object KEY VALUE
 
=$ SELECT JSON( '{"a": 12, "a": 23}' WITHOUT UNIQUE KEYS );
        json        
--------------------
 {"a": 12, "a": 23}
(1 ROW)

So, if I'll pick JSON() constructor, and provide it with object data, that has non-unique keys, but I'll add “WITH UNIQUE KEYS", it will fail.

OK. Let's switch to the next one: JSON_ARRAY.

This is actually more interesting. We can start with simple:

=$ SELECT json_array(1,2,3);
 json_array 
------------
 [1, 2, 3]
(1 ROW)

but then we can:

=$ SELECT JSON_ARRAY( 1, '12', 'hubert', 'depesz' );
          json_array           
-------------------------------
 [1, "12", "hubert", "depesz"]
(1 ROW)

or even:

=$ SELECT json_array( 1, 2, json_array('hubert','depesz','lubaczewski'), json('{}') );
                   json_array                    
-------------------------------------------------
 [1, 2, ["hubert", "depesz", "lubaczewski"], {}]
(1 ROW)

and now, we can also make it return jsonb type:

=$ SELECT json_array( 1, 2, json_array('hubert','depesz','lubaczewski'), json('{}') returning jsonb) \gdesc
   COLUMN   | TYPE  
------------+-------
 json_array | jsonb
(1 ROW)

Interestingly you can put inside direct selects:

=$ SELECT json_array( SELECT datname FROM pg_database ORDER BY datname returning jsonb );
                        json_array                         
-----------------------------------------------------------
 ["depesz", "pgdba", "postgres", "template0", "template1"]
(1 ROW)

Similarly you can now also use json_arrayagg:

=$ SELECT json_arrayagg(datname ORDER BY datname) FROM pg_database;
                       json_arrayagg                       
-----------------------------------------------------------
 ["depesz", "pgdba", "postgres", "template0", "template1"]
(1 ROW)

Next thing is JSON_OBJECT(). This is also very interesting:

=$ SELECT JSON_OBJECT( 'a': 123, 'b': json_array(1,2,3));
         json_object          
------------------------------
 {"a" : 123, "b" : [1, 2, 3]}
(1 ROW)

Just like with JSON_ARRAY I can use RETURNING JSONB to get jsonb:

=$ SELECT JSON_OBJECT( relname: relkind returning JSONB) FROM pg_class LIMIT 1;
      json_object      
-----------------------
 {"pg_statistic": "r"}
(1 ROW)

and just like with JSON_ARRAY and it's aggregator, we have JSON_OBJECTAGG:

=$ SELECT JSON_OBJECTAGG( datname: oid) FROM pg_database;
                                          json_objectagg                                           
---------------------------------------------------------------------------------------------------
 { "postgres" : "5", "pgdba" : "16394", "template1" : "1", "template0" : "4", "depesz" : "16395" }
(1 ROW)

That looks great. Don't get me wrong: I still think that json in db is extremely often misused. And cause problems like “how do I get data from nested arrays in json fields" that are often asked on IRC, Slack, or Reddit.

But – json support is not going to disappear, and the new additions look great. I love how easy it is to make proper jsonb object based on key/value pairs. With proper quoting and handling of special cases.

Great work. Thanks to all involved.