Waiting for 9.3 – Add new JSON processing functions and parser API.

On 29th of March, Andrew Dunstan committed patch:

Add new JSON processing functions and parser API.
 
The JSON parser is converted into a recursive descent parser, and
exposed for use by other modules such as extensions. The API provides
hooks for all the significant parser event such as the beginning and end
of objects and arrays, and providing functions to handle these hooks
allows for fairly simple construction of a wide variety of JSON
processing functions. A set of new basic processing functions and
operators is also added, which use this API, including operations to
extract array elements, object fields, get the length of arrays and the
set of keys of a field, deconstruct an object into a set of key/value
pairs, and create records from JSON objects and arrays of objects.
 
Catalog version bumped.
 
Andrew Dunstan, with some documentation assistance from Merlin Moncure.

This is huge change, and one that makes me very happy. So far we were able to create JSONs, and store them. But reading of data from JSON was not really possible.

Now – thanks to Andrew's work – we got huge number of additional features that can be used to actually use the data from within the jsons.

First, let's talk about new operators. There are 2 new operators, each with 2 variants.

First operator is -> – this is used to extract data from json value. Returned value is also json, so you can “chain" it:

$ SELECT '{"a":[1,2,3]}'::json->'a'->2;
 ?COLUMN?
----------
 3
(1 ROW)

This operator can take either integer or text on the right side. If it's integer – it assumes the left side of operator is json array, and returns element that has given index (0 based).

If the right side is text, it is used to get data from JSON objects (hashes).

So, for example, If I have JSON object:

{ "name": "depesz", "password": "super simple", "access_level": 100 }

And I'd want to get name from it, I could:

$ SELECT '{ "name": "depesz", "password": "super simple", "access_level": 100 }'::json->'name';
 ?COLUMN? 
----------
 "depesz"
(1 ROW)

Array approach works just as fine:

$ SELECT '["hubert", "depesz", "lubaczewski"]'::json->1;
 ?COLUMN?
----------
 "depesz"
(1 ROW)

Please note that the return values are within quotes – this is because these are not texts, but rather json values. If we'd want text, it can be done using ->> version of the operator, like:

$ SELECT '{ "name": "depesz", "password": "super simple", "access_level": 100 }'::json->>'name';
 ?COLUMN?
----------
 depesz
(1 ROW)
 
$ SELECT '["hubert", "depesz", "lubaczewski"]'::json->>1;
 ?COLUMN?
----------
 depesz
(1 ROW)

Second operator (technically, I described above 4 operators, but they are very similar) is #>

This operator can be used instead of chaining multiple ->..->..->..->… #> right side is text array, So if we'd take a look at the example earlier, with:

SELECT '{"a":[1,2,3]}'::json->'a'->2;

It can be also written as:

SELECT '{"a":[1,2,3]}'::json #> ARRAY[ 'a', '2' ];
 ?COLUMN?
----------
 3
(1 ROW)

(of course you can use different way of providing the array, like ‘{a,2}'::text[].

New functions, added by this patch are:

json_array_elements

Returns set of json values from an array. For example:

$ SELECT json_array_elements('[1,2,[4,5],{"a":"b"},4]');
 json_array_elements
---------------------
 1
 2
 [4,5]
 {"a":"b"}
 4
(5 ROWS)

json_array_length

Returns number of elements in given array. If the array has subarrays, these are not traversed, so:

$ SELECT json_array_length('[1,2,[4,5],{"a":"b"},4]');
 json_array_length
-------------------
                 5
(1 ROW)

json_each

Returns set of records of key/value (key – text, value – json) which represent JSON object:

$ SELECT * FROM json_each('{"name":"depesz","password":"super simple","grades":[1,3,1,1,1,2],"skills":{"a":"b", "c":[1,2,3]}}');
   KEY    |         VALUE
----------+------------------------
 name     | "depesz"
 password | "super simple"
 grades   | [1,3,1,1,1,2]
 skills   | {"a":"b", "c":[1,2,3]}
(4 ROWS)

json_each_text

Just like json_each() above, but value is text, so for scalar values (not arrays and not objects), quotes are removed.

$ SELECT * FROM json_each_text('{"name":"depesz","password":"super simple","grades":[1,3,1,1,1,2],"skills":{"a":"b", "c":[1,2,3]}}');
   KEY    |         VALUE
----------+------------------------
 name     | depesz
 password | super simple
 grades   | [1,3,1,1,1,2]
 skills   | {"a":"b", "c":[1,2,3]}
(4 ROWS)

json_extract_path

Function that does just what #> operator – extracts value based on path:

$ SELECT json_extract_path(
    '{"name":"depesz","password":"super simple","grades":[1,3,1,1,1,2],"skills":{"a":"b", "c":[1,2,3]}}',
    'skills', 'c', '2'
);
 json_extract_path
-------------------
 3
(1 ROW)

json_extract_path_text

Same as above, but returns as text – so for scalar values quotations are removed. Since it seems obvious, I'm skipping example.

json_object_keys

Returns set of text values, where each value is key in an object:

$ SELECT * FROM json_object_keys(
    '{"name":"depesz","password":"super simple","grades":[1,3,1,1,1,2],"skills":{"a":"b", "c":[1,2,3]}}'
);
 json_object_keys
------------------
 name
 password
 grades
 skills
(4 ROWS)

Two last functions: json_populate_record and json_populate_recordset, are a bit more complicated, so I'll cover them both in a bit more text.

If you have a JSON that is representation of a row from a table. Or, at least – some columns from such table. For example, let's consider such table:

$ \d test
     TABLE "public.test"
 COLUMN |  TYPE   | Modifiers
--------+---------+-----------
 i      | INTEGER |
 t      | text    |
 b      | BOOLEAN |

And let's say we have such json:

[
  { "i":1, "t": "whatever", "b": true },
  { "i":2, "t": "anything" },
  { "i":3, "b": false }
]

Then, you can use json_populate_recordset() function (which takes json, which is supposed to be array of objects) to return recordset that has columns named and typed like test table.

How does it work? For test, I stored my JSON (as shown above) in single-row table “t", and the column is named j. So now I can:

$ SELECT * FROM json_populate_recordset( NULL::test, ( SELECT j FROM t ) );
 i |    t     |   b
---+----------+--------
 1 | whatever | t
 2 | anything | [NULL]
 3 | [NULL]   | f
(3 ROWS)

First argument to the function – NULL::test, is any value typed to whatever you want to get from the function. Since each table is also a type, and simplest value is null, you generally want NULL::table_name.

Second argument is the JSON.

We can also verify datatypes by doing:

$ CREATE TABLE test_2 AS SELECT * FROM json_populate_recordset( NULL::test, ( SELECT j FROM t ) );
SELECT 3
 
$ \d test_2
    TABLE "public.test_2"
 COLUMN |  TYPE   | Modifiers
--------+---------+-----------
 i      | INTEGER |
 t      | text    |
 b      | BOOLEAN |

As you can see the datatypes are correct.

If value for any field is not provided in source data – it will be returned as NULL.

Difference between json_populate_record() and json_populate_recordset() is that json_populate_record expects to get single object, and returns single row, while the json_populate_recordset works on array of objects and returns (potentially) multiple rows.

I, for am, am grateful to Andrew for his work, as now JSON seems to be really useful for DB programming. So far it was more or less only for sending data back to application, but now we can use JSON to do many cool things ourselves.

The only thing missing is modifying existing JSONs. Like – adding key to object, or adding value to array, or changing existing value within array/object to something else. It can be done now, but in a not-so-pleasant way.

11 thoughts on “Waiting for 9.3 – Add new JSON processing functions and parser API.”

  1. Thanks for your comments.

    One thing about the new API is that it makes writing functions like json_set() a heck of a lot easier. I could probably undertake to write something along these line in a fairly short time, although it does have some wrinkles (e.g. what do you do if the arguments say to set element 5 of a non-existent array? Set items 0 .. 4 to null?)

    It’s certainly worth doing, though.

  2. @Andrew:
    I would say so. Alternatively – make array_push(), array_unshift() functions, and make the one that changes element “inside” – work only for existing elements.

  3. This will be really a huge change, thanks for implementing it and writing about! One question popped into my mind: how about indexing arbitrary JSONs? Is there anything like that? (Would be a sort of NoSQL in a yeSQL column 🙂

  4. @Dezso:
    As I understand – currently it’s not indexable.
    Also, I don’t think you can create index on them, because extraction functions are stable, and not immutable. Not sure why, though. Andrew?

  5. Perhaps the extraction functions should be immutable. Bring it up on the -hackers list. But that’s not quite the same as indexing arbitrary json, i.e. indexing on unknown keys and values.

  6. @Andrew:
    Sure, it’s not the same. But at the very least it would make it possible to create functional indexes.

  7. My thought was that just as hstore can be indexed, probably a flat JSON could be indexed, too. I see for achieving this, one should be able to determine if a JSON is flat, though. I have no idea for non-flat sructures…

  8. @Dezso:

    yeah, I figured. Given current limitations for both datatypes, I would say that they should be used in different scenarios:

    if you want indexable searches, and you’re fine with “no-nesting” limitation: use hstore.
    if you need nested, complicated structures, and don’t care about indexing – use json.
    and if you need the best from both world – you’ll have to wait to see what, and when, the future will bring.

  9. Bon je n’ai pas eu l’occasion de terminer de regarder cependant je repasserai dans la semaine

Comments are closed.