Waiting for 9.2 – JSON

On 31st of January, Robert Haas committed patch:

Like the XML data type, we simply store JSON data as text, after checking
that it is valid.  More complex operations such as canonicalization and
comparison may come later, but this is enough for not.
 
There are a few open issues here, such as whether we should attempt to
detect UTF-8 surrogate pairs represented as \uXXXX\uYYYY, but this gets
the basic framework in place.

and then, 3 days later, Andrew Dunstan committed another one, related:

Also move the escape_json function from explain.c to json.c where it
seems to belong.
 
Andrew Dunstan, Reviewd by Abhijit Menon-Sen.

Thanks to these two patches, we now have basic support for JSON database.

So far it doesn't mean much, aside from the fact that when dealing with json values (for example serialization of data from some external language) you can choose “json" instead of “text" for column datatype, which means that the value will be checked to be valid.

Let's see how that works:

$ CREATE TABLE test (x json);
CREATE TABLE
 
$ \d test
    TABLE "public.test"
 COLUMNTYPE │ Modifiers
────────┼──────┼───────────
 x      │ json │

so far, no surprises. Let's insert some data there – both sensible and not:

$ INSERT INTO test (x) VALUES ('{"a":123}');
INSERT 0 1
 
$ INSERT INTO test (x) VALUES ('{"b" : 123}');
INSERT 0 1
 
$ INSERT INTO test (x) VALUES ('This is not correct JSON.');
ERROR:  invalid INPUT syntax FOR TYPE json
LINE 1: INSERT INTO test (x) VALUES ('This is not correct JSON.');
                                     ^
DETAIL:  line 1: Token "This" IS invalid.
 
$ SELECT * FROM test;
      x
─────────────
 {"a":123}
 {"b" : 123}
(2 ROWS)

This is pretty cool. What we see – validation works. Value that was obviously not a correct JSON got rejected. We also see that Pg preserves inner spaces in the json values, which can be good for readability.

All above functionality came from first commit.

Second added two functions:

  • array_to_json
  • row_to_json

Their names seem to be pretty straightforward, so let's just quickly see how they work:

$ SELECT array_to_json( '{ab,cd,ef}'::text[] );
  array_to_json
──────────────────
 ["ab","cd","ef"]
(1 ROW)

and on more complex values:

$ SELECT array_to_json( array( SELECT z FROM (SELECT i, i*2 AS j FROM generate_series(1,4) i ) AS z ) );
                       array_to_json                       
───────────────────────────────────────────────────────────
 [{"i":1,"j":2},{"i":2,"j":4},{"i":3,"j":6},{"i":4,"j":8}]
(1 ROW)

clearly it lost column names, and treated whole rows as simple scalar values. It works, but could have been better.

The 2nd functions – row_to_json works like this:

$ SELECT row_to_json( d ) FROM pg_depend d LIMIT 2;
                                                   row_to_json
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 {"classid":NULL,"objid":NULL,"objsubid":NULL,"refclassid":1259,"refobjid":1247,"refobjsubid":NULL,"deptype":"p"}
 {"classid":NULL,"objid":NULL,"objsubid":NULL,"refclassid":1259,"refobjid":1249,"refobjsubid":NULL,"deptype":"p"}
(2 ROWS)

Arrays in the rows are handled correctly:

$ SELECT *, row_to_json(r) FROM t AS r;
 i │ t │    a    │         row_to_json
───┼───┼─────────┼─────────────────────────────
 1 │ a │ {1,2,3}{"i":1,"t":"a","a":[1,2,3]}
 2 │ b │ {2,3,4}{"i":2,"t":"b","a":[2,3,4]}
(2 ROWS)

On the other hand – more complex rows are somewhat puzzling:

$ SELECT row_to_json( x ) FROM (SELECT n.oid AS nspoid, n.nspname, array( SELECT i FROM (SELECT c.oid AS reloid, c.relname FROM pg_class c WHERE c.relnamespace = n.oid LIMIT 2) AS i) FROM pg_namespace n) AS x;
                                                                       row_to_json
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 {"f1":99,"f2":"pg_toast","f3":[{"reloid":2840,"relname":"pg_toast_2619"},{"reloid":2841,"relname":"pg_toast_2619_index"}]}
 {"f1":11195,"f2":"pg_temp_1","f3":[]}
 {"f1":11196,"f2":"pg_toast_temp_1","f3":[]}
 {"f1":11,"f2":"pg_catalog","f3":[{"reloid":2619,"relname":"pg_statistic"},{"reloid":1247,"relname":"pg_type"}]}
 {"f1":11498,"f2":"information_schema","f3":[{"reloid":11515,"relname":"information_schema_catalog_name"},{"reloid":11521,"relname":"applicable_roles"}]}
 {"f1":2200,"f2":"public","f3":[{"reloid":18631,"relname":"z"},{"reloid":18634,"relname":"test"}]}
(6 ROWS)

Names of columns in subquery, grouped in array – are preserved, but names of column in outer select are not.

Still – it works, and for most of the cases, will return sensible answers.

Now, if we'd only get more tools to work on JSON values – getting keys/values from hashes, iteration over json arrays – that would be great. But still – the things we got are great start. Thanks guys.

7 thoughts on “Waiting for 9.2 – JSON”

  1. Column names should now be vastly improved. The fix was committed last night.

  2. @Andrew:
    yeah.

    Return from the last query is now:

                                                                                 row_to_json                                                                              
    ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     {"nspoid":99,"nspname":"pg_toast","array":[{"reloid":2840,"relname":"pg_toast_2619"},{"reloid":2841,"relname":"pg_toast_2619_index"}]}
     {"nspoid":11195,"nspname":"pg_temp_1","array":[]}
     {"nspoid":11196,"nspname":"pg_toast_temp_1","array":[]}
     {"nspoid":11,"nspname":"pg_catalog","array":[{"reloid":2619,"relname":"pg_statistic"},{"reloid":1247,"relname":"pg_type"}]}
     {"nspoid":11498,"nspname":"information_schema","array":[{"reloid":11515,"relname":"information_schema_catalog_name"},{"reloid":11521,"relname":"applicable_roles"}]}
     {"nspoid":2200,"nspname":"public","array":[]}
    (6 rows)

    But the array_to_json still loses column names:

    $ select array( select z from (select i, i*2 as j from generate_series(1,4) i ) as z );
                   array               
    ───────────────────────────────────
     {"(1,2)","(2,4)","(3,6)","(4,8)"}
    (1 row)
  3. my array_to_json example was obviously flawed – it didn’t even call the function. my bad. fixed in text.

  4. OMG, this is the single best thing I have read all year, keep up the good work guys

  5. The next step which definitely needs to appear is functions to convert between JSON and HSTORE (and maybe XML). So far, HSTORE is incredibly useful but the need for manually parsing HSTORE record string representation in query results is stopping me from using it even more.

  6. In postgresql 9.2.2 the select *, row_to_json(r) from t as r; example returns: ‘relation t does not exist’ but…

    select *, row_to_json(r) from test as r; returns:

    x | row_to_json
    ————————————————–
    {“a”:123} | {“x”:{“a”:123}}
    {“b” : 123} | {“x”:{“b” : 123}}
    (2 rows)

    Is this expected behavior?

  7. @Armand:
    this was syntax example. The data in t table was different than test, I didn’t show how to create this table, as I assumed that anyone can create their own sample tables, and showing it didn’t seem important.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.