Waiting for PostgreSQL 10 – Full Text Search support for json and jsonb

On 31st of March 2017, Andrew Dunstan committed patch:

Full Text Search support for json and jsonb
 
The new functions are ts_headline() and to_tsvector.
 
Dmitry Dolgov, edited and documented by me.

Another cool addition for JSON data. This time we can add full text searching of json values with ease.

Quick example of how it works:

$ SELECT id, jsonb_pretty(payload) FROM test;
 id |                                                jsonb_pretty
----+-------------------------------------------------------------------------------------------------------------
  1 | {                                                                                                          +
    |     "glossary": {                                                                                          +
    |         "title": "example glossary",                                                                       +
    |         "GlossDiv": {                                                                                      +
    |             "title": "S",                                                                                  +
    |             "GlossList": {                                                                                 +
    |                 "GlossEntry": {                                                                            +
    |                     "ID": "SGML",                                                                          +
    |                     "Abbrev": "ISO 8879:1986",                                                             +
    |                     "SortAs": "SGML",                                                                      +
    |                     "Acronym": "SGML",                                                                     +
    |                     "GlossDef": {                                                                          +
    |                         "para": "A meta-markup language, used to create markup languages such as DocBook.",+
    |                         "GlossSeeAlso": [                                                                  +
    |                             "GML",                                                                         +
    |                             "XML"                                                                          +
    |                         ]                                                                                  +
    |                     },                                                                                     +
    |                     "GlossSee": "markup",                                                                  +
    |                     "GlossTerm": "Standard Generalized Markup Language"                                    +
    |                 }                                                                                          +
    |             }                                                                                              +
    |         }                                                                                                  +
    |     }                                                                                                      +
    | }
(1 ROW)

As you can see I have a structure that is pretty nested. Now, to make an index, we need tsvector data. Which we can obtain by:

$ SELECT to_tsvector('english', payload) FROM test;
                                  to_tsvector
--------------------------------------------------------------------------------
 '1986':8 '8879':7 'creat':21 'docbook':26 'exampl':1 'general':35 'glossari':2.
. 'gml':28 'iso':6 'languag':18,23,37 'markup':17,22,32,36 'meta':16 'meta-mark.
.up':15 'sgml':4,10,12 'standard':34 'use':19 'xml':30
(1 ROW)

Nice. It found all words in all values in the json (it didn't index keys of
objects).

There is also matching ts_headline function:

$ SELECT jsonb_pretty(ts_headline(payload, 'sgml'::tsquery)) FROM test;
                                                jsonb_pretty
-------------------------------------------------------------------------------------------------------------
 {                                                                                                          +
     "glossary": {                                                                                          +
         "title": "example glossary",                                                                       +
         "GlossDiv": {                                                                                      +
             "title": "S",                                                                                  +
             "GlossList": {                                                                                 +
                 "GlossEntry": {                                                                            +
                     "ID": "<b>SGML</b>",                                                                   +
                     "Abbrev": "ISO 8879:1986",                                                             +
                     "SortAs": "<b>SGML</b>",                                                               +
                     "Acronym": "<b>SGML</b>",                                                              +
                     "GlossDef": {                                                                          +
                         "para": "A meta-markup language, used to create markup languages such as DocBook.",+
                         "GlossSeeAlso": [                                                                  +
                             "GML",                                                                         +
                             "XML"                                                                          +
                         ]                                                                                  +
                     },                                                                                     +
                     "GlossSee": "markup",                                                                  +
                     "GlossTerm": "Standard Generalized Markup Language"                                    +
                 }                                                                                          +
             }                                                                                              +
         }                                                                                                  +
     }                                                                                                      +
 }
(1 ROW)

please note the <b> around SGML string above.

To be honest – I think that JSON is abused a lot by app developers, but it's still definitely a great addition, thanks a lot Dmitry and Andrew.

3 thoughts on “Waiting for PostgreSQL 10 – Full Text Search support for json and jsonb”

  1. It should also works with phrase search, we specially added an extra position to avoid accident between values of different keys.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.