On 7th of April 2018, Teodor Sigaev committed patch:
Add json(b)_to_tsvector function Jsonb has a complex nature so there isn't best-for-everything way to convert it to tsvector for full text search. Current to_tsvector(json(b)) suggests to convert only string values, but it's possible to index keys, numerics and even booleans value. To solve that json(b)_to_tsvector has a second required argument contained a list of desired types of json fields. Second argument is a jsonb scalar or array right now with possibility to add new options in a future. Bump catalog version Author: Dmitry Dolgov with some editorization by me Reviewed by: Teodor Sigaev Discussion: https://www.postgresql.org/message-id/CA+q6zcXJQbS1b4kJ_HeAOoOc=unfnOrUEL=KGgE32QKDww7d8g@mail.gmail.com
This is actually pretty cool if you are using your json(b) column as a document store with “random" column names – keys that are in some values, but not always.
So, let's look at how it would be indexed as tsvector in older pg (10.3), and the new hotness.
As a test documents I'd pick:
$ SELECT id, jsonb_pretty(payload) FROM test; id | jsonb_pretty ----+------------------------------------------------------------------------------------------------------------- 1 | { + | "glossary": { + | "title": "example glossary", + | "GlossDiv": { + | "title": "S", + | "GlossList": { + | "GlossEntry": { + | "ID": "SGML", + | "Abbrev": "ISO 8879:1986", + | "SortAs": "SGML", + | "Acronym": "SGML", + | "GlossDef": { + | "para": "A meta-markup language, used to create markup languages such as DocBook.",+ | "GlossSeeAlso": [ + | "GML", + | "XML" + | ] + | }, + | "GlossSee": "markup", + | "GlossTerm": "Standard Generalized Markup Language" + | } + | } + | } + | } + | } 2 | { + | "menu": { + | "id": "file", + | "popup": { + | "menuitem": [ + | { + | "value": "New", + | "onclick": "CreateNewDoc()" + | }, + | { + | "value": "Open", + | "onclick": "OpenDoc()" + | }, + | { + | "value": "Close", + | "onclick": "CloseDoc()" + | } + | ] + | }, + | "value": "File" + | } + | } 3 | { + | "x": [ + | 1, + | 2, + | 3 + | ] + | } (3 ROWS)
On 10.3, when I tried to convert jsonb value to tsvector for full text searching, I get:
$ SELECT id, to_tsvector(json(payload)) FROM test; id | to_tsvector ----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 | '1986':9 '8879':8 'creat':22 'docbook':27 'exampl':1 'general':36 'glossari':2 'gml':29 'iso':7 'languag':19,24,38 'markup':18,23,33,37 'meta':17 'meta-markup':16 'sgml':5,11,13 'standard':35 'use':20 'xml':31 2 | 'close':11 'closedoc':13 'createnewdoc':5 'file':1,15 'new':3 'open':7 'opendoc':9 3 | (3 ROWS)
As you can see values in 3rd row are fully missing. But now, thanks to the new functions, I can:
$ SELECT id, jsonb_to_tsvector('english', payload, '["all"]') FROM test; id | jsonb_to_tsvector ----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 | '1986':25 '8879':24 'abbrev':21 'acronym':31 'creat':46 'docbook':51 'exampl':5 'general':66 'glossari':1,6 'glossdef':35 'glossdiv':8 'glossentri':15 'glosslist':13 'glossse':59 'glossseealso':53 'glossterm':63 'gml':55 'id':17 'iso':23 'languag':43,48,68 'markup':42,47,61,67 'meta':41 'meta-markup':40 'para':37 'sgml':19,29,33 'sorta':27 'standard':65 'titl':3,10 'use':44 'xml':57 2 | 'close':29 'closedoc':33 'createnewdoc':17 'file':5,37 'id':3 'menu':1 'menuitem':9 'new':13 'onclick':15,23,31 'open':21 'opendoc':25 'popup':7 'valu':11,19,27,35 3 | '1':3 '2':5 '3':7 'x':1 (3 ROWS)
Instead of all, I can use any of:
- string
- numeric
- boolean
- key
And you can use multiple, like here:
$ SELECT id, jsonb_to_tsvector('english', payload, '["numeric","key"]') FROM test; id | jsonb_to_tsvector ----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 | 'abbrev':15 'acronym':19 'glossari':1 'glossdef':21 'glossdiv':5 'glossentri':11 'glosslist':9 'glossse':27 'glossseealso':25 'glossterm':29 'id':13 'para':23 'sorta':17 'titl':3,7 2 | 'id':3 'menu':1 'menuitem':7 'onclick':11,15,19 'popup':5 'valu':9,13,17,21 3 | '1':3 '2':5 '3':7 'x':1 (3 ROWS)
Pretty cool. That will definitely be useful 🙂 Thanks a lot.