Waiting for PostgreSQL 10 – Support XMLTABLE query expression

On 8th of March, Alvaro Herrera committed patch:

XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.
 
This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE.  (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).
 
The implemented syntax deviates slightly from what the standard
requires.  First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only.  Second, we don't
currently support a default namespace to be specified.
 
This implementation relies on a new executor node based on a hardcoded
method table.  (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)
 
Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com

Long story short, it allows for extracting data in table format (rows and columns) directly from XML.

How? Let's see.

First – sample XML:

<people>
    <person>
        <first_name>Hubert</first_name>
        <last_name>Lubaczewski</last_name>
        <nick>depesz</nick>
    </person>
    <person>
        <first_name>Andrew</first_name>
        <last_name>Gierth</last_name>
        <nick>RhodiumToad</nick>
    </person>
    <person>
        <first_name>Devrim</first_name>
        <last_name>Gündüz</last_name>
    </person>
</people>

For brevity, I will shorten it in queries.

So, let's try to get some data from it:

$ WITH x AS (
    SELECT '<people>...</people>'::xml AS source_xml
)
SELECT decoded.*
FROM
    x,
    xmltable(
        '//people/person'
        passing source_xml
        COLUMNS
            first_name text,
            last_name text,
            nick text
    ) AS decoded
 first_name |  last_name  |    nick     
------------+-------------+-------------
 Hubert     | Lubaczewski | depesz
 Andrew     | Gierth      | RhodiumToad
 Devrim     | Gündüz      | [NULL]
(3 ROWS)

Nice. We can also do some column renaming, like:

$ WITH x AS (
    SELECT '<people>...</people>'::xml AS source_xml
)
SELECT decoded.*
FROM
    x,
    xmltable(
        '//people/person'
        passing source_xml
        COLUMNS
            first_name text,
            last_name text,
            nick_name text PATH 'nick'
    ) AS decoded;
 first_name |  last_name  |  nick_name  
------------+-------------+-------------
 Hubert     | Lubaczewski | depesz
 Andrew     | Gierth      | RhodiumToad
 Devrim     | Gündüz      | 
(3 ROWS)

or substitute default values:

$ WITH x AS (
    SELECT '<people>...</people>'::xml AS source_xml
)
SELECT decoded.*
FROM
    x,
    xmltable(
        '//people/person'
        passing source_xml
        COLUMNS
            first_name text,
            last_name text,
            nick text DEFAULT '---'
    ) AS decoded;
 first_name |  last_name  |    nick     
------------+-------------+-------------
 Hubert     | Lubaczewski | depesz
 Andrew     | Gierth      | RhodiumToad
 Devrim     | Gündüz      | ---
(3 ROWS)

you can also add ordinality info, or even use xpath functions:

$ WITH x AS (
    SELECT '<people>...</people>'::xml AS source_xml
)
SELECT decoded.*
FROM
    x,
    xmltable(
        '//people/person'
        passing source_xml
        COLUMNS
            id FOR ordinality,
            full_name text PATH 'concat(first_name, " ", last_name)',
            nick text DEFAULT '---'
    ) AS decoded;
 id |     full_name      |    nick     
----+--------------------+-------------
  1 | Hubert Lubaczewski | depesz
  2 | Andrew Gierth      | RhodiumToad
  3 | Devrim Gündüz      | ---
(3 ROWS)

Full functionality requires some understanding/knowledge of xml and xpath, but it's definitely cool thing to have. Will significantly simplify code required to load data from xml sources.

Thanks, guys.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.