Waiting for PostgreSQL 14 – SEARCH and CYCLE clauses

On 1st of February 2021, Peter Eisentraut committed patch:

SEARCH and CYCLE clauses 
 
This adds the SQL standard feature that adds the SEARCH and CYCLE
clauses to recursive queries to be able to do produce breadth- or
depth-first search orders and detect cycles.  These clauses can be
rewritten into queries using existing syntax, and that is what this
patch does in the rewriter.
 
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/db80ceee-6f97-9b4a-8ee8-3ba0c58e5be2@2ndquadrant.com

Just as message suggests this change is related to recursive queries.

Which means, that to test it, we need some data for recursive queries. This time I took directory listing of my /etc, and loaded it to db:

$ CREATE TABLE dirs (
    id        INT NOT NULL,
    parent_id INT REFERENCES dirs (id),
    dirname   text,
    PRIMARY KEY (id)
);
CREATE TABLE
 
$ copy dirs FROM '/tmp/dirs.data';
COPY 434

Great, we can now see full paths to each dir:

$ WITH RECURSIVE c AS (
    SELECT id, dirname AS path FROM dirs WHERE parent_id IS NULL
    UNION ALL
    SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id
)
SELECT id, path FROM c ORDER BY path LIMIT 15;
 id |            path             
----+-----------------------------
  1 | etc
 42 | etc/acpi
 43 | etc/acpi/events
 44 | etc/alsa
 45 | etc/alsa/conf.d
 46 | etc/alternatives
 47 | etc/apache2
 48 | etc/apache2/conf-available
 49 | etc/apm
 50 | etc/apm/resume.d
 51 | etc/apm/scripts.d
 52 | etc/apm/suspend.d
 53 | etc/apparmor
 54 | etc/apparmor.d
 55 | etc/apparmor.d/abstractions
(15 ROWS)

Cool. Now let's see what these new changes get us.

First thing is:

$ WITH RECURSIVE c AS (
    SELECT id, dirname AS path FROM dirs WHERE parent_id IS NULL
    UNION ALL
    SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id
) SEARCH DEPTH FIRST BY id SET ordercol
SELECT id, path FROM c ORDER BY ordercol LIMIT 15;
 id  |                    path                    
-----+--------------------------------------------
   1 | etc
   2 | etc/.java
 434 | etc/.java/.systemPrefs
   3 | etc/GNUstep
   4 | etc/ImageMagick-6
   5 | etc/Muttrc.d
   6 | etc/NetworkManager
   7 | etc/NetworkManager/conf.d
   8 | etc/NetworkManager/dispatcher.d
   9 | etc/NetworkManager/dispatcher.d/no-wait.d
  10 | etc/NetworkManager/dispatcher.d/pre-down.d
  11 | etc/NetworkManager/dispatcher.d/pre-up.d
  12 | etc/NetworkManager/dnsmasq.d
  13 | etc/NetworkManager/dnsmasq-shared.d
  14 | etc/NetworkManager/system-connections
(15 ROWS)

OK. So this returns the dirs, depth first, but ordered by id.

Let's try to get the previous output, ascii-betical order, using this new approach, but also, let's see what ordercol really is:

$ WITH RECURSIVE c AS (
    SELECT id, dirname AS path FROM dirs WHERE parent_id IS NULL
    UNION ALL
    SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id
) SEARCH DEPTH FIRST BY path SET ordercol
SELECT id, path, ordercol FROM c ORDER BY ordercol LIMIT 15;
 id |                     path                     |                                         ordercol                                          
----+----------------------------------------------+-------------------------------------------------------------------------------------------
  1 | etc                                          | {(etc)}
 42 | etc/acpi                                     | {(etc),(etc/acpi)}
 43 | etc/acpi/events                              | {(etc),(etc/acpi),(etc/acpi/events)}
 44 | etc/alsa                                     | {(etc),(etc/alsa)}
 45 | etc/alsa/conf.d                              | {(etc),(etc/alsa),(etc/alsa/conf.d)}
 46 | etc/alternatives                             | {(etc),(etc/alternatives)}
 47 | etc/apache2                                  | {(etc),(etc/apache2)}
 48 | etc/apache2/conf-available                   | {(etc),(etc/apache2),(etc/apache2/conf-available)}
 49 | etc/apm                                      | {(etc),(etc/apm)}
 50 | etc/apm/resume.d                             | {(etc),(etc/apm),(etc/apm/resume.d)}
 51 | etc/apm/scripts.d                            | {(etc),(etc/apm),(etc/apm/scripts.d)}
 52 | etc/apm/suspend.d                            | {(etc),(etc/apm),(etc/apm/suspend.d)}
 53 | etc/apparmor                                 | {(etc),(etc/apparmor)}
 67 | etc/apparmor/init                            | {(etc),(etc/apparmor),(etc/apparmor/init)}
 68 | etc/apparmor/init/network-interface-security | {(etc),(etc/apparmor),(etc/apparmor/init),(etc/apparmor/init/network-interface-security)}
(15 ROWS)

So, I can see now, that in this sort – etc/apparmor/X is actually before /etc/apparmor.d/X. Which kinda makes sense. Let's compare the two orders on their own:

Old approach:

$ WITH RECURSIVE c AS (
    SELECT id, dirname AS path FROM dirs WHERE id IN (53, 54)
    UNION ALL
    SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id
)
SELECT id, path FROM c ORDER BY path;
 id |                   path                    
----+-------------------------------------------
 53 | apparmor
 54 | apparmor.d
 55 | apparmor.d/abstractions
 56 | apparmor.d/abstractions/apparmor_api
 57 | apparmor.d/abstractions/ubuntu-browsers.d
 58 | apparmor.d/disable
 59 | apparmor.d/force-complain
 60 | apparmor.d/libvirt
 61 | apparmor.d/LOCAL
 62 | apparmor.d/LOCAL/abstractions
 63 | apparmor.d/tunables
 64 | apparmor.d/tunables/home.d
 65 | apparmor.d/tunables/multiarch.d
 66 | apparmor.d/tunables/xdg-user-dirs.d
 67 | apparmor/init
 68 | apparmor/init/network-interface-security
(16 ROWS)

and the new one:

$ WITH RECURSIVE c AS (
    SELECT id, dirname AS path FROM dirs WHERE id IN (53, 54)
    UNION ALL
    SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id
) SEARCH DEPTH FIRST BY path SET ordercol
SELECT id, path FROM c ORDER BY ordercol;
 id |                   path                    
----+-------------------------------------------
 53 | apparmor
 67 | apparmor/init
 68 | apparmor/init/network-interface-security
 54 | apparmor.d
 55 | apparmor.d/abstractions
 56 | apparmor.d/abstractions/apparmor_api
 57 | apparmor.d/abstractions/ubuntu-browsers.d
 58 | apparmor.d/disable
 59 | apparmor.d/force-complain
 60 | apparmor.d/libvirt
 61 | apparmor.d/LOCAL
 62 | apparmor.d/LOCAL/abstractions
 63 | apparmor.d/tunables
 64 | apparmor.d/tunables/home.d
 65 | apparmor.d/tunables/multiarch.d
 66 | apparmor.d/tunables/xdg-user-dirs.d
(16 ROWS)

Please note that apparmor/init and apparmor/init/network-interface-security are in much better place now – immediately after apparmor/, and not apparmor.d. Nice!

What's cool, we can also now also have order by breadth first:

$ WITH RECURSIVE c AS (
    SELECT id, dirname AS path FROM dirs WHERE parent_id IS NULL
    UNION ALL
    SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id
) SEARCH BREADTH FIRST BY id SET ordercol
SELECT id, path FROM c ORDER BY ordercol LIMIT 15;
 id |        path         
----+---------------------
  1 | etc
  2 | etc/.java
  3 | etc/GNUstep
  4 | etc/ImageMagick-6
  5 | etc/Muttrc.d
  6 | etc/NetworkManager
 15 | etc/ODBCDataSources
 16 | etc/OpenCL
 18 | etc/PackageKit
 19 | etc/UPower
 20 | etc/X11
 42 | etc/acpi
 44 | etc/alsa
 46 | etc/alternatives
 47 | etc/apache2
(15 ROWS)

yeah – There are many dirs in /etc, so to see some dirs deeper I will have to offset quite a lot:

$ WITH RECURSIVE c AS (
    SELECT id, dirname AS path FROM dirs WHERE parent_id IS NULL
    UNION ALL
    SELECT dirs.id, c.path || '/' || dirs.dirname AS path FROM dirs JOIN c ON dirs.parent_id = c.id
) SEARCH BREADTH FIRST BY id SET ordercol
SELECT id, path FROM c ORDER BY ordercol LIMIT 15 offset 170;
 id  |                 path                  
-----+---------------------------------------
 420 | etc/vulkan
 424 | etc/wpa_supplicant
 425 | etc/xdg
 433 | etc/xml
   7 | etc/NetworkManager/conf.d
   8 | etc/NetworkManager/dispatcher.d
  12 | etc/NetworkManager/dnsmasq.d
  13 | etc/NetworkManager/dnsmasq-shared.d
  14 | etc/NetworkManager/system-connections
  17 | etc/OpenCL/vendors
  21 | etc/X11/Xreset.d
  22 | etc/X11/Xresources
  23 | etc/X11/Xsession.d
  24 | etc/X11/app-defaults
  25 | etc/X11/cursors
(15 ROWS)

This is awesome.

For the second part of the change – cycles, I will need different dataset.

So I took a small bit of Ticket to Ride map:

and converted it to table:

$ CREATE TABLE conns (
    c_from text,
    c_to   text,
    PRIMARY KEY (c_from, c_to)
);
CREATE TABLE
 
$ INSERT INTO conns (c_from, c_to)
VALUES
    ( 'Chicago',  'Duluth' ),
    ( 'Duluth',  'Chicago' ),
    ( 'Chicago',  'Toronto' ),
    ( 'Toronto',  'Chicago' ),
    ( 'Chicago',  'Pittsburg' ),
    ( 'Pittsburg',  'Chicago' ),
    ( 'Pittsburg',  'Toronto' ),
    ( 'Toronto',  'Pittsburg' ),
    ( 'Duluth',  'Toronto' ),
    ( 'Toronto',  'Duluth' ),
    ( 'Chicago',  'Omaha' ),
    ( 'Omaha',  'Chicago' );
INSERT 0 12

Now, let's print all possible routes we can go through, starting from Chicago. Simplistic approach:

$ WITH RECURSIVE c AS (
    SELECT c_from, c_to, array[ c_from, c_to ] AS track FROM conns WHERE c_from = 'Chicago'
  UNION ALL
    SELECT n.c_from, n.c_to, c.track || n.c_to
    FROM conns n JOIN c ON n.c_from = c.c_to
)
SELECT * FROM c;

will, of course, never end, because there are cycles in here.

But now, we can have cycle detection! So let's try it:

$ WITH RECURSIVE c AS (
    SELECT NULL AS c_from, 'Chicago' AS c_to, ARRAY[ 'Chicago' ] AS track
  UNION ALL
    SELECT n.c_from, n.c_to, c.track || n.c_to
    FROM conns n JOIN c ON n.c_from = c.c_to
) CYCLE c_to SET is_cycle TO TRUE DEFAULT FALSE USING cycle_path
SELECT * FROM c;
  c_from   |   c_to    |                   track                    | is_cycle |                      cycle_path                      
-----------+-----------+--------------------------------------------+----------+------------------------------------------------------
           | Chicago   | {Chicago}                                  | f        | {(Chicago)}
 Chicago   | Duluth    | {Chicago,Duluth}                           | f        | {(Chicago),(Duluth)}
 Chicago   | Toronto   | {Chicago,Toronto}                          | f        | {(Chicago),(Toronto)}
 Chicago   | Pittsburg | {Chicago,Pittsburg}                        | f        | {(Chicago),(Pittsburg)}
 Chicago   | Omaha     | {Chicago,Omaha}                            | f        | {(Chicago),(Omaha)}
 Duluth    | Chicago   | {Chicago,Duluth,Chicago}                   | t        | {(Chicago),(Duluth),(Chicago)}
 Toronto   | Chicago   | {Chicago,Toronto,Chicago}                  | t        | {(Chicago),(Toronto),(Chicago)}
 Pittsburg | Chicago   | {Chicago,Pittsburg,Chicago}                | t        | {(Chicago),(Pittsburg),(Chicago)}
 Pittsburg | Toronto   | {Chicago,Pittsburg,Toronto}                | f        | {(Chicago),(Pittsburg),(Toronto)}
 Toronto   | Pittsburg | {Chicago,Toronto,Pittsburg}                | f        | {(Chicago),(Toronto),(Pittsburg)}
 Duluth    | Toronto   | {Chicago,Duluth,Toronto}                   | f        | {(Chicago),(Duluth),(Toronto)}
 Toronto   | Duluth    | {Chicago,Toronto,Duluth}                   | f        | {(Chicago),(Toronto),(Duluth)}
 Omaha     | Chicago   | {Chicago,Omaha,Chicago}                    | t        | {(Chicago),(Omaha),(Chicago)}
 Duluth    | Chicago   | {Chicago,Toronto,Duluth,Chicago}           | t        | {(Chicago),(Toronto),(Duluth),(Chicago)}
 Toronto   | Chicago   | {Chicago,Duluth,Toronto,Chicago}           | t        | {(Chicago),(Duluth),(Toronto),(Chicago)}
 Toronto   | Chicago   | {Chicago,Pittsburg,Toronto,Chicago}        | t        | {(Chicago),(Pittsburg),(Toronto),(Chicago)}
 Pittsburg | Chicago   | {Chicago,Toronto,Pittsburg,Chicago}        | t        | {(Chicago),(Toronto),(Pittsburg),(Chicago)}
 Pittsburg | Toronto   | {Chicago,Toronto,Pittsburg,Toronto}        | t        | {(Chicago),(Toronto),(Pittsburg),(Toronto)}
 Toronto   | Pittsburg | {Chicago,Duluth,Toronto,Pittsburg}         | f        | {(Chicago),(Duluth),(Toronto),(Pittsburg)}
 Toronto   | Pittsburg | {Chicago,Pittsburg,Toronto,Pittsburg}      | t        | {(Chicago),(Pittsburg),(Toronto),(Pittsburg)}
 Duluth    | Toronto   | {Chicago,Toronto,Duluth,Toronto}           | t        | {(Chicago),(Toronto),(Duluth),(Toronto)}
 Toronto   | Duluth    | {Chicago,Duluth,Toronto,Duluth}            | t        | {(Chicago),(Duluth),(Toronto),(Duluth)}
 Toronto   | Duluth    | {Chicago,Pittsburg,Toronto,Duluth}         | f        | {(Chicago),(Pittsburg),(Toronto),(Duluth)}
 Duluth    | Chicago   | {Chicago,Pittsburg,Toronto,Duluth,Chicago} | t        | {(Chicago),(Pittsburg),(Toronto),(Duluth),(Chicago)}
 Pittsburg | Chicago   | {Chicago,Duluth,Toronto,Pittsburg,Chicago} | t        | {(Chicago),(Duluth),(Toronto),(Pittsburg),(Chicago)}
 Pittsburg | Toronto   | {Chicago,Duluth,Toronto,Pittsburg,Toronto} | t        | {(Chicago),(Duluth),(Toronto),(Pittsburg),(Toronto)}
 Duluth    | Toronto   | {Chicago,Pittsburg,Toronto,Duluth,Toronto} | t        | {(Chicago),(Pittsburg),(Toronto),(Duluth),(Toronto)}
(27 ROWS)

This is as really cool. What's more – I didn't even had to add any condition on the is_cycle, it was sufficient that it's calculated, and processing stopped. And I can still add “WHERE is_cycle = false" to weed out the connections that cycle back to city that was already visited:

$ WITH RECURSIVE c AS (
    SELECT NULL AS c_from, 'Chicago' AS c_to, ARRAY[ 'Chicago' ] AS track
  UNION ALL
    SELECT n.c_from, n.c_to, c.track || n.c_to
    FROM conns n JOIN c ON n.c_from = c.c_to
) CYCLE c_to SET is_cycle TO TRUE DEFAULT FALSE USING cycle_path
SELECT * FROM c WHERE is_cycle = FALSE;
  c_from   |   c_to    |               track                | is_cycle |                 cycle_path                 
-----------+-----------+------------------------------------+----------+--------------------------------------------
           | Chicago   | {Chicago}                          | f        | {(Chicago)}
 Chicago   | Duluth    | {Chicago,Duluth}                   | f        | {(Chicago),(Duluth)}
 Chicago   | Toronto   | {Chicago,Toronto}                  | f        | {(Chicago),(Toronto)}
 Chicago   | Pittsburg | {Chicago,Pittsburg}                | f        | {(Chicago),(Pittsburg)}
 Chicago   | Omaha     | {Chicago,Omaha}                    | f        | {(Chicago),(Omaha)}
 Pittsburg | Toronto   | {Chicago,Pittsburg,Toronto}        | f        | {(Chicago),(Pittsburg),(Toronto)}
 Toronto   | Pittsburg | {Chicago,Toronto,Pittsburg}        | f        | {(Chicago),(Toronto),(Pittsburg)}
 Duluth    | Toronto   | {Chicago,Duluth,Toronto}           | f        | {(Chicago),(Duluth),(Toronto)}
 Toronto   | Duluth    | {Chicago,Toronto,Duluth}           | f        | {(Chicago),(Toronto),(Duluth)}
 Toronto   | Pittsburg | {Chicago,Duluth,Toronto,Pittsburg} | f        | {(Chicago),(Duluth),(Toronto),(Pittsburg)}
 Toronto   | Duluth    | {Chicago,Pittsburg,Toronto,Duluth} | f        | {(Chicago),(Pittsburg),(Toronto),(Duluth)}
(11 ROWS)

All in all – both these things look amazing, and will greatly help writing queries for hierarchical data.

Huge thanks to all involved 🙂

2 thoughts on “Waiting for PostgreSQL 14 – SEARCH and CYCLE clauses”

  1. Awesome feat., what would be the impact on the cache-ability of such queries?

Comments are closed.