December 16th, 2011 by depesz | Tags: , , , , | 3 comments »
Did it help? If yes - maybe you can help me? Donate BTC to 19zPa5diT2LZqGtTi8f8bfApLn8rw9zBHx

I got asked on irc to show some examples how to use recursive CTE. Apparently my previous post wasn't good enough :)

I think that most of the users will use recursive cte to deal with trees I decided to show how to use it, even though it's not my favorite approach to dealing with trees in SQL.

Before we'll go to queries, first we need to know our data. I downloaded copy of directory structure for DMOZ, and converted it to this format:

$ \d dmoz
       Table "public.dmoz"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 id        | integer | not null
 parent_id | integer |
 codename  | text    |
Indexes:
    "dmoz_pkey" PRIMARY KEY, btree (id)
    "dmoz_parent_id" btree (parent_id)
Foreign-key constraints:
    "dmoz_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES dmoz(id)
Referenced by:
    TABLE "dmoz" CONSTRAINT "dmoz_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES dmoz(id)
$ select * from dmoz limit 10;
 id | parent_id |        codename
----+-----------+-------------------------
  1 |    [null] | AOL
  2 |    [null] | Arts
  3 |         2 | Animation
  4 |         3 | Anime
  5 |         4 | Characters
  6 |         4 | Clubs_and_Organizations
  7 |         4 | Collectibles
  8 |         7 | Cels
  9 |         7 | Models_and_Figures
 10 |         9 | Action_Figures
(10 rows)

Looks simple enough. Records with parent_id NULL are top level nodes. All other point to something below.

Basic things like:

  • get list of top level nodes
  • get list of nodes directly below given node

Are too simple to even describe, so let's move to something a bit more interesting.

First, as a warm-up – how to get “path" to given node. Let's assume we have node id ( 294010 ), and we want all parents, in order.

So, this is how the record looks:

$ SELECT * FROM dmoz WHERE id = 294010;
   id   | parent_id | codename
--------+-----------+----------
 294010 |    294009 | Maronite
(1 row)

Basic recursive approach is:

$ WITH RECURSIVE parents AS (
    SELECT * FROM dmoz WHERE id = 294010
    UNION ALL
    SELECT dmoz.* FROM dmoz join parents on dmoz.id = parents.parent_id
)
SELECT * FROM parents;
   id   | parent_id |      codename
--------+-----------+---------------------
 294010 |    294009 | Maronite
 294009 |    294008 | Eastern_Rites
 294008 |    294003 | Catholicism
 294003 |    294002 | Christianity
 294002 |    293993 | Religion
 293993 |    293815 | Society_and_Culture
 293815 |    293793 | Brooklyn
 293793 |    293622 | New_York_City
 293622 |    290084 | N
 290084 |    288505 | Localities
 288505 |    172145 | New_York
 172145 |    157689 | United_States
 157689 |     90161 | North_America
  90161 |    [null] | Regional
(14 rows)

The important part is – recursive query is union (union all actually) of two queries:

  • SELECT * FROM dmoz WHERE id = 294010 – this is our starting point
  • SELECT dmoz.* FROM dmoz join parents on dmoz.id = parents.parent_id – this is the magic

Please note that the “magic" part of query, even though it's inside
of “parents" CTE, it references this CTE (in join condition). This is the
recursive behaviour which can do a lot of cool things.

of course – the way I returned this data is pretty much useless – I have all the information I need, but it would be good to sort it properly.

In here, it just so happens that ids are ascending, so I could just “ORDER BY id", but let's assume it's not always the case, and we want proper sorting by nest level. How to do it?

$ WITH RECURSIVE parents AS (
    SELECT *, 0 as level FROM dmoz WHERE id = 294010
    UNION ALL
    SELECT dmoz.*, parents.level + 1 as level FROM dmoz join parents on dmoz.id = parents.parent_id
)
SELECT * FROM parents ORDER BY level DESC;
   id   | parent_id |      codename       | level
--------+-----------+---------------------+-------
  90161 |    [null] | Regional            |    13
 157689 |     90161 | North_America       |    12
 172145 |    157689 | United_States       |    11
 288505 |    172145 | New_York            |    10
 290084 |    288505 | Localities          |     9
 293622 |    290084 | N                   |     8
 293793 |    293622 | New_York_City       |     7
 293815 |    293793 | Brooklyn            |     6
 293993 |    293815 | Society_and_Culture |     5
 294002 |    293993 | Religion            |     4
 294003 |    294002 | Christianity        |     3
 294008 |    294003 | Catholicism         |     2
 294009 |    294008 | Eastern_Rites       |     1
 294010 |    294009 | Maronite            |     0
(14 rows)

As you can see in the starting point query I added “0 as level". And in the magic part of the query, I incremented level from previous row. This, plus ORDER BY level DESC solved the problem. Even if ids wouldn't be in order, I would get the path in correct way.

How efficient it is?

$ EXPLAIN ANALYZE
WITH RECURSIVE parents AS (
    SELECT *, 0 as level FROM dmoz WHERE id = 294010
    UNION ALL
    SELECT dmoz.*, parents.level + 1 as level FROM dmoz join parents on dmoz.id = parents.parent_id
)
SELECT * FROM parents ORDER BY level DESC;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=851.54..851.79 rows=101 width=44) (actual time=0.125..0.128 rows=14 loops=1)
   Sort Key: parents.level
   Sort Method: quicksort  Memory: 26kB
   CTE parents
     ->  Recursive Union  (cost=0.00..846.15 rows=101 width=25) (actual time=0.010..0.100 rows=14 loops=1)
           ->  Index Scan using dmoz_pkey on dmoz  (cost=0.00..8.32 rows=1 width=21) (actual time=0.009..0.010 rows=1 loops=1)
                 Index Cond: (id = 294010)
           ->  Nested Loop  (cost=0.00..83.58 rows=10 width=25) (actual time=0.005..0.005 rows=1 loops=14)
                 ->  WorkTable Scan on parents  (cost=0.00..0.20 rows=10 width=8) (actual time=0.000..0.000 rows=1 loops=14)
                 ->  Index Scan using dmoz_pkey on dmoz  (cost=0.00..8.32 rows=1 width=21) (actual time=0.003..0.004 rows=1 loops=14)
                       Index Cond: (id = parents.parent_id)
   ->  CTE Scan on parents  (cost=0.00..2.02 rows=101 width=44) (actual time=0.012..0.114 rows=14 loops=1)
 Total runtime: 0.177 ms
(13 rows)

No magic in here – 15 index scans on dmoz_pkey (one to get the starting point, and 14 loops to get parents). The benefit is that you don't have to run every query yourself, so you save time for roudtrips to server and back.

That was simple. So let's talk about something else. How to get all nodes from given point, down the tree?

Full dmoz tree is a bit over half a million records, so I will do it for a subset of the tree. Let's say, I want all nodes in the Christianity from above table. It's id is 294003. So how can we get all the data?

We start with basic:

$ WITH RECURSIVE c AS (
    SELECT * FROM dmoz WHERE id = 294003
)
SELECT * FROM c;
   id   | parent_id |   codename
--------+-----------+--------------
 294003 |    294002 | Christianity
(1 row)

Please note that I used “RECURSIVE" keyword, while the query is not recursive at all. It's OK, I can do it, it doesn't cause any problems.

Based on what I showed with previous queries, I can quickly add part to get all kids:

$ WITH RECURSIVE c AS (
    SELECT * FROM dmoz WHERE id = 294003
    UNION ALL
    SELECT dmoz.* FROM dmoz join c ON dmoz.parent_id = c.id
)
SELECT * FROM c;
   id   | parent_id |              codename
--------+-----------+------------------------------------
 294003 |    294002 | Christianity
 294004 |    294003 | Anabaptist
 294006 |    294003 | Anglican
 294007 |    294003 | Baptist
 294008 |    294003 | Catholicism
 294011 |    294003 | Christian_and_Missionary_Alliance
 294012 |    294003 | Christian_Church
 294014 |    294003 | Churches_of_God
 294017 |    294003 | Congregational
 294018 |    294003 | Evangelical_Free_Church_of_America
 294019 |    294003 | Lutheran
 294020 |    294003 | Methodist
 294023 |    294003 | Orthodox
 294024 |    294003 | Pentecostalism
 294026 |    294003 | Presbyterian
 294027 |    294003 | Reformed_Church
 294028 |    294003 | Religious_Society_of_Friends
 294029 |    294003 | Seventh-day_Adventists
 294030 |    294003 | United_Church_of_Christ
 294005 |    294004 | Mennonites
 294009 |    294008 | Eastern_Rites
 294013 |    294012 | Disciples_of_Christ
 294015 |    294014 | Church_of_God_in_Christ
 294016 |    294014 | Worldwide_Church_of_God
 294021 |    294020 | African_Methodist_Episcopal
 294022 |    294020 | United_Methodist
 294025 |    294024 | Assemblies_of_God
 294010 |    294009 | Maronite
(28 rows)

What's important – I can also add level information, for example if I'd want to limit the depth of shown information to “2 levels below current node" or something like this:

WITH RECURSIVE c AS (
    SELECT *, 0 as level FROM dmoz WHERE id = 294003
    UNION ALL
    SELECT dmoz.*, c.level + 1 as level FROM dmoz join c ON dmoz.parent_id = c.id
)
SELECT * FROM c;
   id   | parent_id |              codename              | level
--------+-----------+------------------------------------+-------
 294003 |    294002 | Christianity                       |     0
 294004 |    294003 | Anabaptist                         |     1
 294006 |    294003 | Anglican                           |     1
 294007 |    294003 | Baptist                            |     1
 294008 |    294003 | Catholicism                        |     1
 294011 |    294003 | Christian_and_Missionary_Alliance  |     1
 294012 |    294003 | Christian_Church                   |     1
 294014 |    294003 | Churches_of_God                    |     1
 294017 |    294003 | Congregational                     |     1
 294018 |    294003 | Evangelical_Free_Church_of_America |     1
 294019 |    294003 | Lutheran                           |     1
 294020 |    294003 | Methodist                          |     1
 294023 |    294003 | Orthodox                           |     1
 294024 |    294003 | Pentecostalism                     |     1
 294026 |    294003 | Presbyterian                       |     1
 294027 |    294003 | Reformed_Church                    |     1
 294028 |    294003 | Religious_Society_of_Friends       |     1
 294029 |    294003 | Seventh-day_Adventists             |     1
 294030 |    294003 | United_Church_of_Christ            |     1
 294005 |    294004 | Mennonites                         |     2
 294009 |    294008 | Eastern_Rites                      |     2
 294013 |    294012 | Disciples_of_Christ                |     2
 294015 |    294014 | Church_of_God_in_Christ            |     2
 294016 |    294014 | Worldwide_Church_of_God            |     2
 294021 |    294020 | African_Methodist_Episcopal        |     2
 294022 |    294020 | United_Methodist                   |     2
 294025 |    294024 | Assemblies_of_God                  |     2
 294010 |    294009 | Maronite                           |     3
(28 rows)

This is all great, but how can I sort it so that the result will be sensible?

We can add “paths" – generated by taking all codenames for given element and its parent (up to the starting point), and sort with it:

WITH RECURSIVE c AS (
    SELECT *, 0 as level, codename as path FROM dmoz WHERE id = 294003
    UNION ALL
    SELECT dmoz.*, c.level + 1 as level, c.path || '/' || dmoz.codename as path FROM dmoz join c ON dmoz.parent_id = c.id
)
SELECT * FROM c ORDER BY path;
   id   | parent_id |              codename              | level |                         path
--------+-----------+------------------------------------+-------+------------------------------------------------------
 294003 |    294002 | Christianity                       |     0 | Christianity
 294004 |    294003 | Anabaptist                         |     1 | Christianity/Anabaptist
 294005 |    294004 | Mennonites                         |     2 | Christianity/Anabaptist/Mennonites
 294006 |    294003 | Anglican                           |     1 | Christianity/Anglican
 294007 |    294003 | Baptist                            |     1 | Christianity/Baptist
 294008 |    294003 | Catholicism                        |     1 | Christianity/Catholicism
 294009 |    294008 | Eastern_Rites                      |     2 | Christianity/Catholicism/Eastern_Rites
 294010 |    294009 | Maronite                           |     3 | Christianity/Catholicism/Eastern_Rites/Maronite
 294011 |    294003 | Christian_and_Missionary_Alliance  |     1 | Christianity/Christian_and_Missionary_Alliance
 294012 |    294003 | Christian_Church                   |     1 | Christianity/Christian_Church
 294013 |    294012 | Disciples_of_Christ                |     2 | Christianity/Christian_Church/Disciples_of_Christ
 294014 |    294003 | Churches_of_God                    |     1 | Christianity/Churches_of_God
 294015 |    294014 | Church_of_God_in_Christ            |     2 | Christianity/Churches_of_God/Church_of_God_in_Christ
 294016 |    294014 | Worldwide_Church_of_God            |     2 | Christianity/Churches_of_God/Worldwide_Church_of_God
 294017 |    294003 | Congregational                     |     1 | Christianity/Congregational
 294018 |    294003 | Evangelical_Free_Church_of_America |     1 | Christianity/Evangelical_Free_Church_of_America
 294019 |    294003 | Lutheran                           |     1 | Christianity/Lutheran
 294020 |    294003 | Methodist                          |     1 | Christianity/Methodist
 294021 |    294020 | African_Methodist_Episcopal        |     2 | Christianity/Methodist/African_Methodist_Episcopal
 294022 |    294020 | United_Methodist                   |     2 | Christianity/Methodist/United_Methodist
 294023 |    294003 | Orthodox                           |     1 | Christianity/Orthodox
 294024 |    294003 | Pentecostalism                     |     1 | Christianity/Pentecostalism
 294025 |    294024 | Assemblies_of_God                  |     2 | Christianity/Pentecostalism/Assemblies_of_God
 294026 |    294003 | Presbyterian                       |     1 | Christianity/Presbyterian
 294027 |    294003 | Reformed_Church                    |     1 | Christianity/Reformed_Church
 294028 |    294003 | Religious_Society_of_Friends       |     1 | Christianity/Religious_Society_of_Friends
 294029 |    294003 | Seventh-day_Adventists             |     1 | Christianity/Seventh-day_Adventists
 294030 |    294003 | United_Church_of_Christ            |     1 | Christianity/United_Church_of_Christ
(28 rows)

And finally, explain:

EXPLAIN ANALYZE
WITH RECURSIVE c AS (
    SELECT *, 0 as level, codename as path FROM dmoz WHERE id = 294003
    UNION ALL
    SELECT dmoz.*, c.level + 1 as level, c.path || '/' || dmoz.codename as path FROM dmoz join c ON dmoz.parent_id = c.id
)
SELECT * FROM c ORDER BY path;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=980.07..982.74 rows=1071 width=76) (actual time=0.331..0.333 rows=28 loops=1)
   Sort Key: c.path
   Sort Method: quicksort  Memory: 28kB
   CTE c
     ->  Recursive Union  (cost=0.00..904.75 rows=1071 width=57) (actual time=0.011..0.164 rows=28 loops=1)
           ->  Index Scan using dmoz_pkey on dmoz  (cost=0.00..8.32 rows=1 width=21) (actual time=0.009..0.010 rows=1 loops=1)
                 Index Cond: (id = 294003)
           ->  Nested Loop  (cost=0.00..87.50 rows=107 width=57) (actual time=0.007..0.034 rows=7 loops=4)
                 ->  WorkTable Scan on c  (cost=0.00..0.20 rows=10 width=40) (actual time=0.000..0.002 rows=7 loops=4)
                 ->  Index Scan using dmoz_parent_id on dmoz  (cost=0.00..8.51 rows=11 width=21) (actual time=0.003..0.003 rows=1 loops=28)
                       Index Cond: (parent_id = c.id)
   ->  CTE Scan on c  (cost=0.00..21.42 rows=1071 width=76) (actual time=0.013..0.191 rows=28 loops=1)
 Total runtime: 0.405 ms
(13 rows)

Now, one can ask: OK, but I want custom ordering – for example I want Catholicism as the first element in Christianity. Solution: simply add column like “priority", and use this column (+codename to handle duplicate priorities):

$ alter table dmoz add column priority int4 default 500;
ALTER TABLE
 
$ alter table dmoz add check ( priority between 100 and 999 );
ALTER TABLE

Now – all priorities will be 3 digit integers, with default value being 500.

So, now I can increase priority of Catholicism (record in database, not religion!):

$ UPDATE dmoz set priority = 250 where id = 294008;
UPDATE 1

With priority change in place, I can change the query to:

$ WITH RECURSIVE c AS (
    SELECT
        *,
        0 as level,
        codename as path,
        priority || codename as priority_path
    FROM dmoz WHERE id = 294003
    UNION ALL
    SELECT
        dmoz.*,
        c.level + 1 as level,
        c.path || '/' || dmoz.codename as path,
        c.priority_path || '/' || dmoz.priority || dmoz.codename as priority_path
    FROM dmoz join c ON dmoz.parent_id = c.id
)
SELECT * FROM c ORDER BY priority_path;
   id   | parent_id |              codename              | priority | level |                         path                         |                         priority_path
--------+-----------+------------------------------------+----------+-------+------------------------------------------------------+---------------------------------------------------------------
 294003 |    294002 | Christianity                       |      500 |     0 | Christianity                                         | 500Christianity
 294008 |    294003 | Catholicism                        |      250 |     1 | Christianity/Catholicism                             | 500Christianity/250Catholicism
 294009 |    294008 | Eastern_Rites                      |      500 |     2 | Christianity/Catholicism/Eastern_Rites               | 500Christianity/250Catholicism/500Eastern_Rites
 294010 |    294009 | Maronite                           |      500 |     3 | Christianity/Catholicism/Eastern_Rites/Maronite      | 500Christianity/250Catholicism/500Eastern_Rites/500Maronite
 294004 |    294003 | Anabaptist                         |      500 |     1 | Christianity/Anabaptist                              | 500Christianity/500Anabaptist
 294005 |    294004 | Mennonites                         |      500 |     2 | Christianity/Anabaptist/Mennonites                   | 500Christianity/500Anabaptist/500Mennonites
 294006 |    294003 | Anglican                           |      500 |     1 | Christianity/Anglican                                | 500Christianity/500Anglican
 294007 |    294003 | Baptist                            |      500 |     1 | Christianity/Baptist                                 | 500Christianity/500Baptist
 294011 |    294003 | Christian_and_Missionary_Alliance  |      500 |     1 | Christianity/Christian_and_Missionary_Alliance       | 500Christianity/500Christian_and_Missionary_Alliance
 294012 |    294003 | Christian_Church                   |      500 |     1 | Christianity/Christian_Church                        | 500Christianity/500Christian_Church
 294013 |    294012 | Disciples_of_Christ                |      500 |     2 | Christianity/Christian_Church/Disciples_of_Christ    | 500Christianity/500Christian_Church/500Disciples_of_Christ
 294014 |    294003 | Churches_of_God                    |      500 |     1 | Christianity/Churches_of_God                         | 500Christianity/500Churches_of_God
 294015 |    294014 | Church_of_God_in_Christ            |      500 |     2 | Christianity/Churches_of_God/Church_of_God_in_Christ | 500Christianity/500Churches_of_God/500Church_of_God_in_Christ
 294016 |    294014 | Worldwide_Church_of_God            |      500 |     2 | Christianity/Churches_of_God/Worldwide_Church_of_God | 500Christianity/500Churches_of_God/500Worldwide_Church_of_God
 294017 |    294003 | Congregational                     |      500 |     1 | Christianity/Congregational                          | 500Christianity/500Congregational
 294018 |    294003 | Evangelical_Free_Church_of_America |      500 |     1 | Christianity/Evangelical_Free_Church_of_America      | 500Christianity/500Evangelical_Free_Church_of_America
 294019 |    294003 | Lutheran                           |      500 |     1 | Christianity/Lutheran                                | 500Christianity/500Lutheran
 294020 |    294003 | Methodist                          |      500 |     1 | Christianity/Methodist                               | 500Christianity/500Methodist
 294021 |    294020 | African_Methodist_Episcopal        |      500 |     2 | Christianity/Methodist/African_Methodist_Episcopal   | 500Christianity/500Methodist/500African_Methodist_Episcopal
 294022 |    294020 | United_Methodist                   |      500 |     2 | Christianity/Methodist/United_Methodist              | 500Christianity/500Methodist/500United_Methodist
 294023 |    294003 | Orthodox                           |      500 |     1 | Christianity/Orthodox                                | 500Christianity/500Orthodox
 294024 |    294003 | Pentecostalism                     |      500 |     1 | Christianity/Pentecostalism                          | 500Christianity/500Pentecostalism
 294025 |    294024 | Assemblies_of_God                  |      500 |     2 | Christianity/Pentecostalism/Assemblies_of_God        | 500Christianity/500Pentecostalism/500Assemblies_of_God
 294026 |    294003 | Presbyterian                       |      500 |     1 | Christianity/Presbyterian                            | 500Christianity/500Presbyterian
 294027 |    294003 | Reformed_Church                    |      500 |     1 | Christianity/Reformed_Church                         | 500Christianity/500Reformed_Church
 294028 |    294003 | Religious_Society_of_Friends       |      500 |     1 | Christianity/Religious_Society_of_Friends            | 500Christianity/500Religious_Society_of_Friends
 294029 |    294003 | Seventh-day_Adventists             |      500 |     1 | Christianity/Seventh-day_Adventists                  | 500Christianity/500Seventh-day_Adventists
 294030 |    294003 | United_Church_of_Christ            |      500 |     1 | Christianity/United_Church_of_Christ                 | 500Christianity/500United_Church_of_Christ
(28 rows)

The query looks different, but that's just because of indentation – the only real change is addition of priority_path column, and usage of it (instead of normal path) for ORDER BY.

Of course – depending on your case you might not calculate “path" at all, or return just a subset of columns. Or even do voodoo magic, and return the data as html:

$ WITH RECURSIVE c AS (
    SELECT
        *,
        0 as level,
        codename as path,
        priority || codename as priority_path
    FROM dmoz WHERE id = 294003
    UNION ALL
    SELECT
        dmoz.*,
        c.level + 1 as level,
        c.path || '/' || dmoz.codename as path,
        c.priority_path || '/' || dmoz.priority || dmoz.codename as priority_path
    FROM dmoz join c ON dmoz.parent_id = c.id
)
SELECT
    repeat( '<ol>', level - coalesce( lag(level) over ( ORDER BY priority_path ), -1 ) ) ||
    '<li><a href="http://www.dmoz.org/Regional/North_America/United_States/New_York/Localities/N/New_York_City/Brooklyn/Society_and_Culture/Religion/' || c.path || '">' || c.codename || '</a></li>' ||
    repeat( '</ol>', level - coalesce( lead(level) over ( ORDER BY priority_path ), -1 ) )
FROM c
ORDER BY priority_path;

Which returns HTML that renders like this:

  1. Christianity
    1. Catholicism
      1. Eastern_Rites
        1. Maronite
    2. Anabaptist
      1. Mennonites
    3. Anglican
    4. Baptist
    5. Christian_and_Missionary_Alliance
    6. Christian_Church
      1. Disciples_of_Christ
    7. Churches_of_God
      1. Church_of_God_in_Christ
      2. Worldwide_Church_of_God
    8. Congregational
    9. Evangelical_Free_Church_of_America
    10. Lutheran
    11. Methodist
      1. African_Methodist_Episcopal
      2. United_Methodist
    12. Orthodox
    13. Pentecostalism
      1. Assemblies_of_God
    14. Presbyterian
    15. Reformed_Church
    16. Religious_Society_of_Friends
    17. Seventh-day_Adventists
    18. United_Church_of_Christ

So, I think that's about it. If something is not clear – please let me know. I will try to answer all questions/problems, aside from the final voodoo magic, which I will not explain – it's voodoo after all.

  1. 3 comments

  2. # pg_lizard
    Dec 19, 2011

    Brilliant post yet again Depesz. :) Will always use these examples to demonstrate Recursive CTE’s. Incidentally the one who asked these examples on IRC was me! Thanks so much for this!

  3. # Tiziano
    Dec 19, 2011

    very good examples, i liked’em truly :D and i’ll exercize them asap. I appreciate your efforts very much. thx!

  4. # diego
    Apr 7, 2012

    Thank you, Depesz! You write exceptionally useful stuff.

Leave a comment