How to get a row, and all of it’s dependencies?

This question was asked at least twice on some support channel. Getting a row is trivial: select * from table where id = ?. But what about dependencies – the rows that this exported row references?

Decided to take a look at this task.

First of all, returning data from/about multiple tables will be tricky. Single query can return just one data structure, and we need to export rows from many different tables.

Luckily, we can do it using JSON.

Let's try. To make sure that the results are easily viewable let's make some test, custom schema, that we'll use:

=$ CREATE TABLE users (
    id INT8 PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    referred_by INT8 REFERENCES users (id)
);
=$ CREATE TABLE categories (
    id INT8 PRIMARY KEY,
    parent_id INT8 REFERENCES categories (id),
    category_name TEXT
);
=$ CREATE TABLE objects (
    id INT8 PRIMARY KEY,
    created_by INT8 NOT NULL REFERENCES users (id),
    parent_id INT8 REFERENCES objects (id),
    category_id INT8 NOT NULL REFERENCES categories (id),
    payload TEXT
);
=$ copy users FROM stdin WITH ( DELIMITER ',' );
1,depesz,\N
2,other_user,1
\.
=$ copy categories FROM stdin WITH ( DELIMITER ',' );
1,\N,top_1
2,\N,top_2
3,1,top_1/s1
4,3,top_1/s1/s2
\.
=$ copy objects FROM stdin WITH ( DELIMITER ',' );
1,1,\N,2,object-1
2,2,1,4,object-2
\.

So, let's first export object 1.

Since The logic for exporting will be rather involved, I will make it as function that does whatever is necessary, and returns JSONB with all the data.

First version of the function can be as simple as:

CREATE OR REPLACE FUNCTION export_row( IN p_source_table regclass, IN p_pkey ANYELEMENT) RETURNS JSONB AS $$
DECLARE
    v_meta record;
    v_output jsonb;
BEGIN
    -- Get metainformation about the TABLE
    SELECT
        n.nspname,
        c.relname,
        a.attname
    INTO v_meta
    FROM
        pg_class AS c
        JOIN pg_namespace AS n ON c.relnamespace = n.oid
        JOIN pg_index AS i ON i.indrelid = c.oid
        JOIN pg_attribute AS a ON a.attrelid = c.oid AND
        a.attnum = i.indkey[0]
    WHERE
        c.oid = p_source_table AND
        i.indisunique AND
        array_length(
            i.indkey,
            1
        ) = 1 AND
        c.relkind = 'r';
    IF NOT FOUND THEN
        RETURN NULL;
    END IF;
 
    EXECUTE format('SELECT to_jsonb(o) FROM %I.%I o WHERE o.%I = %L', v_meta.nspname, v_meta.relname, v_meta.attname, p_pkey ) INTO v_output;
    RETURN v_output;
END;
$$ LANGUAGE plpgsql;
SELECT export_row('objects', 1);
                                       export_row                                       
----------------------------------------------------------------------------------------
 {"id": 1, "payload": "object-1", "parent_id": NULL, "created_by": 1, "category_id": 2}
(1 ROW)

The metainformation part is needed because table can be schema-qualified, or it's name can contain spaces, and, on top of it, we need to know what is the primary key of the table, so we can build first query.

The problem now, though, is that looking at this data, I know I have to add category info, and user info, and potentially parent_object, but if I'll build recursive structure to gather all the data, it will be complicated to avoid cycles.

So, perhaps, change how the data will be returned. Let's make it so that it will be JSON object with keys:

  • source_table – full name of the table that we started from
  • source_id – id of the row that we're starting from
  • data – hash with data for all the tables, keys in it will be full table name, and as value will be hash, which will contain rows under keys based on id

So, the data for this one row should look like:

{
    "source_table": "public.objects",
    "source_id": 1,
    "data": {
        "public.objects": {
            1: {"id": 1, "payload": "object-1", "parent_id": null, "created_by": 1, "category_id": 2}
        }
    }
}

Before I'll write new export_data() function, I'll make helper function to get metadata about table, including data about foreign keys:

=$ CREATE OR REPLACE FUNCTION table_metadata( IN p_source_table regclass ) RETURNS JSONB AS $$
DECLARE
    v_output JSONB; -- Full output from this function. It will be jsonb object, with keys: oid, nspname, relname, attname, full_name, and optionally fkeys'
    v_fkeys JSONB; -- Information about foreign keys that given table has defined. It will be array of objects, where each object will contain: source_column, dest_rel (oid of referenced table), and dest_column (name of referenced column).
BEGIN
    -- Get metainformation about the table
    WITH x AS (
        SELECT
            c.oid,       -- oid of the table
            n.nspname,   -- name of schema the table is in
            c.relname,   -- name of the table
            a.attname,   -- name of column that is primary key
            format('%I.%I', n.nspname, c.relname) AS full_name
        FROM
            pg_class AS c
            JOIN pg_namespace AS n ON c.relnamespace = n.oid
            JOIN pg_index AS i ON i.indrelid = c.oid
            JOIN pg_attribute AS a ON a.attrelid = c.oid AND
            a.attnum = i.indkey[0]
        WHERE
            c.oid = p_source_table AND
            i.indisunique AND
            array_length( i.indkey, 1) = 1 AND
            c.relkind = 'r'
    )
    SELECT to_jsonb(x) INTO v_output FROM x;
 
    -- If the data was not found, there is not much we can do.
    IF NOT FOUND THEN
        RETURN NULL;
    END IF;
 
    -- Get information about foreign keys
    WITH DATA AS (
        SELECT
            s_a.attname AS source_column,
            r_c.oid AS dest_rel,
            r_a.attname AS dest_column
        FROM
            pg_catalog.pg_constraint AS c
            JOIN pg_attribute AS s_a ON c.conrelid = s_a.attrelid AND
            s_a.attnum = c.conkey[1]
            JOIN pg_class AS r_c ON c.confrelid = r_c.oid
            JOIN pg_namespace AS r_n ON r_c.relnamespace = r_n.oid
            JOIN pg_attribute AS r_a ON r_a.attrelid = c.confrelid AND
            r_a.attnum = c.confkey[1]
        WHERE
            c.conrelid = p_source_table AND
            c.contype = 'f' AND
            array_length(
                c.conkey,
                1
            ) = 1 AND
            array_length(
                c.confkey,
                1
            ) = 1
    )
    SELECT jsonb_agg( to_jsonb(d) ) INTO v_fkeys FROM DATA d;
 
    -- if there are no fkeys, return output as it was built earlier.
    IF v_fkeys IS NULL THEN
        RETURN v_output;
    END IF;
 
    -- Fkeys are there, add them to output, and return.
    RETURN jsonb_insert( v_output, array[ 'fkeys' ], v_fkeys );
END;
$$ LANGUAGE plpgsql;
 
=$ SELECT jsonb_pretty( table_metadata('objects') );
                jsonb_pretty                
--------------------------------------------
 {                                         +
     "oid": "405471",                      +
     "fkeys": [                            +
         {                                 +
             "dest_rel": "405445",         +
             "dest_column": "id",          +
             "source_column": "created_by" +
         },                                +
         {                                 +
             "dest_rel": "405471",         +
             "dest_column": "id",          +
             "source_column": "parent_id"  +
         },                                +
         {                                 +
             "dest_rel": "405459",         +
             "dest_column": "id",          +
             "source_column": "category_id"+
         }                                 +
     ],                                    +
     "attname": "id",                      +
     "nspname": "public",                  +
     "relname": "objects",                 +
     "full_name": "public.objects"         +
 }
(1 ROW)

Sweet, with this in place, I can write second helper – one that will get data about a row, and what else has to be fetched:

=$ CREATE OR REPLACE FUNCTION get_row_data( IN p_meta JSONB, IN p_column TEXT, IN p_pkey ANYELEMENT) RETURNS JSONB AS $$
DECLARE
    v_output jsonb; -- Full output from function
    v_row jsonb;    -- Found row, as JSON.
    v_fkey jsonb;   -- Helper variable to iterate over, fkeys in the TABLE.
BEGIN
    -- Fetch the row.
    EXECUTE format('SELECT to_jsonb(o) FROM %I.%I o WHERE o.%I = %L', p_meta->>'nspname', p_meta->>'relname', p_meta->>'attname', p_pkey ) INTO v_row;
    v_output := jsonb_build_object( 'row', v_row );
 
    -- If the table doesn't have fkeys, there is nothing else to process, so return early.
    IF p_meta->'fkeys' IS NULL
    THEN
        RETURN v_output;
    END IF;
 
    -- At this moment we know that there are fkeys, so let's add dependencies key, so that jsonb_insert will not complain.
    v_output := v_output || jsonb_build_object( 'dependencies', '[]'::jsonb );
 
    -- Iterate over all fkeys in the TABLE
    FOR v_fkey IN SELECT e.v FROM jsonb_array_elements(p_meta->'fkeys') e (v) LOOP
 
        -- If value in the fkey column is null, there is nothing to add from other table
        continue WHEN v_row->>(v_fkey->>'source_column') IS NULL;
 
        -- If thereis value, add it to dependencies
        v_output := jsonb_insert(
            v_output,
            '{dependencies,0}',
            jsonb_build_object(
                'rel', v_fkey->>'dest_rel',
                'column', v_fkey->>'dest_column',
                'value', v_row->(v_fkey->>'source_column')
            )
        );
    END LOOP;
    RETURN v_output;
END;
$$ LANGUAGE plpgsql;
 
=$ SELECT jsonb_pretty( get_row_data( table_metadata('objects'), 'id', 1 ) );
          jsonb_pretty          
--------------------------------
 {                             +
     "row": {                  +
         "id": 1,              +
         "payload": "object-1",+
         "parent_id": NULL,    +
         "created_by": 1,      +
         "category_id": 2      +
     },                        +
     "dependencies": [         +
         {                     +
             "rel": "405459",  +
             "value": 2,       +
             "column": "id"    +
         },                    +
         {                     +
             "rel": "405445",  +
             "value": 1,       +
             "column": "id"    +
         }                     +
     ]                         +
 }
(1 ROW)

And now, I can finally write new, smarter export_data function:

=$ CREATE OR REPLACE FUNCTION export_data( IN p_source_table regclass, IN p_pkey ANYELEMENT) RETURNS JSONB AS $$
DECLARE
    v_meta jsonb;                  -- metainformation about single table
    v_all_meta jsonb;              -- metainfomation about all tables that are part of export
    v_output jsonb;                -- final output from function
    v_queue jsonb;                 -- jsonb array with queue of rows to fetch
    v_new_elements jsonb := '[]';  -- list that will become new queue on next iteration
    v_row_info jsonb;              -- single element from queue
    v_row_data jsonb;              -- data about fetched row
BEGIN
    -- Get metadata about source TABLE
    v_meta := table_metadata( p_source_table );
    -- Put this metadata in metadata cache
    v_all_meta := jsonb_build_object( p_source_table::oid, v_meta );
 
    -- Make base output
    v_output := jsonb_build_object(
        'source_table', format('%I.%I', v_meta->>'nspname', v_meta->>'relname'),
        'source_id', p_pkey,
        'data', '{}'::jsonb
    );
 
    -- Add the row that user requested with options to queue. At this moment, queue will have only one element
    v_queue := jsonb_build_array(
        jsonb_build_object(
            'rel', p_source_table::oid,
            'column', v_meta->>'attname',
            'value', p_pkey
        )
    );
 
    -- Process the queue, if it's not empty.
    WHILE jsonb_array_length( v_queue ) > 0 LOOP
 
        -- Iterate over elements in queue, putting each element into v_row_info
        FOR v_row_info IN SELECT e.v FROM jsonb_array_elements( v_queue ) e(v) LOOP
 
            -- If metainformation for table for this row is not yet in v_all_meta, add it.
            IF NOT v_all_meta ? (v_row_info->>'rel') THEN
                v_all_meta := jsonb_insert( v_all_meta, array[ v_row_info->>'rel' ], table_metadata( (v_row_info->>'rel')::oid ) );
            END IF;
 
            -- Get metainfo for currently processed TABLE, to have less typing...
            v_meta := v_all_meta->(v_row_info->>'rel');
 
            -- Make sure that output/data hash has place for data from this table, otherwise jsonb_insert() of the row will fail.
            IF NOT v_output->'data' ? (v_meta->>'full_name') THEN
                v_output := jsonb_insert( v_output, array[ 'data', v_meta->>'full_name' ], '{}' );
            END IF;
 
            -- If output already contains data for this row, there is no point in processing.
            continue WHEN v_output->'data'->(v_meta->>'full_name') ? (v_row_info->>'value');
 
            -- Get row_data for this single row.
            v_row_data := get_row_data( v_meta, v_row_info->>'column', v_row_info->>'value' );
 
            -- Add row data to output
            v_output := jsonb_insert( v_output, array[ 'data', v_meta->>'full_name', v_row_info->>'value' ], v_row_data->'row' );
 
            -- If there are no dependencies, go to next element in queue
            continue WHEN v_row_data->'dependencies' IS NULL;
 
            -- Add current dependencies to v_new_elements
            v_new_elements := v_new_elements || ( v_row_data->'dependencies' );
        END loop;
 
        -- After processing queue, make sure that all the dependencies will be in queue
        v_queue := v_new_elements;
        -- Zero v_new_elements, so we can detect when there is nothing else to add.
        v_new_elements := '[]';
    END loop;
 
    -- At this moment all is built :)
    RETURN v_output;
END;
$$ LANGUAGE plpgsql;
 
=$ SELECT jsonb_pretty( export_data('objects', 1) );
               jsonb_pretty               
------------------------------------------
 {                                       +
     "data": {                           +
         "public.users": {               +
             "1": {                      +
                 "id": 1,                +
                 "username": "depesz",   +
                 "referred_by": NULL     +
             }                           +
         },                              +
         "public.objects": {             +
             "1": {                      +
                 "id": 1,                +
                 "payload": "object-1",  +
                 "parent_id": NULL,      +
                 "created_by": 1,        +
                 "category_id": 2        +
             }                           +
         },                              +
         "public.categories": {          +
             "2": {                      +
                 "id": 2,                +
                 "parent_id": NULL,      +
                 "category_name": "top_2"+
             }                           +
         }                               +
     },                                  +
     "source_id": 1,                     +
     "source_table": "public.objects"    +
 }
(1 ROW)

and if I'd want to get the object that is more connected:

=$ SELECT jsonb_pretty( export_data( 'objects', 2) );
                  jsonb_pretty                  
------------------------------------------------
 {                                             +
     "data": {                                 +
         "public.users": {                     +
             "1": {                            +
                 "id": 1,                      +
                 "username": "depesz",         +
                 "referred_by": NULL           +
             },                                +
             "2": {                            +
                 "id": 2,                      +
                 "username": "other_user",     +
                 "referred_by": 1              +
             }                                 +
         },                                    +
         "public.objects": {                   +
             "1": {                            +
                 "id": 1,                      +
                 "payload": "object-1",        +
                 "parent_id": NULL,            +
                 "created_by": 1,              +
                 "category_id": 2              +
             },                                +
             "2": {                            +
                 "id": 2,                      +
                 "payload": "object-2",        +
                 "parent_id": 1,               +
                 "created_by": 2,              +
                 "category_id": 4              +
             }                                 +
         },                                    +
         "public.categories": {                +
             "1": {                            +
                 "id": 1,                      +
                 "parent_id": NULL,            +
                 "category_name": "top_1"      +
             },                                +
             "2": {                            +
                 "id": 2,                      +
                 "parent_id": NULL,            +
                 "category_name": "top_2"      +
             },                                +
             "3": {                            +
                 "id": 3,                      +
                 "parent_id": 1,               +
                 "category_name": "top_1/s1"   +
             },                                +
             "4": {                            +
                 "id": 4,                      +
                 "parent_id": 3,               +
                 "category_name": "top_1/s1/s2"+
             }                                 +
         }                                     +
     },                                        +
     "source_id": 2,                           +
     "source_table": "public.objects"          +
 }
(1 ROW)

All in all, it works. Will break on tables without primary key, or when primary key is on more than one column. And it can follow foreign keys, as long as they are single column. But it's a start. It works, and isn't really slow 🙂

If you'd like to use it, you can download all the functions in one file.

3 thoughts on “How to get a row, and all of it’s dependencies?”

  1. Kudos. A really smart and useful script.

    Just a suggestion maybe you can add a header to the file with the license and copyright, at least for attribution.

  2. @Francisco:

    well, if i’ll get around to make it an “extension” that one can download/install easily, i might. For now – it’s public domain, you can all take it and use for whatever purpose.

  3. Fair enough @depesz. Thank you.

    Just in case someone read these comments, i want to note that Public Domain have different meanings in European and US legislation. Under European laws, Public Domain can no be exactly considered as Free Software, so you can have problems mixing public domain software with other licenses. But when the author have a clear manifestation about the freedom of the software the risk of Not Compliant is low.

    https://opensource.org/faq#public-domain

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.