Waiting for 9.1 – Foreign data wrapper

Well, saying that on particular date someone committed patch, wouldn't be really telling. In fact various bits and pieces of underlying logic have been committed for a long time, but now we finally have some functionality visible and available to end users.

This became the case thanks to these two commits, both committed on 20th of February, by Tom Lane.

First:

Implement an API to let foreign-data wrappers actually be functional.
 
This commit provides the core code and documentation needed.  A contrib
module test case will follow shortly.
 
Shigeru Hanada, Jan Urbanski, Heikki Linnakangas

and second:

Add contrib/file_fdw foreign-data wrapper for reading files via COPY.
 
This is both very useful in its own right, and an important test case
for the core FDW support.
 
This commit includes a small refactoring of copy.c to expose its option
checking code as a separately callable function.  The original patch
submission duplicated hundreds of lines of that code, which seemed pretty
unmaintainable.
 
Shigeru Hanada, reviewed by Itagaki Takahiro and Tom Lane

Before I will show you what we can do with these two patches, let me step back a bit.

For a very long time (as far as I know forever) PostgreSQL had contrib module dblink.

This made it possible to run queries from one database using objects (tables, views, functions) in another.

DBlink limitation was that both ends of link had to be PostgreSQL databases. To avoid this limit there came dbi-link.

Thanks to Perl magic, it was now possible to make the link between PostgreSQL and virtually any other database.

This worked (and works), but it was custom addition, not standardized.

In the mean time (in 2003) new SQL standard, SQL 2003, described “SQL/MED", or “Management of External Data". Which was a standardized way of handling access to remote objects in SQL databases.

Foreign data wrappers in PostgreSQL are beginning of implementation of SQL/MED in PostgreSQL (beginning, as it's not fully functional as of now – for example – only read only queries work).

How to use it?

To use FDW we need some library that actually works as FDW, as the PostgreSQL built-in part, is just API. This is great because it will be possible (and simple, as far as I understand) to write additional wrapper, that will (for example) return google results, or tweets, or status updates from facebook, or anything else you can dream of.

Currently, though, we have only one such ready wrapper, quite useful – file_fdw. This wrapper lets you query files (which should be in “COPY"-able format).

So, let's test it:

$ CREATE extension file_fdw;
CREATE EXTENSION

Now, I have all the logic ready in database.

And now I can add some foreign tables. Let's start with a simple one:

CREATE FOREIGN TABLE passwd (
    username text,
    pass text,
    uid int4,
    gid int4,
    gecos text,
    home text,
    shell text
) SERVER file_server
OPTIONS (format 'text', filename '/etc/passwd', delimiter ':', NULL '');

Syntax seems simple enough, but there are important notes. All options depend on which FDW you're using. That is there are no standard ones, and you need to look in your wrapper documentation to get information on what options are available, and how to use them.

After running above query, PostgreSQL reported happily “CREATE FOREIGN TABLE". Does that mean I can query it? Let's see:

$ SELECT * FROM passwd ORDER BY uid ASC LIMIT 3;
 username | pass | uid | gid | gecos  |   home    |   shell
----------+------+-----+-----+--------+-----------+-----------
 root     | x    |   0 |   0 | root   | /root     | /bin/bash
 daemon   | x    |   1 |   1 | daemon | /usr/sbin | /bin/sh
 bin      | x    |   2 |   2 | bin    | /bin      | /bin/sh
(3 ROWS)

Whoa. That's nice. What else can we see about it?

For starters \d output shows that it's remote table:

$ \d
            List of relations
 Schema |  Name  |     Type      | Owner
--------+--------+---------------+--------
 public | passwd | foreign table | depesz
 public | test   | table         | depesz
(2 rows)
 
$ \d passwd
 Foreign table "public.passwd"
  Column  |  Type   | Modifiers
----------+---------+-----------
 username | text    |
 pass     | text    |
 uid      | integer |
 gid      | integer |
 gecos    | text    |
 home     | text    |
 shell    | text    |
Server: file_server

When explaining query with such table, we get some additional information:

$ EXPLAIN analyze SELECT * FROM passwd ORDER BY uid ASC LIMIT 3;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 LIMIT  (cost=2.24..2.25 ROWS=3 width=168) (actual TIME=0.078..0.079 ROWS=3 loops=1)
   ->  Sort  (cost=2.24..2.27 ROWS=11 width=168) (actual TIME=0.078..0.078 ROWS=3 loops=1)
         Sort KEY: uid
         Sort Method: top-N heapsort  Memory: 25kB
         ->  FOREIGN Scan ON passwd  (cost=0.00..2.10 ROWS=11 width=168) (actual TIME=0.012..0.057 ROWS=44 loops=1)
               FOREIGN File: /etc/passwd
               FOREIGN File SIZE: 2115
 Total runtime: 0.116 ms
(8 ROWS)

All in all – I'm extremely happy about it. I definitely don't understand all of it, but even the small parts that I do make me positively nervous about playing with it in future. Now, if only one could write wrappers in some language like Perl, and not C, it would be even better 🙂

7 thoughts on “Waiting for 9.1 – Foreign data wrapper”

  1. Thanks for the write up. Can’t wait to see the docs on writing the wrappers themselves.

    A really powerful addition would be the ability to write the wrappers in a glue language like perl – but I wouldn’t be surprised to hear that this way beyond the current scope of the underlying code.

  2. I see that your query plan has estimated costs and rows for the foreign table – are statistics kept on foreign tables?

  3. Something seems to be missing in the example as I get the error message: “server “file_server” does not exist”

    How can I create the file_server?

  4. Apparently you also need to do:

    CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;

Comments are closed.