March 22nd, 2016 by depesz | Tags: , , , , , | 7 comments »
Did it help? If yes - maybe you can help me?

On 18th of March, Robert Haas committed patch:

Directly modify foreign tables.
 
postgres_fdw can now sent an UPDATE or DELETE statement directly to
the foreign server in simple cases, rather than sending a SELECT FOR
UPDATE statement and then updating or deleting rows one-by-one.
 
Etsuro Fujita, reviewed by Rushabh Lathia, Shigeru Hanada, Kyotaro
Horiguchi, Albe Laurenz, Thom Brown, and me.

The meaning of this should be pretty obvious, but let's quickly see some test of how it works. First, on some Pg 9.4 I have:

$ \d test
      Table "public.test"
 Column  |  Type   | Modifiers
---------+---------+-----------
 id      | integer | not null
 payload | text    |
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)
 
(depesz@localhost:5432) 20:36:05 [depesz]
$ select * from test;
 id | payload
----+---------
  1 | test 1
  2 | test 2
  3 | test 3
  4 | test 4
  5 | test 5
  6 | test 6
  7 | test 7
  8 | test 8
  9 | test 9
 10 | test 10
(10 rows)

Now, in the same database, I create the foreign table, using postgres_fdw:

$ create server source foreign data wrapper postgres_fdw options( dbname 'depesz', host 'localhost' );
CREATE SERVER
 
$ create user mapping for depesz server source options ( user 'depesz' );
CREATE USER MAPPING
 
$ create foreign table remote_table (id int4 not null, payload text) server source options (table_name 'test');
CREATE FOREIGN TABLE
 
$ select * from remote_table ;
 id | payload 
----+---------
  1 | test 1
  2 | test 2
  3 | test 3
  4 | test 4
  5 | test 5
  6 | test 6
  7 | test 7
  8 | test 8
  9 | test 9
 10 | test 10
(10 rows)

Great. It works.

So, now, let's try to update the table remotely:

$ explain analyze update remote_table set payload = payload || 'x';
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Update on remote_table  (cost=100.00..150.33 rows=1241 width=42) (actual time=3.647..3.647 rows=0 loops=1)
   ->  Foreign Scan on remote_table  (cost=100.00..150.33 rows=1241 width=42) (actual time=0.677..0.709 rows=10 loops=1)
 Planning time: 0.066 ms
 Execution time: 4.135 ms
(4 rows)

In postgresql log, I can see:

LOG:  connection authorized: user=depesz database=depesz
LOG:  duration: 0.142 ms  statement: SET search_path = pg_catalog
LOG:  duration: 0.280 ms  statement: SET timezone = 'UTC'
LOG:  duration: 0.073 ms  statement: SET datestyle = ISO
LOG:  duration: 0.042 ms  statement: SET intervalstyle = postgres
LOG:  duration: 0.067 ms  statement: SET extra_float_digits = 3
LOG:  duration: 0.060 ms  statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
LOG:  duration: 0.250 ms  parse <unnamed>: DECLARE c1 CURSOR FOR SELECT id, payload, ctid FROM public.test FOR UPDATE
LOG:  duration: 0.380 ms  bind <unnamed>: DECLARE c1 CURSOR FOR SELECT id, payload, ctid FROM public.test FOR UPDATE
LOG:  duration: 0.065 ms  execute <unnamed>: DECLARE c1 CURSOR FOR SELECT id, payload, ctid FROM public.test FOR UPDATE
LOG:  duration: 0.149 ms  statement: FETCH 100 FROM c1
LOG:  duration: 0.198 ms  parse pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
LOG:  duration: 0.136 ms  bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,11)', $2 = 'test 1xx'
LOG:  duration: 0.069 ms  execute pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,11)', $2 = 'test 1xx'
LOG:  duration: 0.091 ms  bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,12)', $2 = 'test 2xx'
LOG:  duration: 0.044 ms  execute pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,12)', $2 = 'test 2xx'
LOG:  duration: 0.061 ms  bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,13)', $2 = 'test 3xx'
LOG:  duration: 0.040 ms  execute pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,13)', $2 = 'test 3xx'
LOG:  duration: 0.071 ms  bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,14)', $2 = 'test 4xx'
LOG:  duration: 0.059 ms  execute pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,14)', $2 = 'test 4xx'
LOG:  duration: 0.074 ms  bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,15)', $2 = 'test 5xx'
LOG:  duration: 0.043 ms  execute pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,15)', $2 = 'test 5xx'
LOG:  duration: 0.069 ms  bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,16)', $2 = 'test 6xx'
LOG:  duration: 0.046 ms  execute pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,16)', $2 = 'test 6xx'
LOG:  duration: 0.016 ms  bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,17)', $2 = 'test 7xx'
LOG:  duration: 0.040 ms  execute pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,17)', $2 = 'test 7xx'
LOG:  duration: 0.018 ms  bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,18)', $2 = 'test 8xx'
LOG:  duration: 0.042 ms  execute pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,18)', $2 = 'test 8xx'
LOG:  duration: 0.015 ms  bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,19)', $2 = 'test 9xx'
LOG:  duration: 0.047 ms  execute pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,19)', $2 = 'test 9xx'
LOG:  duration: 0.014 ms  bind pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,20)', $2 = 'test 10xx'
LOG:  duration: 0.041 ms  execute pgsql_fdw_prep_1: UPDATE public.test SET payload = $2 WHERE ctid = $1
DETAIL:  parameters: $1 = '(0,20)', $2 = 'test 10xx'
LOG:  duration: 0.018 ms  statement: DEALLOCATE pgsql_fdw_prep_1
LOG:  duration: 0.021 ms  statement: CLOSE c1
LOG:  duration: 73.715 ms  statement: COMMIT TRANSACTION
LOG:  duration: 93.146 ms  statement: explain analyze update remote_table set payload = payload || 'x';

That's quite a lot of things. As you can see, true to what commit message said, each row was updated separately.

In 9.6, though, it looks differently:

$ explain analyze update remote_table set payload = payload || 'x';
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Update on remote_table  (cost=100.00..150.33 rows=1241 width=42) (actual time=0.408..0.408 rows=0 loops=1)
   ->  Foreign Update on remote_table  (cost=100.00..150.33 rows=1241 width=42) (actual time=0.407..0.407 rows=10 loops=1)
 Planning time: 0.120 ms
 Execution time: 2.400 ms
(4 rows)

(please note that it's on different server, so times are not really comparable.)

And logs show:

LOG:  connection authorized: user=depesz database=depesz
LOG:  duration: 0.065 ms  statement: SET search_path = pg_catalog
LOG:  duration: 0.145 ms  statement: SET timezone = 'UTC'
LOG:  duration: 0.025 ms  statement: SET datestyle = ISO
LOG:  duration: 0.011 ms  statement: SET intervalstyle = postgres
LOG:  duration: 0.018 ms  statement: SET extra_float_digits = 3
LOG:  duration: 0.014 ms  statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
LOG:  duration: 0.170 ms  parse <unnamed>: UPDATE public.test SET payload = (payload || 'x'::text)
LOG:  duration: 0.124 ms  bind <unnamed>: UPDATE public.test SET payload = (payload || 'x'::text)
LOG:  duration: 0.050 ms  execute <unnamed>: UPDATE public.test SET payload = (payload || 'x'::text)
LOG:  duration: 25.206 ms  statement: COMMIT TRANSACTION
LOG:  duration: 28.431 ms  statement: explain analyze update remote_table set payload = payload || 'x';

Much better, isn't it? Thank you, for all involved 🙂

  1. 7 comments

  2. # puqun
    Sep 5, 2016

    postgresql9.6 added two other properties about postgres_fdw,which is sort push down and join pushdown.

    It is difficult to understand these two properties , can you give some examples to help understand?

    Please!

    Thank you very much !!!

  3. Sep 5, 2016

    @puqun:
    can you give me link to committers messages about it? or commit hashes?

  4. # puqun
    Sep 6, 2016

    https://www.postgresql.org/message-id/E1aBS4o-0002l1-Tv@gemulon.postgresql.org
    postgres_fdw: Consider requesting sorted data so we can do a merge join.

    When use_remote_estimate is enabled, consider adding ORDER BY to the
    query we sending to the remote server so that we can use that ordered
    data for a merge join.

  5. # puqun
    Sep 6, 2016

    https://www.postgresql.org/message-id/E1aTDlV-0007xw-Vj@gemulon.postgresql.org
    postgres_fdw: Push down joins to remote servers.

    If we’ve got a relatively straightforward join between two tables,
    this pushes that join down to the remote server instead of fetching
    the rows for each table and performing the join locally. Some cases
    are not handled yet, such as SEMI and ANTI joins. Also, we don’t
    yet attempt to create presorted join paths or parameterized join
    paths even though these options do get tried for a base relation
    scan. Nevertheless, this seems likely to be a very significant win
    in many practical cases.

  6. Sep 6, 2016

    @puqun:
    well, from the description it doesn’t sound overly hard to understand what’s going on.

    I’m hesitant to write “waiting for” for new optimizations for fdw cases for the same reason I generally don’t write about optimization changes in other code – it’s hard to show apples-to-apples comparison, and it doesn’t really change functionality. in case of fdw there is also the factor of it being really fringe feature, which might get more used in future, but now, I would say, is definitely not common.

  7. # youyou
    Oct 11, 2016

    In PostgreSQL9.6, when TRUE or FLASE is set to use_remote_estimate, the flow of new characteristic Push down joins to remote servers processing is different.

    *********************
    case 1:
    *********************
    1. Create table for test:
    postgres=# CREATE TABLE t0 (i int);
    CREATE TABLE
    postgres=# INSERT INTO t0 VALUES (generate_series(1, 1000));
    INSERT 0 1000
    postgres=# create database db2;
    CREATE DATABASE
    postgres=#
    postgres=# \c db2
    You are now connected to database “db2” as user “postgres96r”.
    db2=# CREATE EXTENSION postgres_fdw;
    CREATE EXTENSION
    db2=# CREATE SERVER server1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname ‘postgres’, port ‘55667’);
    CREATE SERVER
    db2=# CREATE USER MAPPING FOR public SERVER server1 OPTIONS (USER ‘postgres96r’);
    CREATE USER MAPPING
    db2=# CREATE SCHEMA remote1;
    CREATE SCHEMA
    db2=# IMPORT FOREIGN SCHEMA public FROM SERVER server1 INTO remote1;
    IMPORT FOREIGN SCHEMA
    db2=#
    db2=# SELECT count(*) FROM remote1.t0;
    count
    ——-
    1000
    (1 row)

    2.Analyze the join operation of the foreign table.
    db2=# alter server server1 options (ADD use_remote_estimate ‘true’);
    ALTER SERVER
    db2=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM remote1.t0 a JOIN remote1.t0 b ON a.i = b.i AND a.i = 500;
    QUERY PLAN
    ——————————————————————————————————————–
    Foreign Scan (cost=100.00..135.03 rows=1 width=8) (actual time=1.068..1.069 rows=1 loops=1)
    Output: a.i, b.i
    Relations: (remote1.t0 a) INNER JOIN (remote1.t0 b)
    Remote SQL: SELECT r1.i, r2.i FROM (public.t0 r1 INNER JOIN public.t0 r2 ON (((r2.i = 500)) AND ((r1.i = 500))))
    Planning time: 3.654 ms
    Execution time: 1.272 ms
    (6 rows)

    db2=# alter server server1 options (ADD use_remote_estimate ‘false’);
    ERROR: option “use_remote_estimate” provided more than once
    db2=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM remote1.t0 a JOIN remote1.t0 b ON a.i = b.i AND a.i = 500;
    QUERY PLAN
    ——————————————————————————————————————–
    Foreign Scan (cost=100.00..135.03 rows=1 width=8) (actual time=1.002..1.003 rows=1 loops=1)
    Output: a.i, b.i
    Relations: (remote1.t0 a) INNER JOIN (remote1.t0 b)
    Remote SQL: SELECT r1.i, r2.i FROM (public.t0 r1 INNER JOIN public.t0 r2 ON (((r2.i = 500)) AND ((r1.i = 500))))
    Planning time: 2.767 ms
    Execution time: 1.193 ms
    (6 rows)

    Thus,the setting of the parameter “use_remote_estimate” has a certain influence on the execution of the join operation.

  8. # youyou
    Oct 11, 2016

    *******
    case2
    ******
    Just change the SQL statement of join operation:
    db2=# alter server server1 options (SET use_remote_estimate ‘false’);
    ALTER SERVER
    db2=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM remote1.t0 a JOIN remote1.t0 b ON a.i = b.i;
    QUERY PLAN
    ——————————————————————————————————————————–
    Merge Join (cost=732.29..1388.59 rows=42778 width=8) (actual time=6.878..9.135 rows=1000 loops=1)
    Output: a.i, b.i
    Merge Cond: (a.i = b.i)
    -> Sort (cost=366.15..373.46 rows=2925 width=4) (actual time=3.944..4.344 rows=1000 loops=1)
    Output: a.i
    Sort Key: a.i
    Sort Method: quicksort Memory: 71kB
    -> Foreign Scan on remote1.t0 a (cost=100.00..197.75 rows=2925 width=4) (actual time=1.037..3.171 rows=1000 loops=1)
    Output: a.i
    Remote SQL: SELECT i FROM public.t0
    -> Sort (cost=366.15..373.46 rows=2925 width=4) (actual time=2.916..3.292 rows=1000 loops=1)
    Output: b.i
    Sort Key: b.i
    Sort Method: quicksort Memory: 71kB
    -> Foreign Scan on remote1.t0 b (cost=100.00..197.75 rows=2925 width=4) (actual time=0.385..2.350 rows=1000 loops=1)
    Output: b.i
    Remote SQL: SELECT i FROM public.t0
    Planning time: 2.085 ms
    Execution time: 10.980 ms
    (19 rows)

    db2=# alter server server1 options (SET use_remote_estimate ‘true’);
    ALTER SERVER
    db2=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM remote1.t0 a JOIN remote1.t0 b ON a.i = b.i;
    QUERY PLAN
    —————————————————————————————————-
    Foreign Scan (cost=127.50..176.25 rows=1000 width=8) (actual time=1.172..3.260 rows=1000 loops=1)
    Output: a.i, b.i
    Relations: (remote1.t0 a) INNER JOIN (remote1.t0 b)
    Remote SQL: SELECT r1.i, r2.i FROM (public.t0 r1 INNER JOIN public.t0 r2 ON (((r1.i = r2.i))))
    Planning time: 5.567 ms
    Execution time: 3.668 ms
    (6 rows)

    After the change of SQL operation of the join, the settings of parameter “use_remote_estimate” has no effect on the join operation process.

    How does the parameter “use_remote_estimate” affect the operation of the join?
    Which factors are related to it?
    Please give me some guidance.

Sorry, comments for this post are disabled.