On 12nd of March 2026, Robert Haas committed patch:
Add pg_plan_advice contrib module. Provide a facility that (1) can be used to stabilize certain plan choices so that the planner cannot reverse course without authorization and (2) can be used by knowledgeable users to insist on plan choices contrary to what the planner believes best. In both cases, terrible outcomes are possible: users should think twice and perhaps three times before constraining the planner's ability to do as it thinks best; nevertheless, there are problems that are much more easily solved with these facilities than without them. This patch takes the approach of analyzing a finished plan to produce textual output, which we call "plan advice", that describes key decisions made during plan; if that plan advice is provided during future planning cycles, it will force those key decisions to be made in the same way. Not all planner decisions can be controlled using advice; for example, decisions about how to perform aggregation are currently out of scope, as is choice of sort order. Plan advice can also be edited by the user, or even written from scratch in simple cases, making it possible to generate outcomes that the planner would not have produced. Partial advice can be provided to control some planner outcomes but not others. Currently, plan advice is focused only on specific outcomes, such as the choice to use a sequential scan for a particular relation, and not on estimates that might contribute to those outcomes, such as a possibly-incorrect selectivity estimate. While it would be useful to users to be able to provide plan advice that affects selectivity estimates or other aspects of costing, that is out of scope for this commit. Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Haibo Yan <tristan.yim@gmail.com> Reviewed-by: Dian Fay <di@nmfay.com> Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
We got it. Amazing, new, shiny, powerful, foot-gun. I don't doubt that it will get abused and cause harm. But it is really cool. So let's try to see what it does, why, and why it's somewhat controversial.
Let's start with some background and history.
When presented with a query, PostgreSQL parses it, and tries to figure out how to run it. Should it use seq scan? Index Scan? If joining, what kind of join? There is large number of ways things can work, and sometimes one way will be better than another.
Once it will get picked (the way to run), you can view it using explain select … – without analyze. It will tell you what was picked.
Some databases allow developers to influence how planner will pick what to do. The idea is that “we, the people, know better". Which kinda makes sense. As PostgreSQL operates based on set of statistics of data, which can be (and generally speaking, for non-trivially sized dbs, aren't) not fully realistic. There are some fluctuations, new data that hasn't been included in stats, or perhaps uneven distribution of random bits of table that was scanned to get the stats.
Long story short, some people, for various reasons, have been saying that PostgreSQL should have “hints". As in: a ways to tell planner: do it the way I want.
PostgreSQL devs were against the idea, and there is (was?) wiki page that sums the discussion.
The part of the wiki that is under headline Problems with existing Hint systems is why I believe hints will get abused and lead to problems. But this is always the case with powerful tools, so it's not really problems of tools themselves.
Anyway, lets see how it works, and what we can do about it.
First things first, we need to load the extension. There are many ways to do it, but the most sensible (in my opinion) is to globally load it on PostgreSQL start, so it will be available for every connection. So, to do it:
=$ show shared_preload_libraries ; shared_preload_libraries ────────────────────────── pg_stat_statements (1 row) =$ alter system set shared_preload_libraries = pg_stat_statements,pg_plan_advice; ALTER SYSTEM
This is set, but shared_preload_libraries is set only on Pg start. So I need to restart PostgreSQL too. Fine. Did that, and after restart:
=$ show shared_preload_libraries ; shared_preload_libraries ──────────────────────────────────── pg_stat_statements, pg_plan_advice (1 row)
Now, I need a playground. Let's use database for explain.depesz.com. We'll try to figure out which users of the site registered most plans, and longest plans. Simple query like:
=$ SELECT hashtext(u.username) as user_hash, u.registered, count(*), sum( length( p.plan ) ) FROM the_plans AS p JOIN users AS u ON p.added_by = u.username GROUP BY u.username order by count desc limit 10; user_hash | registered | count | sum -------------+-------------------------------+-------+---------- -820112794 | 2019-09-09 21:02:28.119044+02 | 1857 | 1648685 -1610592128 | 2021-09-22 12:10:33.202108+02 | 1539 | 68915577 1759165525 | 2019-09-09 20:58:17.084447+02 | 1297 | 1115412 -1754420189 | 2022-11-16 17:58:51.302857+01 | 1033 | 10576950 1671716498 | 2013-10-30 12:53:32.711263+01 | 999 | 12220126 1350205950 | 2019-01-28 16:17:53.232605+01 | 342 | 746589 650117286 | 2019-04-16 08:15:59.106962+02 | 342 | 10483499 -1069809673 | 2019-09-24 12:14:59.798522+02 | 330 | 798475 -2123598211 | 2016-01-19 17:46:34.644856+01 | 312 | 815365 -1849353399 | 2019-03-04 11:59:09.280873+01 | 282 | 6714 (10 rows)
Plan for this query is:
=$ EXPLAIN SELECT hashtext(u.username) as user_hash, u.registered, count(*), sum( length( p.plan ) ) FROM the_plans AS p JOIN users AS u ON p.added_by = u.username GROUP BY u.username order by count desc limit 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Limit (cost=13968.50..13968.52 rows=8 width=41) -> Sort (cost=13968.50..13968.52 rows=8 width=41) Sort Key: (count(*)) DESC -> Finalize GroupAggregate (cost=3.44..13968.38 rows=8 width=41) Group Key: u.username -> Merge Join (cost=3.44..13968.22 rows=8 width=37) Merge Cond: (p.added_by = u.username) -> Partial GroupAggregate (cost=0.43..752513.11 rows=440 width=27) Group Key: p.added_by -> Index Scan using pl_x_added_by_idx on the_plans p (cost=0.43..733554.90 rows=1895381 width=593) -> Index Scan using users_pkey on users u (cost=0.28..279.64 rows=2400 width=21) (11 rows)
Sweet. So how about changing the plan?
Before we can actually change it, we need to see current state of “advice". We can do it by including special option to explain:
=$ EXPLAIN (plan_advice) SELECT hashtext(u.username) as user_hash, … limit 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- … Generated Plan Advice: JOIN_ORDER(p u) MERGE_JOIN_PLAIN(u) INDEX_SCAN(p public.pl_x_added_by_idx u public.users_pkey) NO_GATHER(p u) (16 rows)
This is fine. We can see that:
- PostgreSQL will use p as base table, and will join u to it.
- PostgreSQL will do the join using merge join, which requires data on both sides to be sorted using join key
- both tables should be scanned using INDEX SCAN, using specified indexes
- there is no parallel processing
Sweet. Let's start with forcing Pg to join the tables in the other direction:
=$ SET pg_plan_advice.advice = 'JOIN_ORDER(u p)'; =$ EXPLAIN SELECT hashtext(u.username) as user_hash, … limit 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=13968.52..13968.54 rows=8 width=41) -> Sort (cost=13968.52..13968.54 rows=8 width=41) Sort Key: (count(*)) DESC -> Finalize GroupAggregate (cost=3.44..13968.40 rows=8 width=41) Group Key: u.username -> Merge Join (cost=3.44..13968.24 rows=8 width=37) Merge Cond: (u.username = p.added_by) -> Index Scan using users_pkey on users u (cost=0.28..279.64 rows=2400 width=21) -> Materialize (cost=0.43..752514.21 rows=440 width=27) -> Partial GroupAggregate (cost=0.43..752513.11 rows=440 width=27) Group Key: p.added_by -> Index Scan using pl_x_added_by_idx on the_plans p (cost=0.43..733554.90 rows=1895381 width=593) Supplied Plan Advice: JOIN_ORDER(u p) /* matched */ (14 rows)
OK. The plan has changed. Can we see all plan advice for it? Sure
=$ SET pg_plan_advice.advice = 'JOIN_ORDER(u p)'; SET =$ explain (plan_advice) SELECT hashtext(u.username) as user_hash, … limit 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=13968.52..13968.54 rows=8 width=41) … -> Index Scan using pl_x_added_by_idx on the_plans p (cost=0.43..733554.90 rows=1895381 width=593) Supplied Plan Advice: JOIN_ORDER(u p) /* matched */ Generated Plan Advice: JOIN_ORDER(u p) MERGE_JOIN_MATERIALIZE(p) INDEX_SCAN(u public.users_pkey p public.pl_x_added_by_idx) NO_GATHER(p u) (19 rows)
Sweet. Can we brake it more? How about forcing the join to be nested loop. After all, what could possibly go wrong 🙂
=$ SET pg_plan_advice.advice = 'JOIN_ORDER(u p) NESTED_LOOP_PLAIN(p)'; =$ EXPLAIN (plan_advice) SELECT hashtext(u.username) as user_hash, … limit 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Limit (cost=507326.27..507326.29 rows=10 width=41) -> Sort (cost=507326.27..507332.27 rows=2400 width=41) Sort Key: (count(*)) DESC -> GroupAggregate (cost=0.71..507274.40 rows=2400 width=41) Group Key: u.username -> Nested Loop (cost=0.71..506890.60 rows=35380 width=603) -> Index Scan using users_pkey on users u (cost=0.28..279.64 rows=2400 width=21) -> Index Scan using pl_x_added_by_idx on the_plans p (cost=0.43..210.29 rows=80 width=593) Index Cond: (added_by = u.username) JIT: Functions: 9 Options: Inlining true, Optimization true, Expressions true, Deforming true Supplied Plan Advice: JOIN_ORDER(u p) /* matched */ NESTED_LOOP_PLAIN(p) /* matched */ Generated Plan Advice: JOIN_ORDER(u p) NESTED_LOOP_PLAIN(p) INDEX_SCAN(u public.users_pkey p public.pl_x_added_by_idx) NO_GATHER(p u) (20 rows)
Finally, let's assume I don't want to use index on the_plans – maybe I suspect it's broken, and I want, instead, to use seq scan (which would be terrible for performance in here, as the seq scan would be repeated for each user. And to make it even funnier, let's make sure Pg will use parallel seq scan for this table:
=$ SET pg_plan_advice.advice = 'JOIN_ORDER(u p) NESTED_LOOP_PLAIN(p) SEQ_SCAN(p) GATHER(p)'; =$ EXPLAIN (plan_advice) SELECT hashtext(u.username) as user_hash, … limit 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Limit (cost=940460033.92..940460033.94 rows=8 width=41) -> Sort (cost=940460033.92..940460033.94 rows=8 width=41) Sort Key: (count(*)) DESC -> Finalize GroupAggregate (cost=391848.61..940460033.80 rows=8 width=41) Group Key: u.username -> Nested Loop (cost=391848.61..940460033.64 rows=8 width=37) Join Filter: (u.username = p.added_by) -> Index Scan using users_pkey on users u (cost=0.28..279.64 rows=2400 width=21) -> Partial HashAggregate (cost=391848.33..391852.73 rows=440 width=27) Group Key: p.added_by -> Gather (cost=1000.00..372894.52 rows=1895381 width=593) Workers Planned: 2 -> Parallel Seq Scan on the_plans p (cost=0.00..182356.42 rows=789742 width=593) JIT: Functions: 12 Options: Inlining true, Optimization true, Expressions true, Deforming true Supplied Plan Advice: SEQ_SCAN(p) /* matched */ JOIN_ORDER(u p) /* matched */ NESTED_LOOP_PLAIN(p) /* matched */ GATHER(p) /* matched */ Generated Plan Advice: JOIN_ORDER(u p) NESTED_LOOP_PLAIN(p) SEQ_SCAN(p) INDEX_SCAN(u public.users_pkey) GATHER(p) NO_GATHER(u) (28 rows)
I'm kinda tempted to see how slow it would be, but I don't think I have that much time 🙂
Anyway – thanks to this you can force Pg to run the query how You want. Whether it makes sense, or not 🙂
So, where would I use it?
Pg can have problems picking correct solution if one joins multiple tables, and uses conditions/ordering on fields from different tables. Think about something like this:
select *
from t1
join t2 on (t1.x = t2.x)
join t3 on (t2.y = t3.y)
where
t1.a = ‘a' and
t2.b > now() – ‘7 days'::interval
order by
t3.c desc;
Depending on various factors, including, but not limiting to, size of tables, data distribution, and random sample luck you might get plan that works great, or one that, well, doesn't.
In such case you could make planner pick the plan that you know is correct even if statistics about data don't support it.
Why do I think it's a foot-gun, then?
Well, people generally (me, for example) are lazy. Once you will put set pg_plan_advice.advice = ‘…', you will forget it. And it will be there basically forever, even if data distribution will change, or new Pg will come with even better idea.
And since, as I've shown, it's perfectly possible to give planner bad advice – well, you can imagine what will happen should something like this happen in production environment.
Anyway, it is very powerful tool, and if/when we will get it's counterpart (pg_stash_advice) it will become even more powerful, and somewhat easier to use. Will write about it, when it will happen 🙂
That was, and is, amazing work. I know that for some people it will be invaluable. Thanks to everyone involved in making this a reality.