During last month or so, Tom Lane commited changes in PostgreSQL, which were foundations for adding hash-based versions of popular features.
I already described first such feature – DISTINCT.
Now, there were 3 more commits which were related to this:
All 3 were committed by Tom, on 7th of August. Commit messages:
Teach the system how to use hashing for UNION. (INTERSECT/EXCEPT will follow, but seem like a separate patch since most of the remaining work is on the executor side.) I took the opportunity to push selection of the grouping operators for set operations into the parser where it belongs. Otherwise this is just a small exercise in making prepunion.c consider both alternatives. As with the recent DISTINCT patch, this means we can UNION on datatypes that can hash but not sort, and it means that UNION without ORDER BY is no longer certain to produce sorted output.
Support hashing for duplicate-elimination in INTERSECT and EXCEPT queries. This completes my project of improving usage of hashing for duplicate elimination (aggregate functions with DISTINCT remain undone, but that's for some other day). As with the previous patches, this means we can INTERSECT/EXCEPT on datatypes that can hash but not sort, and it means that INTERSECT/EXCEPT without ORDER BY are no longer certain to produce sorted output.
Improve INTERSECT/EXCEPT hashing by realizing that we don't need to make any hashtable entries for tuples that are found only in the second input: they can never contribute to the output. Furthermore, this implies that the planner should endeavor to put first the smaller (in number of groups) input relation for an INTERSECT. Implement that, and upgrade prepunion's estimation of the number of rows returned by setops so that there's some amount of sanity in the estimate of which one is smaller.
What it means for average user? Basically union, intersect and except can now work faster.
Since the changes are quite obvious I'll skip specific examples and plans of execution, but I would like to use this space to send personal big THANK YOU to Tom Lane for these changes, as I use set operators a lot, and speedup of them is very welcome 🙂