During last month or so, Tom Lane commited changes in PostgreSQL, which were foundations for adding hash-based versions of popular features.
I already described first such feature – DISTINCT.
Now, there were 3 more commits which were related to this:
All 3 were committed by Tom, on 7th of August. Commit messages:
Teach the system how to use hashing for UNION. (INTERSECT/EXCEPT will follow,
but seem like a separate patch since most of the remaining work is on the
executor side.) I took the opportunity to push selection of the grouping
operators for set operations into the parser where it belongs. Otherwise this
is just a small exercise in making prepunion.c consider both alternatives.
As with the recent DISTINCT patch, this means we can UNION on datatypes that
can hash but not sort, and it means that UNION without ORDER BY is no longer
certain to produce sorted output.
Support hashing for duplicate-elimination in INTERSECT and EXCEPT queries.
This completes my project of improving usage of hashing for duplicate
elimination (aggregate functions with DISTINCT remain undone, but that's
for some other day).
As with the previous patches, this means we can INTERSECT/EXCEPT on datatypes
that can hash but not sort, and it means that INTERSECT/EXCEPT without ORDER
BY are no longer certain to produce sorted output.
Improve INTERSECT/EXCEPT hashing by realizing that we don't need to make any
hashtable entries for tuples that are found only in the second input: they
can never contribute to the output. Furthermore, this implies that the
planner should endeavor to put first the smaller (in number of groups) input
relation for an INTERSECT. Implement that, and upgrade prepunion's estimation
of the number of rows returned by setops so that there's some amount of sanity
in the estimate of which one is smaller.
What it means for average user? Basically union, intersect and except can now work faster.
Since the changes are quite obvious I'll skip specific examples and plans of execution, but I would like to use this space to send personal big THANK YOU to Tom Lane for these changes, as I use set operators a lot, and speedup of them is very welcome