On 3rd of February 2022, Peter Eisentraut committed patch:
Add UNIQUE null treatment option
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not. Different
implementations have different behaviors. In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.
This patch adds this option to PostgreSQL. The default behavior
remains UNIQUE NULLS DISTINCT. Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.
The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.
I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.
Reviewed-by: Maxim Orlov <firstname.lastname@example.org>
Reviewed-by: Pavel Borisov <email@example.com>
Continue reading Waiting for PostgreSQL 15 – Add UNIQUE null treatment option
I wrote about similar things couple of times, but recently found thread on pgsql-general mailing list that made me thing about it again.
Summary of the problem from mail is: we have a table, ~ 800 million rows, with, at least 2 columns:
- station – 170 distinct values
- channel – generally 1-3 channels per station
And then we want to run:
array_agg(DISTINCT (channel)) AS channels
Which, on Israel's (original poster) machine took ~ 5 minutes.
And this is with index on on data (station, channel).
Can we do better?
Continue reading Using recursive queries to get distinct elements from table
Today, on irc, someone asked interesting question.
Basically she ran a query like:
SELECT a, b, c, d, e, f FROM TABLE ORDER BY a
then, she processed the query to get, for each a array of unique values of b, c, d, e, and f, and then he inserted it back to database, to some other table.
It was a problem, because the table had many rows (millions I would assume), and the whole process was slow.
So, how to make it faster?
Continue reading Getting list of unique elements in table, per group
So, couple of days ago, some guy, from Periscope company wrote a blogpost about getting number of distinct elements, per group, faster using subqueries.
This was then submitted to Hacker News and r/Programming on Reddit.
Then, the original authors submitted second blogpost comparing speed between four different DB engines. Which, in turn, was also commented on Reddit.
I found the numbers presented by Periscope (as their improvement) as not that great.
Unfortunately – their blog doesn't allow for comments, so I decided to test it, and write on my own blog, what I can find about it.
Continue reading Getting count of distinct elements, per group, in PostgreSQL.
Let's assume you have some simple database with “articles" – each article can be in many “categories". And now you want to get list of all articles in given set of categories.
articles AS a
JOIN articles_in_categories AS aic ON a.id = aic.article_id
aic.category_id IN (14,62,70,53,138)
Will return duplicated article data if given article is in more than one from listed categories. How to remove redundant rows?
Continue reading Getting unique elements
Robert Haas wrote, and Tom Lane committed (on 2nd of August) patch which adds interesting capability:
ADD ALTER TABLE ... ALTER COLUMN ... SET STATISTICS DISTINCT
Log message is rather terse, so what does it exactly do?
Continue reading Waiting for 8.5 – hinting for number of distinct values
Every so often you need to get list of unique elements in some column. The standard way to do it is:
SELECT DISTINCT COLUMN FROM TABLE;
SELECT COLUMN FROM TABLE GROUP BY COLUMN;
The only problem is that it's slow – as it has to seq scan whole table. Can it be done faster?
Continue reading Getting list of unique elements
Today Tom Lane committed patch which gives DISTINCT ability to use hash aggregate – just like GROUP BY.
Continue reading Waiting for 8.4 – hash based DISTINCT
When I was working for one of customers we found some strange thing. We needed to found number of distinct sessions per day. Table layout was very simple:
Continue reading Counting number of distinct elements