Waiting for PostgreSQL 15 – Add UNIQUE null treatment option

On 3rd of February 2022, Peter Eisentraut committed patch:

Add UNIQUE null treatment option 
 
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.
 
This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.
 
The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.
 
I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.
 
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com

Continue reading Waiting for PostgreSQL 15 – Add UNIQUE null treatment option

Using recursive queries to get distinct elements from table

I wrote about similar things couple of times, but recently found thread on pgsql-general mailing list that made me thing about it again.

Summary of the problem from mail is: we have a table, ~ 800 million rows, with, at least 2 columns:

  • station – 170 distinct values
  • channel – generally 1-3 channels per station

And then we want to run:

SELECT
    station,
    array_agg(DISTINCT (channel)) AS channels
FROM
    DATA
GROUP BY
    station;

Which, on Israel's (original poster) machine took ~ 5 minutes.

And this is with index on on data (station, channel).

Can we do better?

Continue reading Using recursive queries to get distinct elements from table

Getting list of unique elements in table, per group

Today, on irc, someone asked interesting question.

Basically she ran a query like:

SELECT a, b, c, d, e, f FROM TABLE ORDER BY a

then, she processed the query to get, for each a array of unique values of b, c, d, e, and f, and then he inserted it back to database, to some other table.

It was a problem, because the table had many rows (millions I would assume), and the whole process was slow.

So, how to make it faster?

Continue reading Getting list of unique elements in table, per group

Getting count of distinct elements, per group, in PostgreSQL.

So, couple of days ago, some guy, from Periscope company wrote a blogpost about getting number of distinct elements, per group, faster using subqueries.

This was then submitted to Hacker News and r/Programming on Reddit.

Then, the original authors submitted second blogpost comparing speed between four different DB engines. Which, in turn, was also commented on Reddit.

I found the numbers presented by Periscope (as their improvement) as not that great.

Unfortunately – their blog doesn't allow for comments, so I decided to test it, and write on my own blog, what I can find about it.

Continue reading Getting count of distinct elements, per group, in PostgreSQL.

Getting unique elements

Let's assume you have some simple database with “articles" – each article can be in many “categories". And now you want to get list of all articles in given set of categories.

Standard approach:

SELECT
    a.*
FROM
    articles AS a
    JOIN articles_in_categories AS aic ON a.id = aic.article_id
WHERE
    aic.category_id IN (14,62,70,53,138)

Will return duplicated article data if given article is in more than one from listed categories. How to remove redundant rows?

Continue reading Getting unique elements

Waiting for 8.5 – hinting for number of distinct values

Robert Haas wrote, and Tom Lane committed (on 2nd of August) patch which adds interesting capability:

Log Message:
-----------
ADD ALTER TABLE ... ALTER COLUMN ... SET STATISTICS DISTINCT
 
Robert Haas

Log message is rather terse, so what does it exactly do?

Continue reading Waiting for 8.5 – hinting for number of distinct values