distinct – select * from depesz;

Waiting for PostgreSQL 15 – Add UNIQUE null treatment option

On 3rd of February 2022, Peter Eisentraut committed patch:

Add UNIQUE null treatment option 
 
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.
 
This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.
 
The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.
 
I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.
 
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com

Continue reading Waiting for PostgreSQL 15 – Add UNIQUE null treatment option

Using recursive queries to get distinct elements from table

I wrote about similar things couple of times, but recently found thread on pgsql-general mailing list that made me thing about it again.

Summary of the problem from mail is: we have a table, ~ 800 million rows, with, at least 2 columns:

station – 170 distinct values
channel – generally 1-3 channels per station

And then we want to run:

SELECT
    station,
    array_agg(DISTINCT (channel)) AS channels
FROM
    data
GROUP BY
    station;

Which, on Israel's (original poster) machine took ~ 5 minutes.

And this is with index on on data (station, channel).

Can we do better?

Continue reading Using recursive queries to get distinct elements from table

Getting list of unique elements in table, per group

Today, on irc, someone asked interesting question.

Basically she ran a query like:

select a, b, c, d, e, f from table order by a

then, she processed the query to get, for each a array of unique values of b, c, d, e, and f, and then he inserted it back to database, to some other table.

It was a problem, because the table had many rows (millions I would assume), and the whole process was slow.

So, how to make it faster?

Continue reading Getting list of unique elements in table, per group

Getting count of distinct elements, per group, in PostgreSQL.

So, couple of days ago, some guy, from Periscope company wrote a blogpost about getting number of distinct elements, per group, faster using subqueries.

This was then submitted to Hacker News and r/Programming on Reddit.

Then, the original authors submitted second blogpost comparing speed between four different DB engines. Which, in turn, was also commented on Reddit.

I found the numbers presented by Periscope (as their improvement) as not that great.

Unfortunately – their blog doesn't allow for comments, so I decided to test it, and write on my own blog, what I can find about it.

Continue reading Getting count of distinct elements, per group, in PostgreSQL.

Waiting for 8.5 – hinting for number of distinct values

Robert Haas wrote, and Tom Lane committed (on 2nd of August) patch which adds interesting capability:

Log Message:
-----------
Add ALTER TABLE ... ALTER COLUMN ... SET STATISTICS DISTINCT
 
Robert Haas

Log message is rather terse, so what does it exactly do?

Continue reading Waiting for 8.5 – hinting for number of distinct values

Getting list of unique elements

Every so often you need to get list of unique elements in some column. The standard way to do it is:

select distinct column from table;

select column from table group by column;

The only problem is that it's slow – as it has to seq scan whole table. Can it be done faster?

Continue reading Getting list of unique elements

Waiting for 8.4 – hash based DISTINCT

Today Tom Lane committed patch which gives DISTINCT ability to use hash aggregate – just like GROUP BY.

Continue reading Waiting for 8.4 – hash based DISTINCT

=$
|

Tag: distinct