tutorial – select * from depesz;

Polish configuration for TSearch

Some time ago someone posted on Reddit that they have problems adding Polish configuration to their PostgreSQL.

While checking it, I found some interesting news. And of course figured out how to have Polish configuration…

Continue reading Polish configuration for TSearch

What is index overhead on writes?

One of things people learn is that adding indexes isn't free. All write operations (insert, update, delete) will be slower – well, they have to update index.

But realistically – how much slower?

Full tests should involve lots of operations, on realistic data, but I just wanted to see some basic info. So I figured I'll just look at speed of inserting data (well, COPYing data), and will try to extract some knowledge from it…

Continue reading What is index overhead on writes?

Who logged to system from multiple countries in 2 hours?

Yesterday someone posted a set of queries for interviews, all centered on answering business-like questions from database.

Today this post is hidden behind some “subscribe to read more" thing, so I will not even link it, but one question there caught my eye.

Since I can't copy paste the text, I'll try to write what I remember:

Given table sessions, with columns: user_id, login_time, and country_id, list all cases where single account logged to the system from more than one country within 2 hour time frame.

The idea behind is that it would be a tool to find hacked account, based on idea that you generally can't change country within 2 hours. Which is somewhat true.

Solution in the blogpost suggested joining sessions table with itself, using some inequality condition. I think we can do better…

Continue reading Who logged to system from multiple countries in 2 hours?

How to ALTER tables without breaking application?

I once wrote about this problem, but given that we now have DO blocks and procedures I can make nicer, easy to use example.

Continue reading How to ALTER tables without breaking application?

SQL best practices – don’t compare count(*) with 0

Every now and then I see something like this:

select u.* from users u
where 0 = (select count(*) from addresses a where a.user_id = u.id);

and it kinda pains me. So figured, I'll write about what is the problem with it, and how to avoid such constructs.

Continue reading SQL best practices – don't compare count(*) with 0

How can I send mail or HTTP request from database?

This question happens every now and then in one of PostgreSQL support places.

Whenever it happens, I just suggest to not try, as interacting with outside world from database can be problematic, and instead use LISTEN/NOTIFY.

But it occurred to me, that while I know how to do it, I don't think I actually did it. It being: handle listen/notify in real life code. So let's try. I will not be writing actual email sending or http requesting, but will make sure that my program will get, from database, information when NOTIFY happens. How hard could it be?

Continue reading How can I send mail or HTTP request from database?

Grouping data into array of sums – fun with custom aggregates

Was asked recently about optimization of interesting case. There was table like:

=$ CREATE TABLE input_data (
    category_id        INT8,
    object_id          INT8,
    interaction_ts     timestamptz,
    interaction_type    TEXT,
    interaction_count  INT4
);

And there was a code that was grouping it all by sum()ing interaction_count per category, object, interaction_type, and timestamp truncated to hour.

Basically, storing somewhere result of:

=$ select
    category_id,
    object_id,
    date_trunc( 'hour', interaction_ts ) as ts,
    sum(interaction_count) filter (where interaction_type = 'a') as a_count,
    sum(interaction_count) filter (where interaction_type = 'b') as b_count
from
    input_data
group by 1, 2, 3;

While talking about optimizations, one idea that came was to store whole day of counts in single row, as array. So the resulting count table would be:

=$ create table results (
    category_id      int8,
    object_id        int8
    interaction_day  date,
    a_counts         int4[],
    b_counts         int4[]
);

Where a_counts, and b_counts would always have 24 elements, one for each hour.

Now, how to roll it up like this?

Continue reading Grouping data into array of sums – fun with custom aggregates

Case study: optimization of weirdly picked bad plan

We recently hit an interesting case where planner picked wrong plan. Kinda. And figured it could be used to show how to deal with such cases.

Continue reading Case study: optimization of weirdly picked bad plan

What tables were touched within given range of wal LSN?

We had a case recently where one of our DBs failed over to a new primary. To this server (old primary, and then new primary) we had connection from some kind of CDC tool (Debezium, I think).

The thing is that while there was failover, this tool read (using logical decoding) changes on old primary to location 1F1F/4ADC3348, and on new db, it connected from location 1F1F/4ADC4038. Which means that there was some WAL that was not decoded and sent to Debezium.

This is fixable, but we need to know what tables were touched in the WAL stream, that Debezium didn't see, if any. Perhaps we could also get ids/location of the rows that were there?

Continue reading What tables were touched within given range of wal LSN?

What the hell is transaction wraparound?

Recently someone asked on Slack about what is transaction wraparound. Full answer is a bit too much for slack reply, but I can try to explain it in here.

Continue reading What the hell is transaction wraparound?

=$
|

Tag: tutorial