Was asked recently about optimization of interesting case. There was table like:
=$ CREATE TABLE input_data (
category_id INT8,
object_id INT8,
interaction_ts timestamptz,
interaction_type TEXT,
interaction_count INT4
);
And there was a code that was grouping it all by sum()ing interaction_count per category, object, interaction_type, and timestamp truncated to hour.
Basically, storing somewhere result of:
=$ SELECT
category_id,
object_id,
date_trunc( 'hour', interaction_ts ) AS ts,
SUM(interaction_count) FILTER (WHERE interaction_type = 'a') AS a_count,
SUM(interaction_count) FILTER (WHERE interaction_type = 'b') AS b_count
FROM
input_data
GROUP BY 1, 2, 3;
While talking about optimizations, one idea that came was to store whole day of counts in single row, as array. So the resulting count table would be:
=$ CREATE TABLE results (
category_id int8,
object_id int8
interaction_day DATE,
a_counts int4[],
b_counts int4[]
);
Where a_counts, and b_counts would always have 24 elements, one for each hour.
Now, how to roll it up like this?
Continue reading Grouping data into array of sums – fun with custom aggregates