What is LATERAL, what is it for, and how can one use it?

Lately in couple of places I recommended people that they can solve their problem with queries using LATERAL. In some cases recipient of such suggestion indicated that they had no idea what LATERAL is. Which made me think that it might be good idea to write more about them (lateral queries)…

Also – I know that some of the examples I shown in here can be done differently, I just wanted to show how one can use LATERAL, and am terrible with coming up with better usecases.

Lateral queries are with us for quite some time now – specifically since Pg 9.3 – that's around 10 years.

So, what are these?

Broadly speaking – lateral subquery (sometimes also called lateral join) is a way for developer to make PostgreSQL to generate many rows based on single row of data.

Simplest example: let's assume table that contains some events as two columns (I know I could used range datatype, but I want to keep it simple): event_start and event_end. Like this:

=$ CREATE TABLE events (
    id int8 GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    event_start DATE NOT NULL,
    event_end DATE NOT NULL,
    CHECK (event_end >= event_start)
);

Now, let's add there some events:

=$ WITH event_starts AS (
    SELECT now() - '2 weeks'::INTERVAL * random() AS START
    FROM generate_series(1,5) i
)
INSERT INTO events (event_start, event_end)
SELECT
    START,
    START + '3 days'::INTERVAL + random() * '4 days'::INTERVAL
FROM
    event_starts;

This gave me some nice events I can work with:

=$ SELECT * FROM events;
 id | event_start | event_end  
----+-------------+------------
  1 | 2022-09-15  | 2022-09-22
  2 | 2022-09-06  | 2022-09-11
  3 | 2022-09-06  | 2022-09-10
  4 | 2022-09-12  | 2022-09-18
  5 | 2022-09-05  | 2022-09-10
(5 ROWS)

Now, let's assume I'd want to get list of all dates that have some events on them, with count of how many events there are on each of these days.

I can start with simple select * from events, and then use lateral to get list of all days. Let's see:

=$ SELECT
    e.*,
    l.*
FROM
    events e,
    lateral (
        SELECT x::DATE
        FROM generate_series(e.event_start, e.event_end, '1 day'::INTERVAL) AS x
    ) AS l
 id | event_start | event_end  |     x      
----+-------------+------------+------------
  1 | 2022-09-15  | 2022-09-22 | 2022-09-15
  1 | 2022-09-15  | 2022-09-22 | 2022-09-16
  1 | 2022-09-15  | 2022-09-22 | 2022-09-17
  1 | 2022-09-15  | 2022-09-22 | 2022-09-18
  1 | 2022-09-15  | 2022-09-22 | 2022-09-19
  1 | 2022-09-15  | 2022-09-22 | 2022-09-20
  1 | 2022-09-15  | 2022-09-22 | 2022-09-21
  1 | 2022-09-15  | 2022-09-22 | 2022-09-22
  2 | 2022-09-06  | 2022-09-11 | 2022-09-06
  2 | 2022-09-06  | 2022-09-11 | 2022-09-07
  2 | 2022-09-06  | 2022-09-11 | 2022-09-08
  2 | 2022-09-06  | 2022-09-11 | 2022-09-09
  2 | 2022-09-06  | 2022-09-11 | 2022-09-10
  2 | 2022-09-06  | 2022-09-11 | 2022-09-11
  3 | 2022-09-06  | 2022-09-10 | 2022-09-06
  3 | 2022-09-06  | 2022-09-10 | 2022-09-07
  3 | 2022-09-06  | 2022-09-10 | 2022-09-08
  3 | 2022-09-06  | 2022-09-10 | 2022-09-09
  3 | 2022-09-06  | 2022-09-10 | 2022-09-10
  4 | 2022-09-12  | 2022-09-18 | 2022-09-12
  4 | 2022-09-12  | 2022-09-18 | 2022-09-13
  4 | 2022-09-12  | 2022-09-18 | 2022-09-14
  4 | 2022-09-12  | 2022-09-18 | 2022-09-15
  4 | 2022-09-12  | 2022-09-18 | 2022-09-16
  4 | 2022-09-12  | 2022-09-18 | 2022-09-17
  4 | 2022-09-12  | 2022-09-18 | 2022-09-18
  5 | 2022-09-05  | 2022-09-10 | 2022-09-05
  5 | 2022-09-05  | 2022-09-10 | 2022-09-06
  5 | 2022-09-05  | 2022-09-10 | 2022-09-07
  5 | 2022-09-05  | 2022-09-10 | 2022-09-08
  5 | 2022-09-05  | 2022-09-10 | 2022-09-09
  5 | 2022-09-05  | 2022-09-10 | 2022-09-10
(32 ROWS)

Please note that data in first 3 columns is repeated – because it's the same row in events table, just the 4th column is added thanks to lateral magic.

Lateral, in this example, called generate_series() function, which generated set of timestamps, which then got cast to date datatype, so that we'll get only dates.

Now, to get all days and their counts, I just need to group by l.x, and get count:

=$ SELECT
    l.x,
    COUNT(*)
FROM
    events e,
    lateral (
        SELECT x::DATE
        FROM generate_series(e.event_start, e.event_end, '1 day'::INTERVAL) AS x
    ) AS l
GROUP BY l.x
ORDER BY l.x
     x      | COUNT 
------------+-------
 2022-09-05 |     1
 2022-09-06 |     3
 2022-09-07 |     3
 2022-09-08 |     3
 2022-09-09 |     3
 2022-09-10 |     3
 2022-09-11 |     1
 2022-09-12 |     1
 2022-09-13 |     1
 2022-09-14 |     1
 2022-09-15 |     2
 2022-09-16 |     2
 2022-09-17 |     2
 2022-09-18 |     2
 2022-09-19 |     1
 2022-09-20 |     1
 2022-09-21 |     1
 2022-09-22 |     1
(18 ROWS)

The important part is that for every row in the source (events) the query inside lateral is called with full access to normal logic (including where clauses, functions, grouping, aggregates, ordering, limiting, anything), and resulting recordset is available to our query.

This can be used, for example, to get things like, for example, get top five employee, per department, sorted by salary:

SELECT d.*,
       le.*
FROM departments d,
    lateral (
        SELECT *
        FROM employees e
        WHERE e.dept_id = d.id
        ORDER BY e.salary DESC LIMIT 5
    ) AS le

Generally, lateral is godsend whenever you need to extract information from complex data (json?) or make non-obvious modification to joined dataset (like limiting number of rows in the example above).

There is one more thing I'd like to show. Specifically – while I don't particularly like calling lateral queries as “lateral joins", it is a fact that you can use join syntax. Which can be useful in cases where lateral query doesn't return anything.

Given the events table, let's try to get number of events for each day in September.

To get all dates in September I can simply:

=$ SELECT d::DATE AS DAY
FROM generate_series('2022-09-01', '2022-09-30', '1 day'::INTERVAL) d;
    DAY     
------------
 2022-09-01
 2022-09-02
 2022-09-03
 2022-09-04
...
 2022-09-30
(30 ROWS)

Now, I can add lateral subquery to get events that were happening on this day:

=$ SELECT
    d::DATE AS DAY,
    l.*
FROM
    generate_series('2022-09-01', '2022-09-30', '1 day'::INTERVAL) d,
    lateral (
        SELECT * FROM events e
        WHERE d::DATE BETWEEN e.event_start AND e.event_end
    ) AS l;
    DAY     | id | event_start | event_end  
------------+----+-------------+------------
 2022-09-05 |  5 | 2022-09-05  | 2022-09-10
 2022-09-06 |  2 | 2022-09-06  | 2022-09-11
 2022-09-06 |  3 | 2022-09-06  | 2022-09-10
 2022-09-06 |  5 | 2022-09-05  | 2022-09-10
 2022-09-07 |  2 | 2022-09-06  | 2022-09-11
 2022-09-07 |  3 | 2022-09-06  | 2022-09-10
 2022-09-07 |  5 | 2022-09-05  | 2022-09-10
 2022-09-08 |  2 | 2022-09-06  | 2022-09-11
 2022-09-08 |  3 | 2022-09-06  | 2022-09-10
 2022-09-08 |  5 | 2022-09-05  | 2022-09-10
 2022-09-09 |  2 | 2022-09-06  | 2022-09-11
 2022-09-09 |  3 | 2022-09-06  | 2022-09-10
 2022-09-09 |  5 | 2022-09-05  | 2022-09-10
 2022-09-10 |  2 | 2022-09-06  | 2022-09-11
 2022-09-10 |  3 | 2022-09-06  | 2022-09-10
 2022-09-10 |  5 | 2022-09-05  | 2022-09-10
 2022-09-11 |  2 | 2022-09-06  | 2022-09-11
 2022-09-12 |  4 | 2022-09-12  | 2022-09-18
 2022-09-13 |  4 | 2022-09-12  | 2022-09-18
 2022-09-14 |  4 | 2022-09-12  | 2022-09-18
 2022-09-15 |  1 | 2022-09-15  | 2022-09-22
 2022-09-15 |  4 | 2022-09-12  | 2022-09-18
 2022-09-16 |  1 | 2022-09-15  | 2022-09-22
 2022-09-16 |  4 | 2022-09-12  | 2022-09-18
 2022-09-17 |  1 | 2022-09-15  | 2022-09-22
 2022-09-17 |  4 | 2022-09-12  | 2022-09-18
 2022-09-18 |  1 | 2022-09-15  | 2022-09-22
 2022-09-18 |  4 | 2022-09-12  | 2022-09-18
 2022-09-19 |  1 | 2022-09-15  | 2022-09-22
 2022-09-20 |  1 | 2022-09-15  | 2022-09-22
 2022-09-21 |  1 | 2022-09-15  | 2022-09-22
 2022-09-22 |  1 | 2022-09-15  | 2022-09-22
(32 ROWS)

This worked, obviously, but if I'd get counts now, I would be missing some days:

=$ SELECT
    d::DATE AS DAY,
    COUNT(l.id) AS events
FROM
    generate_series('2022-09-01', '2022-09-30', '1 day'::INTERVAL) d,
    lateral (
        SELECT * FROM events e
        WHERE d::DATE BETWEEN e.event_start AND e.event_end
    ) AS l
GROUP BY d.date
ORDER BY d.date
    DAY     | events 
------------+--------
 2022-09-05 |      1
 2022-09-06 |      3
 2022-09-07 |      3
 2022-09-08 |      3
 2022-09-09 |      3
 2022-09-10 |      3
 2022-09-11 |      1
 2022-09-12 |      1
 2022-09-13 |      1
 2022-09-14 |      1
 2022-09-15 |      2
 2022-09-16 |      2
 2022-09-17 |      2
 2022-09-18 |      2
 2022-09-19 |      1
 2022-09-20 |      1
 2022-09-21 |      1
 2022-09-22 |      1
(18 ROWS)

Specifically – I have no data for days where there were 0 rows. This is because lateral for these days returned 0 rows, and this “cancelled" row from generate_series.

We can work around it thanks to doing left join to lateral:

=$ SELECT
    d::DATE AS DAY,
    COUNT(l.id) AS events
FROM
    generate_series('2022-09-01', '2022-09-30', '1 day'::INTERVAL) d
    LEFT JOIN lateral (
        SELECT * FROM events e
        WHERE d::DATE BETWEEN e.event_start AND e.event_end
    ) AS l ON (TRUE)
GROUP BY d.date
ORDER BY d.date
    DAY     | events 
------------+--------
 2022-09-01 |      0
 2022-09-02 |      0
 2022-09-03 |      0
 2022-09-04 |      0
 2022-09-05 |      1
 2022-09-06 |      3
 2022-09-07 |      3
 2022-09-08 |      3
 2022-09-09 |      3
 2022-09-10 |      3
 2022-09-11 |      1
 2022-09-12 |      1
 2022-09-13 |      1
 2022-09-14 |      1
 2022-09-15 |      2
 2022-09-16 |      2
 2022-09-17 |      2
 2022-09-18 |      2
 2022-09-19 |      1
 2022-09-20 |      1
 2022-09-21 |      1
 2022-09-22 |      1
 2022-09-23 |      0
 2022-09-24 |      0
 2022-09-25 |      0
 2022-09-26 |      0
 2022-09-27 |      0
 2022-09-28 |      0
 2022-09-29 |      0
 2022-09-30 |      0
(30 ROWS)

This brought back 0 counts, and this is because all columns returned by lateral, for days without events, were NULL. Which means that count(l.id) was not incrementing the count (count(…) increments the count only for non-null values).

For the left join I used weird join condition: on (true). This is because JOINs need join condition. In our case the real condition is built inside the lateral query (where d::date …), but still, syntax requires something to join on. Since any row returned by lateral will be “ok to join", then we're joining on condition that is always true.

Hope it was useful to at least one person, and if something isn't clear, or you'd like to know more, please ask 🙂

7 thoughts on “What is LATERAL, what is it for, and how can one use it?”

  1. Thanks, I didn’t know about lateral.
    At least useful for 1 person!

  2. Of course in this particular example it would be easier to do the count(*) inside the lateral query and then you can use a simple cross join again.

  3. A better demonstration would be something like this.
    In this case, I choose to count events that are within one day of the date, which means that it could not be accomplished with a simple group by

    SELECT
       DAY.day,
       l.num_events_within_a_day_of
    FROM generate_series('2022-09-01', '2022-09-30', '1 day'::INTERVAL) AS d
    -- use a lateral to allow you to reference derivative columns later in the query
    CROSS JOIN LATERAL (
         SELECT d.d::DATE ) AS DAY(DAY)
    -- use a lateral again to do the aggregation while avoiding a global group-by
    CROSS JOIN LATERAL (
         SELECT COUNT(*)
         FROM events AS e
         WHERE d::DATE BETWEEN e.event_start - INTERVAL '1 day' AND e.event_end + INTERVAL '1 day'
    ) AS  l(num_events_within_a_day_of)
    ORDER BY d.date
  4. Thanks Corey that is a great use case example that I can use and will.

  5. I have a different way of saying under what circumstances you should use lateral joins. When you have a query that returns the correct rows and you wanted to add columns to your current query from other tables, under normal circumstances, if the values you are looking for comes from a single table, you would just join to that table.
    And if the column values you are looking for come from a more complex query, you could use lateral joins when the values you are looking for can be found in the other tables that can be identified by existing column values (lateral join conditions) in the current query result (row), provided that the number of returned row is 1.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.