March 4th, 2013 by depesz | Tags: , , , , , | 5 comments »
Did it help? If yes - maybe you can help me? Donate BTC to 19zPa5diT2LZqGtTi8f8bfApLn8rw9zBHx

On 4th of March, Kevin Grittner committed patch:

Add a materialized view relations.
 
A materialized view has a rule just like a view and a heap and
other physical properties like a table.  The rule is only used to
populate the table, references in queries refer to the
materialized data.
 
This is a minimal implementation, but should still be useful in
many cases.  Currently data is only populated "on demand" by the
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
It is expected that future releases will add incremental updates
with various timings, and that a more refined concept of defining
what is "fresh" data will be developed.  At some point it may even
be possible to have queries use a materialized in place of
references to underlying tables, but that requires the other
above-mentioned features to be working first.
 
Much of the documentation work by Robert Haas.
Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
Security review by KaiGai Kohei, with a decision on how best to
implement sepgsql still pending.

This is huge.

Materialized views are important, and lack of them in PostgreSQL has been quite often brought up by fans of other databases.

Now, we're getting them.

Let's see how it works.

First, let's imagine a view query that is rather slow (which is usually the point in materialized view cases). Something along the lines of:

$ create view test1 as select 1 as a where pg_sleep(5) is not null;
CREATE VIEW

Every query to it will take ~ 5 seconds:

$ \timing
Timing is on.
$ select * from test1;
 a
---
 1
(1 row)
 
Time: 5008.621 ms
$ select * from test1;
 a
---
 1
(1 row)
 
Time: 5001.657 ms
$ select * from test1;
 a
---
 1
(1 row)
 
Time: 5001.519 ms

Nice. Now, I can create materialized view, that will be clearly faster:

create materialized view test2 as
    select 1 as a where pg_sleep(5) is not null;
SELECT 1

Interestingly, it's return text is “SELECT 1″ and not “CREATE VIEW". Plus – it took 5 seconds to run.

But now:

$ \timing
Timing is on.
$ select * from test2;
 a
---
 1
(1 row)
 
Time: 8.060 ms
$ select * from test2;
 a
---
 1
(1 row)
 
Time: 0.311 ms
$ select * from test2;
 a
---
 1
(1 row)
 
Time: 0.263 ms

Nice.

of course – materialized views need to be updated every now and then. This is done using “REFRESH MATERIALIZED VIEW" command, like this:

$ \timing
Timing is on.
$ REFRESH MATERIALIZED VIEW test2;
REFRESH MATERIALIZED VIEW
Time: 5037.111 ms

Unfortunately, during view recreation, you can't access old view data, as it's fully locked:

$ begin;
BEGIN
 
$ select oid, relfilenode from pg_class where relname = 'test2';
  oid  | relfilenode
-------+-------------
 28686 |       28693
(1 row)
 
$ refresh materialized view test2;
REFRESH MATERIALIZED VIEW
 
$ select * from pg_locks where pid = pg_backend_pid();
   locktype    | database | relation |  page  | tuple  | virtualxid | transactionid | classid | objid  | objsubid | virtualtransaction |  pid  |        mode         | granted | fastpath
---------------+----------+----------+--------+--------+------------+---------------+---------+--------+----------+--------------------+-------+---------------------+---------+----------
 relation      |    16393 |    11090 | [null] | [null] | [null]     |        [null] |  [null] | [null] |   [null] | 2/2219             | 26886 | AccessShareLock     | t       | t
 relation      |    16393 |     2663 | [null] | [null] | [null]     |        [null] |  [null] | [null] |   [null] | 2/2219             | 26886 | AccessShareLock     | t       | t
 relation      |    16393 |     2662 | [null] | [null] | [null]     |        [null] |  [null] | [null] |   [null] | 2/2219             | 26886 | AccessShareLock     | t       | t
 relation      |    16393 |     1259 | [null] | [null] | [null]     |        [null] |  [null] | [null] |   [null] | 2/2219             | 26886 | AccessShareLock     | t       | t
 virtualxid    |   [null] |   [null] | [null] | [null] | 2/2219     |        [null] |  [null] | [null] |   [null] | 2/2219             | 26886 | ExclusiveLock       | t       | t
 transactionid |   [null] |   [null] | [null] | [null] | [null]     |          2716 |  [null] | [null] |   [null] | 2/2219             | 26886 | ExclusiveLock       | t       | f
 object        |    16393 |   [null] | [null] | [null] | [null]     |        [null] |    1247 |  28700 |        0 | 2/2219             | 26886 | AccessExclusiveLock | t       | f
 relation      |    16393 |    28699 | [null] | [null] | [null]     |        [null] |  [null] | [null] |   [null] | 2/2219             | 26886 | AccessExclusiveLock | t       | f
 relation      |    16393 |    28686 | [null] | [null] | [null]     |        [null] |  [null] | [null] |   [null] | 2/2219             | 26886 | ShareLock           | t       | f
 relation      |    16393 |    28686 | [null] | [null] | [null]     |        [null] |  [null] | [null] |   [null] | 2/2219             | 26886 | AccessExclusiveLock | t       | f
 object        |    16393 |   [null] | [null] | [null] | [null]     |        [null] |    1247 |  28701 |        0 | 2/2219             | 26886 | AccessExclusiveLock | t       | f
 object        |        0 |   [null] | [null] | [null] | [null]     |        [null] |    1260 |  16385 |        0 | 2/2219             | 26886 | AccessShareLock     | t       | f
(12 rows)
 
$ select locktype, mode from pg_locks where pid = pg_backend_pid() and relation = 'test2'::regclass;
 locktype |        mode
----------+---------------------
 relation | ShareLock
 relation | AccessExclusiveLock
(2 rows)
 
$ commit;
COMMIT

This means that refreshing will block all concurrent access to the view. Whether this is acceptable for your situation – it's up to you. If it's not, you can do something along the lines of:

$ do $$
begin
execute 'create materialized view test2_new as ' || pg_get_viewdef('test2'::regclass);
end;
$$;
DO
 
$ begin;
BEGIN
 
$ drop materialized view test2;
DROP MATERIALIZED VIEW
 
$ alter materialized view test2_new rename to test2;
ALTER MATERIALIZED VIEW
 
$ commit;
COMMIT

Maybe with some statement_timeout/lock/statement_timeout loop, to make it less lock prone.

Since materialized view is internally a table, you can have indexes on it, to further speed up queries on it.

All in all – it's great first step, thanks guys.

  1. 5 comments

  2. # Andreas
    Mar 4, 2013

    I hope the locking during REFRESH will be solved since where I have hacked my own materialized views one of the main concerns has been to avoid locking out readers during the refresh.

  3. # Matt
    Mar 5, 2013

    Maybe “REFRESH MATERIALIZED VIEW test2 CONCURRENTLY;”, similar to how index rebuilds work.

    I am also hoping for incremental soon. Regarding determining what is fresh, maybe something similar to how covering indexes work.

  4. Mar 5, 2013

    “REFRESH MATERIALIZED VIEW ” do full or incremental refresh? IF do full refresh every time , it will cause a lot of I/O.

  5. # Andreas
    Mar 5, 2013

    @Francs: It does a full refresh while keeping the view locked.

    I love seeing huge features released in small useful steps, but personally I will have to wait for locking to be solved for this feature to be useful to me. The full refresh on the other hand does not bother me. It all depends on what usecases one has.

  6. # Anonymous
    Jun 28, 2013

    “Materialized views are important, and lack of them in PostgreSQL has been quite often brought up by fans of other databases.
    Now, we’re getting them.”

    Not even close.

    The idea of materialized views is that you can replace “CREATE VIEW foo AS …” with “CREATE MATERIALIZED VIEW foo AS …” without changing the results of a “SELECT * FROM foo”.

    PostgreSQL 9.3 manages to completely ignore this fundamental correctness requirement, the whole thing is a joke. Even a ten line patch that just ignores the MATERIALIZED keyword would have done better.

Leave a comment