Waiting for PostgreSQL 10 – Implement table partitioning.

I had a two-month delay due to some work, but now I can finally write about it:

On 7th of December, Robert Haas committed patch:

Implement table partitioning.
 
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own.  The children are called
partitions and contain all of the actual data.  Each partition has an
implicit partitioning constraint.  Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed.  Partitions
can't have extra columns and may not allow nulls unless the parent
does.  Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.
 
Currently, tables can be range-partitioned or list-partitioned.  List
partitioning is limited to a single column, but range partitioning can
involve multiple columns.  A partitioning "column" can be an
expression.
 
Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations.  The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.
 
Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others.  Minor revisions by me.

So, you might have read my previous blog post about partitioning. If you haven't – no big deal, but knowing how to do partitioning in versions before 10 will help you understand what has really changed.

What has changed is that we no longer need partitioning triggers or rules. Maintenance (creation/dropping of partitions) still has to be handled manually, but it's a step in the right direction.

Based on the description above, it looks like there are also some limitations, but let's go through it step by step.

First thing: it doesn't look like you can create a table, insert some data, and then turn it into a partitioned table. You have to create the master table, mark it as partitioned, and then you can add partitions.

How does that look?

Let's do two simple cases:

  • table users, partitioned by range on id (serial)
  • table stats, partitioned by list of values

Specifically, the tables will have the following schema:

CREATE TABLE users (
    id             serial PRIMARY KEY,
    username       text NOT NULL UNIQUE,
    password       text,
    created_on     timestamptz NOT NULL,
    last_logged_on timestamptz NOT NULL,
    UNIQUE (username)
);

and

CREATE TABLE stats (
    stat_type text,
    stat_year int4,
    stat_name text,
    stat_value float8,
    PRIMARY KEY (stat_type, stat_year, stat_name)
);

Starting with users, and partitioning by range on id, I encountered a problem immediately:

CREATE TABLE users (
    id             serial PRIMARY KEY,
    username       text NOT NULL UNIQUE,
    password       text,
    created_on     timestamptz NOT NULL,
    last_logged_on timestamptz NOT NULL,
    UNIQUE (username)
)
PARTITION BY RANGE ( id )
;
ERROR:  PRIMARY KEY constraints are NOT supported ON partitioned TABLES
LINE 2:     id             serial PRIMARY KEY,
                                  ^

OK, so I can't have a primary key. That's not really cool, but let's roll with it:

$ CREATE SEQUENCE users_id_seq;
CREATE SEQUENCE
 
$ CREATE TABLE users (
    id             int8 NOT NULL DEFAULT NEXTVAL('users_id_seq'),
    username       text NOT NULL UNIQUE,
    password       text,
    created_on     timestamptz NOT NULL,
    last_logged_on timestamptz NOT NULL,
    UNIQUE (id)
)
PARTITION BY RANGE ( id );
ERROR:  UNIQUE constraints are NOT supported ON partitioned TABLES
LINE 3:     username       text NOT NULL UNIQUE,
                                         ^

At this moment I'm starting to doubt my sanity. What is the purpose of partitioning if I can't have a primary key or even a unique index? But let's assume it's just a limitation of the master table, and retry, again, with just a bare table definition:

$ CREATE TABLE users (
    id             serial NOT NULL,
    username       text NOT NULL,
    password       text,
    created_on     timestamptz NOT NULL,
    last_logged_on timestamptz NOT NULL
)
PARTITION BY RANGE ( id );
CREATE TABLE

OK. It worked. \d looks like this:

$ \d users
                                         TABLE "public.users"
     COLUMN     |           TYPE           | Collation | NULLABLE |              DEFAULT              
----------------+--------------------------+-----------+----------+-----------------------------------
 id             | INTEGER                  |           | NOT NULL | NEXTVAL('users_id_seq'::regclass)
 username       | text                     |           | NOT NULL | 
 password       | text                     |           |          | 
 created_on     | TIMESTAMP WITH TIME zone |           | NOT NULL | 
 last_logged_on | TIMESTAMP WITH TIME zone |           | NOT NULL | 
Partition KEY: RANGE (id)

Please note the lack of triggers, indexes and so on.

Let's see what happens if I insert a row into such a table:

INSERT INTO users (username, created_on, last_logged_on)
    VALUES ('depesz', now(), now());
ERROR:  no partition OF relation "users" found FOR ROW
DETAIL:  Failing ROW contains (1, depesz, NULL, 2017-02-05 13:45:28.99477+01, 2017-02-05 13:45:28.99477+01).

This is to be expected – after all, there shouldn't be any rows in the master table itself. So I need to add some partitions.

Let's assume, for now, that I will create two partitions – one for users with id < 10, and one for id > 10. Just for simplicity's sake:

$ CREATE TABLE users_0
    partition OF users (id, PRIMARY KEY (id), UNIQUE (username))
    FOR VALUES FROM (unbounded) TO (9);
CREATE TABLE
 
$ CREATE TABLE users_1
    partition OF users (id, PRIMARY KEY (id), UNIQUE (username))
    FOR VALUES FROM (10) TO (unbounded);
CREATE TABLE

Let's see how this changes our users definition, and what the partition definitions look like:

$ \d users
                                         TABLE "public.users"
     COLUMN     |           TYPE           | Collation | NULLABLE |              DEFAULT              
----------------+--------------------------+-----------+----------+-----------------------------------
 id             | INTEGER                  |           | NOT NULL | NEXTVAL('users_id_seq'::regclass)
 username       | text                     |           | NOT NULL | 
 password       | text                     |           |          | 
 created_on     | TIMESTAMP WITH TIME zone |           | NOT NULL | 
 last_logged_on | TIMESTAMP WITH TIME zone |           | NOT NULL | 
Partition KEY: RANGE (id)
NUMBER OF partitions: 2 (USE \d+ TO list them.)
 
$ \d users_0
                           TABLE "public.users_0"
     COLUMN     |           TYPE           | Collation | NULLABLE | DEFAULT 
----------------+--------------------------+-----------+----------+---------
 id             | INTEGER                  |           | NOT NULL | 
 username       | text                     |           | NOT NULL | 
 password       | text                     |           |          | 
 created_on     | TIMESTAMP WITH TIME zone |           | NOT NULL | 
 last_logged_on | TIMESTAMP WITH TIME zone |           | NOT NULL | 
Partition OF: users FOR VALUES FROM (UNBOUNDED) TO (9)
Indexes:
    "users_0_pkey" PRIMARY KEY, btree (id)
    "users_0_username_key" UNIQUE CONSTRAINT, btree (username)
 
$ \d users_1
                           TABLE "public.users_1"
     COLUMN     |           TYPE           | Collation | NULLABLE | DEFAULT 
----------------+--------------------------+-----------+----------+---------
 id             | INTEGER                  |           | NOT NULL | 
 username       | text                     |           | NOT NULL | 
 password       | text                     |           |          | 
 created_on     | TIMESTAMP WITH TIME zone |           | NOT NULL | 
 last_logged_on | TIMESTAMP WITH TIME zone |           | NOT NULL | 
Partition OF: users FOR VALUES FROM (10) TO (UNBOUNDED)
Indexes:
    "users_1_pkey" PRIMARY KEY, btree (id)
    "users_1_username_key" UNIQUE CONSTRAINT, btree (username)

The users information shows “Number of partitions: 2 (Use \d+ to list them.)", so let's list them:

$ \d+ users
                                                             TABLE "public.users"
     COLUMN     |           TYPE           | Collation | NULLABLE |              DEFAULT              | Storage  | Stats target | Description 
----------------+--------------------------+-----------+----------+-----------------------------------+----------+--------------+-------------
 id             | INTEGER                  |           | NOT NULL | NEXTVAL('users_id_seq'::regclass) | plain    |              | 
 username       | text                     |           | NOT NULL |                                   | extended |              | 
 password       | text                     |           |          |                                   | extended |              | 
 created_on     | TIMESTAMP WITH TIME zone |           | NOT NULL |                                   | plain    |              | 
 last_logged_on | TIMESTAMP WITH TIME zone |           | NOT NULL |                                   | plain    |              | 
Partition KEY: RANGE (id)
Partitions: users_0 FOR VALUES FROM (UNBOUNDED) TO (9),
            users_1 FOR VALUES FROM (10) TO (UNBOUNDED)

OK. Looks like we're getting somewhere. Obviously (well, for me) this schema is plainly wrong, because it will allow the same user to be created in multiple partitions (the unique constraint on username is only per-partition), but it's just a simplistic example, so let's see how that would work out.
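
To make the flaw explicit – I didn't run this as part of the session, but since each UNIQUE (username) constraint lives only in its own partition, nothing stops the same username from landing in two different partitions, for example via direct inserts (a sketch, with made-up values):

INSERT INTO users_0 (id, username, created_on, last_logged_on)
    VALUES (1, 'dupe', now(), now());
INSERT INTO users_1 (id, username, created_on, last_logged_on)
    VALUES (100, 'dupe', now(), now());
-- both succeed, and SELECT count(*) FROM users WHERE username = 'dupe' returns 2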

Now our previous insert should work:

$ INSERT INTO users (username, created_on, last_logged_on)
    VALUES ('depesz', now(), now());
INSERT 0 1

Yeah. And does it exist?

$ SELECT * FROM users;
 id | username | password |          created_on           |        last_logged_on         
----+----------+----------+-------------------------------+-------------------------------
  2 | depesz   | [NULL]   | 2017-02-05 13:53:23.244634+01 | 2017-02-05 13:53:23.244634+01
(1 ROW)
 
$ SELECT * FROM ONLY users;
 id | username | password | created_on | last_logged_on 
----+----------+----------+------------+----------------
(0 ROWS)
 
$ SELECT * FROM users_0;
 id | username | password |          created_on           |        last_logged_on         
----+----------+----------+-------------------------------+-------------------------------
  2 | depesz   | [NULL]   | 2017-02-05 13:53:23.244634+01 | 2017-02-05 13:53:23.244634+01
(1 ROW)
 
$ SELECT * FROM users_1;
 id | username | password | created_on | last_logged_on 
----+----------+----------+------------+----------------
(0 ROWS)

Sweet. We can see the row in the users table, but not when using “ONLY users" (which skips rows from partitions), and it was properly routed to the users_0 partition. So let's add some more rows:

$ INSERT INTO users (username, created_on, last_logged_on)
    VALUES
        ( 'transfiguring', now(), now() ),
        ( 'vamooses', now(), now() ),
        ( 'trimarans', now(), now() ),
        ( 'voltmeters', now(), now() ),
        ( 'noticeable', now(), now() ),
        ( 'caesareans', now(), now() ),
        ( 'unjustly', now(), now() ),
        ( 'musket', now(), now() ),
        ( 'sweepstake', now(), now() ),
        ( 'cosmetologist', now(), now() ),
        ( 'tonsorial', now(), now() ),
        ( 'yoked', now(), now() ),
        ( 'gritted', now(), now() ),
        ( 'dhotis', now(), now() ),
        ( 'prejudice', now(), now() );
ERROR:  no partition OF relation "users" found FOR ROW
DETAIL:  Failing ROW contains (9, unjustly, NULL, 2017-02-05 13:56:13.772016+01, 2017-02-05 13:56:13.772016+01).

The error is interesting. Why can't it find a partition for the row with id 9?

I'm not seeing any mention of it in the docs (maybe I'm missing something), but I assume that FROM x TO y means: greater than or equal to x, and less than y. Let's test that idea:

$ DROP TABLE users_1;
DROP TABLE
 
$ CREATE TABLE users_1
    partition OF users (id, PRIMARY KEY (id), UNIQUE (username))
    FOR VALUES FROM (9) TO (unbounded);
CREATE TABLE

Now, a quick reset of the sequence, and retry the insert:

$ SELECT SETVAL('users_id_seq', 2);
 SETVAL 
--------
      2
(1 ROW)
 
$ INSERT INTO users (username, created_on, last_logged_on)
    VALUES
        ( 'transfiguring', now(), now() ),
        ( 'vamooses', now(), now() ),
        ( 'trimarans', now(), now() ),
        ( 'voltmeters', now(), now() ),
        ( 'noticeable', now(), now() ),
        ( 'caesareans', now(), now() ),
        ( 'unjustly', now(), now() ),
        ( 'musket', now(), now() ),
        ( 'sweepstake', now(), now() ),
        ( 'cosmetologist', now(), now() ),
        ( 'tonsorial', now(), now() ),
        ( 'yoked', now(), now() ),
        ( 'gritted', now(), now() ),
        ( 'dhotis', now(), now() ),
        ( 'prejudice', now(), now() );
INSERT 0 15

Looks better. And how were the rows distributed?

$ SELECT * FROM users_0;
 id |   username    | password |          created_on           |        last_logged_on         
----+---------------+----------+-------------------------------+-------------------------------
  2 | depesz        |          | 2017-02-05 13:53:23.244634+01 | 2017-02-05 13:53:23.244634+01
  3 | transfiguring |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
  4 | vamooses      |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
  5 | trimarans     |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
  6 | voltmeters    |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
  7 | noticeable    |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
  8 | caesareans    |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
(7 ROWS)
 
$ SELECT * FROM users_1;
 id |   username    | password |          created_on           |        last_logged_on         
----+---------------+----------+-------------------------------+-------------------------------
  9 | unjustly      |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
 10 | musket        |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
 11 | sweepstake    |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
 12 | cosmetologist |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
 13 | tonsorial     |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
 14 | yoked         |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
 15 | gritted       |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
 16 | dhotis        |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
 17 | prejudice     |          | 2017-02-05 14:01:33.033116+01 | 2017-02-05 14:01:33.033116+01
(9 ROWS)

This is great so far, but how can I add a new partition?

Let's say that I want to move rows with id over 20 to another partition:

$ CREATE TABLE users_2
    partition OF users (id, PRIMARY KEY (id), UNIQUE (username))
    FOR VALUES FROM (20) TO (unbounded);
ERROR:  partition "users_2" would overlap partition "users_1"

This makes sense, but it also causes some pain if you want to have partitions that are unbounded.

Luckily, we can work around it:

$ ALTER TABLE users detach partition users_1;
ALTER TABLE
 
$ ALTER TABLE users attach partition users_1 FOR VALUES FROM (9) TO (20);
ALTER TABLE
 
$ CREATE TABLE users_2
    partition OF users (id, PRIMARY KEY (id), UNIQUE (username))
    FOR VALUES FROM (20) TO (unbounded);
CREATE TABLE

So it works, but for the duration of the process, partition users_1 will not be used when querying users.

The proper solution, in this case, seems to be to avoid unbounded ranges, and simply make sure you always keep enough partitions ready.
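
In practice that could look like this – a sketch, assuming users_2 had been created with a bounded range, FOR VALUES FROM (20) TO (30), instead of TO (unbounded); adding the next partitions is then a plain, non-overlapping CREATE TABLE:

CREATE TABLE users_3
    partition OF users (id, PRIMARY KEY (id), UNIQUE (username))
    FOR VALUES FROM (30) TO (40);
CREATE TABLE users_4
    partition OF users (id, PRIMARY KEY (id), UNIQUE (username))
    FOR VALUES FROM (40) TO (50);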

There is also one more thing. As the docs say, when you attach a partition, it has to be fully scanned to make sure that all rows in it match the partition constraint.

So, if you do a detach/attach, it will render the partition inaccessible for the duration of a whole-table scan.

Basically – given range partitioning, simply keep a number of partitions ready for future use, and create new ones when you're close to filling the existing ones. For this you will need some way to schedule jobs – using cron, pgagent or (possibly) pg_partman (though I'm not sure whether pg_partman supports the built-in partitioning yet).
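
Just to illustrate what such a job might do, here's a hypothetical plpgsql helper – the name, the 10000-id step and the assumption of bounded ranges are all mine, not something that ships with PostgreSQL – that cron or pgagent could call periodically:

CREATE OR REPLACE FUNCTION ensure_next_users_partition() RETURNS void AS $$
DECLARE
    v_from     int8;
    v_to       int8;
    v_partname text;
BEGIN
    -- compute the range that starts right after the range containing the current max id
    SELECT ( COALESCE( MAX(id), 0 ) / 10000 + 1 ) * 10000 INTO v_from FROM users;
    v_to       := v_from + 10000;
    v_partname := format( 'users_p_%s', v_from );
    -- create the partition only if it doesn't exist yet
    IF to_regclass( v_partname ) IS NULL THEN
        EXECUTE format(
            'CREATE TABLE %I partition OF users (id, PRIMARY KEY (id), UNIQUE (username)) FOR VALUES FROM (%s) TO (%s)',
            v_partname, v_from, v_to
        );
    END IF;
END;
$$ LANGUAGE plpgsql;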

Anyway – the whole process looks simpler, the partitioning rules are displayed in a much nicer way, and we're not using triggers, so it should be faster. Let's see about that.

I'll create 3 tables: users_normal (a simple table with the schema as above), users_triggered (a users table with an insert trigger that puts data into partitions holding 10000 rows each), and users_partitioned (the same thing, but using PG10 partitioning).

$ CREATE TABLE users_normal (
    id             serial PRIMARY KEY,
    username       text NOT NULL UNIQUE,
    password       text,
    created_on     timestamptz NOT NULL,
    last_logged_on timestamptz NOT NULL,
    UNIQUE (username)
);
CREATE TABLE
 
$ CREATE TABLE users_triggered (
    id             serial PRIMARY KEY,
    username       text NOT NULL UNIQUE,
    password       text,
    created_on     timestamptz NOT NULL,
    last_logged_on timestamptz NOT NULL,
    UNIQUE (username)
);
CREATE TABLE
 
$ CREATE TABLE users_triggered_1 ( LIKE users_triggered including ALL ) inherits (users_triggered);
CREATE TABLE
 
$ CREATE TABLE users_triggered_2 ( LIKE users_triggered including ALL ) inherits (users_triggered);
CREATE TABLE
 
...
 
$ CREATE TABLE users_triggered_10 ( LIKE users_triggered including ALL ) inherits (users_triggered);
CREATE TABLE
 
$ ALTER TABLE users_triggered_1 ADD CONSTRAINT partition_check CHECK (id > 0 AND id <= 10000 );
ALTER TABLE
 
$ ALTER TABLE users_triggered_2 ADD CONSTRAINT partition_check CHECK (id > 10000 AND id <= 20000 );
ALTER TABLE
 
...
 
$ ALTER TABLE users_triggered_10 ADD CONSTRAINT partition_check CHECK (id > 90000 AND id <= 100000 );
ALTER TABLE
 
$ CREATE FUNCTION partition_for_users_triggered() RETURNS TRIGGER AS $$
DECLARE
    v_partition_name text;
BEGIN
    -- figure out the target partition name from id: 10000 ids per partition
    v_partition_name := format( 'users_triggered_%s', CEIL(NEW.id::NUMERIC / 10000) );
    EXECUTE 'INSERT INTO ' || v_partition_name || ' VALUES ( ($1).* )' USING NEW;
    RETURN NULL;
END;
 
$$ LANGUAGE plpgsql;
CREATE FUNCTION
 
$ CREATE TRIGGER partition_for_users_triggered BEFORE INSERT ON users_triggered FOR each ROW EXECUTE PROCEDURE partition_for_users_triggered();
CREATE TRIGGER
 
$ CREATE TABLE users_partitioned (
    id             serial NOT NULL,
    username       text NOT NULL,
    password       text,
    created_on     timestamptz NOT NULL,
    last_logged_on timestamptz NOT NULL
) partition BY range (id);
CREATE TABLE
 
$ CREATE TABLE users_partitioned_1 partition OF users_partitioned (id, PRIMARY KEY (id), UNIQUE (username)) FOR VALUES FROM (1) TO (10001);
CREATE TABLE
 
$ CREATE TABLE users_partitioned_2 partition OF users_partitioned (id, PRIMARY KEY (id), UNIQUE (username)) FOR VALUES FROM (10001) TO (20001);
CREATE TABLE
 
...
 
$ CREATE TABLE users_partitioned_10 partition OF users_partitioned (id, PRIMARY KEY (id), UNIQUE (username)) FOR VALUES FROM (90001) TO (100001);
CREATE TABLE

With this in place, I can insert 100000 rows into each of these tables.

I did this using the fastest way to insert data: COPY:

\copy users_normal (username, created_on, last_logged_on) FROM '100k.users.tsv'
\copy users_triggered (username, created_on, last_logged_on) FROM '100k.users.tsv'
\copy users_partitioned (username, created_on, last_logged_on) FROM '100k.users.tsv'

I ran each test ten times, and got the following results (time shown is the best of ten):

  • unpartitioned table: 763ms
  • table with trigger: 3803ms
  • table partitioned with PG10 partitions: 764ms

This is incredible. It's on par with the plain table, and we get the data spread across partitions. Wow.

Now for the list-of-values partitioning. Knowing the pitfalls:

$ CREATE TABLE stats (
    stat_type text NOT NULL,
    stat_year int4 NOT NULL,
    stat_name text NOT NULL,
    stat_value float8
)
partition BY list (stat_type);
CREATE TABLE

Adding some partitions is, obviously, simple:

$ CREATE TABLE stats_1
    partition OF stats (PRIMARY KEY (stat_type, stat_year, stat_name))
    FOR VALUES IN ('t1', 't2');
CREATE TABLE
 
$ CREATE TABLE stats_2
    partition OF stats (PRIMARY KEY (stat_type, stat_year, stat_name))
    FOR VALUES IN ('t3', 't4');
CREATE TABLE

\d shows:

\d+ stats
                                          TABLE "public.stats"
   COLUMN   |       TYPE       | Collation | NULLABLE | DEFAULT | Storage  | Stats target | Description 
------------+------------------+-----------+----------+---------+----------+--------------+-------------
 stat_type  | text             |           | NOT NULL |         | extended |              | 
 stat_year  | INTEGER          |           | NOT NULL |         | plain    |              | 
 stat_name  | text             |           | NOT NULL |         | extended |              | 
 stat_value | DOUBLE PRECISION |           |          |         | plain    |              | 
Partition KEY: LIST (stat_type)
Partitions: stats_1 FOR VALUES IN ('t1', 't2'),
            stats_2 FOR VALUES IN ('t3', 't4')
 
\d stats_1
                     TABLE "public.stats_1"
   COLUMN   |       TYPE       | Collation | NULLABLE | DEFAULT 
------------+------------------+-----------+----------+---------
 stat_type  | text             |           | NOT NULL | 
 stat_year  | INTEGER          |           | NOT NULL | 
 stat_name  | text             |           | NOT NULL | 
 stat_value | DOUBLE PRECISION |           |          | 
Partition OF: stats FOR VALUES IN ('t1', 't2')
Indexes:
    "stats_1_pkey" PRIMARY KEY, btree (stat_type, stat_year, stat_name)
 
\d stats_2
                     TABLE "public.stats_2"
   COLUMN   |       TYPE       | Collation | NULLABLE | DEFAULT 
------------+------------------+-----------+----------+---------
 stat_type  | text             |           | NOT NULL | 
 stat_year  | INTEGER          |           | NOT NULL | 
 stat_name  | text             |           | NOT NULL | 
 stat_value | DOUBLE PRECISION |           |          | 
Partition OF: stats FOR VALUES IN ('t3', 't4')
Indexes:
    "stats_2_pkey" PRIMARY KEY, btree (stat_type, stat_year, stat_name)

All looks sane. Inserting data, obviously, works:

$ INSERT INTO stats (stat_type, stat_year, stat_name, stat_value)
    VALUES
        ('t1', '2017', 'a', 1),
        ('t2', '2017', 'a', 2),
        ('t3', '2017', 'a', 3),
        ('t4', '2017', 'a', 4);
INSERT 0 4
 
$ SELECT * FROM stats;
 stat_type | stat_year | stat_name | stat_value 
-----------+-----------+-----------+------------
 t1        |      2017 | a         |          1
 t2        |      2017 | a         |          2
 t3        |      2017 | a         |          3
 t4        |      2017 | a         |          4
(4 ROWS)
 
$ SELECT * FROM stats_1;
 stat_type | stat_year | stat_name | stat_value 
-----------+-----------+-----------+------------
 t1        |      2017 | a         |          1
 t2        |      2017 | a         |          2
(2 ROWS)
 
$ SELECT * FROM stats_2;
 stat_type | stat_year | stat_name | stat_value 
-----------+-----------+-----------+------------
 t3        |      2017 | a         |          3
 t4        |      2017 | a         |          4
(2 ROWS)

Now, let's try to make something cooler. Let's assume that stat_type 't5' will contain lots of rows, and we'd like to subdivide it also by year. So we can:

$ CREATE TABLE stats_3
    partition OF stats
    FOR VALUES IN ('t5')
    partition BY range (stat_type, stat_year);
CREATE TABLE

This created a partition that will itself be further partitioned, and the sub-partitions will be defined by range.

I could have used partitioning by range (stat_year), since stat_type will always be 't5', but thanks to the multicolumn range I will be able to use the primary key index to find rows.

Please note that multicolumn conditions are supported only in range partitioning.
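
For example, this is rejected (I didn't include it in the session above, and I'm quoting the error from memory, so the exact wording may differ):

CREATE TABLE stats_bad (
    stat_type text,
    stat_year int4
)
partition BY list (stat_type, stat_year);
-- ERROR:  cannot use "list" partition strategy with more than one column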

So, now, the subpartitions:

$ CREATE TABLE stats_3_1
    partition OF stats_3 ( PRIMARY KEY (stat_type, stat_year, stat_name) )
    FOR VALUES FROM ('t5', 2000) TO ('t5', 2010);
CREATE TABLE
 
$ CREATE TABLE stats_3_2
    partition OF stats_3 ( PRIMARY KEY (stat_type, stat_year, stat_name) )
    FOR VALUES FROM ('t5', 2010) TO ('t5', 2020);
CREATE TABLE

Let's test it:

$ INSERT INTO stats (stat_type, stat_year, stat_name, stat_value)
    VALUES
        ( 't5', 2005, 'x', 1 ),
        ( 't5', 2015, 'x', 2 );
INSERT 0 2
 
$ SELECT * FROM stats;
 stat_type | stat_year | stat_name | stat_value 
-----------+-----------+-----------+------------
 t1        |      2017 | a         |          1
 t2        |      2017 | a         |          2
 t3        |      2017 | a         |          3
 t4        |      2017 | a         |          4
 t5        |      2005 | x         |          1
 t5        |      2015 | x         |          2
(6 ROWS)
 
$ SELECT * FROM stats_3;
 stat_type | stat_year | stat_name | stat_value 
-----------+-----------+-----------+------------
 t5        |      2005 | x         |          1
 t5        |      2015 | x         |          2
(2 ROWS)
 
$ SELECT * FROM stats_3_1;
 stat_type | stat_year | stat_name | stat_value 
-----------+-----------+-----------+------------
 t5        |      2005 | x         |          1
(1 ROW)
 
$ SELECT * FROM stats_3_2;
 stat_type | stat_year | stat_name | stat_value 
-----------+-----------+-----------+------------
 t5        |      2015 | x         |          2
(1 ROW)
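
As a side note on the multicolumn range choice: since each sub-partition's primary key starts with (stat_type, stat_year), a lookup by type/year/name should – once the planner narrows things down to a single sub-partition – be able to use that index. I'm not pasting a plan here, since it depends on data volume and settings, but the query to check is simply:

EXPLAIN SELECT * FROM stats
    WHERE stat_type = 't5' AND stat_year = 2005 AND stat_name = 'x';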

Finally, one more thing. In all of the cases above I was partitioning by columns. But you can use an arbitrary expression. Like:

$ CREATE TABLE users (
    id             serial NOT NULL,
    username       text NOT NULL,
    password       text,
    created_on     timestamptz NOT NULL,
    last_logged_on timestamptz NOT NULL
)
PARTITION BY RANGE ( LOWER( LEFT( username, 1 ) ) );
CREATE TABLE
 
$ CREATE TABLE users_0
    partition OF users (id, PRIMARY KEY (id), UNIQUE (username))
    FOR VALUES FROM ('a') TO ('g');
CREATE TABLE
 
$ CREATE TABLE users_1
    partition OF users (id, PRIMARY KEY (id), UNIQUE (username))
    FOR VALUES FROM ('g') TO (unbounded);
CREATE TABLE

And now we have users split into partitions based on the first letter of their username:

$ INSERT INTO users (username, created_on, last_logged_on)
    VALUES
        ('depesz', now(), now()),
        ('xxx', now(), now());
INSERT 0 2
 
$ SELECT * FROM users_0;
 id | username | password |          created_on           |        last_logged_on         
----+----------+----------+-------------------------------+-------------------------------
  1 | depesz   |          | 2017-02-06 10:16:14.816691+01 | 2017-02-06 10:16:14.816691+01
(1 ROW)
 
$ SELECT * FROM users_1;
 id | username | password |          created_on           |        last_logged_on         
----+----------+----------+-------------------------------+-------------------------------
  2 | xxx      |          | 2017-02-06 10:16:14.816691+01 | 2017-02-06 10:16:14.816691+01
(1 ROW)

In this case, using unbounded might make sense, given that usernames can start with something other than an a-z letter.

All in all, it's a great addition to PostgreSQL, and together with parallel queries and remote tables it will give us proper sharding really soon. Thanks to all involved – you did a great job.

6 thoughts on “Waiting for PostgreSQL 10 – Implement table partitioning.”

  1. I do appreciate the work that has been done here, but without the support for foreign keys this still doesn’t seem that much of an improvement.

  2. It's a great pity that currently there is no way to create an index on, or a reference from/to, a partitioned table. Is there a chance that these inconveniences will be removed in future versions of Postgres (or are there some fundamental reasons preventing it)?

  3. @Yogi:

    You can create an index. I don't know where the idea that you can't comes from.

    As for references – I have no idea whether any work is being done on that.

  4. @Yogi:
    you still can – just on each partition separately. It works.

    If you partition by something that is supposed to be unique, that's enough. If you want uniqueness across partitions, unfortunately there is no way to get it yet.
