I had two month delay related to some work, but now I can finally write about:
On 7th of December, Robert Haas committed patch:
Implement table partitioning.
Table partitioning is like table inheritance and reuses much of the
existing infrastructure, but there are some important differences.
The parent is called a partitioned table and is always empty; it may
not have indexes or non-inherited constraints, since those make no
sense for a relation with no data of its own. The children are called
partitions and contain all of the actual data. Each partition has an
implicit partitioning constraint. Multiple inheritance is not
allowed, and partitioning and inheritance can't be mixed. Partitions
can't have extra columns and may not allow nulls unless the parent
does. Tuples inserted into the parent are automatically routed to the
correct partition, so tuple-routing ON INSERT triggers are not needed.
Tuple routing isn't yet supported for partitions which are foreign
tables, and it doesn't handle updates that cross partition boundaries.
Currently, tables can be range-partitioned or list-partitioned. List
partitioning is limited to a single column, but range partitioning can
involve multiple columns. A partitioning "column" can be an
Because table partitioning is less general than table inheritance, it
is hoped that it will be easier to reason about properties of
partitions, and therefore that this will serve as a better foundation
for a variety of possible optimizations, including query planner
optimizations. The tuple routing based which this patch does based on
the implicit partitioning constraints is an example of this, but it
seems likely that many other useful optimizations are also possible.
Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat,
Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova,
Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
Continue reading Waiting for PostgreSQL 10 – Implement table partitioning.
Some time ago, guys from Instagram shared their approach to generating unique ids on multiple hosts in a way that guarantees (to reasonable extend) uniqueness, and doesn't require any centralized service.
Earlier this month, the build benchmarked their solution vs. UUIDs, and vs. bigserial.
I thought – whether C based code for nextid would be faster.
Continue reading Is C faster for (instagram-style) ID generation?
The general knowledge is that numerics are slower than integers/float, but offer precision and ranges that are better.
While I understand what is slow, I don't really know how much slower numerics are. So let's test it.
Continue reading How much slower are numerics?
When working with PostgreSQL you generally want to get information about slow queries. The usual approach is to set log_min_duration_statement to some low(ish) value, run your app, and then analyze logs.
But you can log to many places – flat file, flat file on another disk, local syslog, remote syslog. And – perhaps, instead of log_min_duration_statement – just use pg_stat_statements?
Well, I wondered about it, and decided to test.
Continue reading What logging has least overhead?
Today, Mattias|farm on IRC asked how to create primary key using HASH index. After some talk, he said that in some books it said that for “=" (equality) hash indexes are better.
So, I digged a bit deeper.
Continue reading Should you use HASH index?
Every now and then there is someone on IRC, mailing lists, or private contact which asks about rules.
My answer virtually always is: don't use rules. If you think that they solve your problem, think again. Why?
Continue reading To rule or not to rule – that is the question
On 3rd of August, Tatsuo Ishii committed patch by ITAGAKI Takahiro:
Multi-threaded version of pgbench contributed by ITAGAKI Takahiro,
reviewed by Greg Smith and Josh Williams.
Following is the proposal from ITAGAKI Takahiro:
Pgbench is a famous tool to measure postgres performance, but nowadays
it does not work well because it cannot use multiple CPUs. On the other
hand, postgres server can use CPUs very well, so the bottle-neck of
workload is *in pgbench*.
Multi-threading would be a solution. The attached patch adds -j
(number of jobs) option to pgbench. If the value N is greater than 1,
pgbench runs with N threads. Connections are equally-divided into
them (ex. -c64 -j4 => 4 threads with 16 connections each). It can
run on POSIX platforms with pthread and on Windows with win32 threads.
Here are results of multi-threaded pgbench runs on Fedora 11 with intel
core i7 (8 logical cores = 4 physical cores * HT). -j8 (8 threads) was
the best and the tps is 4.5 times of -j1, that is a traditional result.
$ pgbench -i -s10
$ pgbench -n -S -c64 -j1 => tps = 11600.158593
$ pgbench -n -S -c64 -j2 => tps = 17947.100954
$ pgbench -n -S -c64 -j4 => tps = 26571.124001
$ pgbench -n -S -c64 -j8 => tps = 52725.470403
$ pgbench -n -S -c64 -j16 => tps = 38976.675319
$ pgbench -n -S -c64 -j32 => tps = 28998.499601
$ pgbench -n -S -c64 -j64 => tps = 26701.877815
Is it acceptable to use pthread in contrib module?
If ok, I will add the patch to the next commitfest.
Continue reading Waiting for 8.5 – Multi-threaded pgbench
couple of times is was brought to attention of #postgresql channel – how to find longest prefix.
case: you have phone number, and you want to find which carrier it is bound to.
there are at least 3 ways of finding it, and i decided to take a look at which is fastest.
Continue reading searching for longest prefix