Waiting for 9.5 – Add width_bucket(anyelement, anyarray).

On 9th of September, Tom Lane committed patch:

Add width_bucket(anyelement, anyarray).
 
This provides a convenient method of classifying input values into buckets
that are not necessarily equal-width.  It works on any sortable data type.
 
The choice of function name is a bit debatable, perhaps, but showing that
there's a relationship to the SQL standard's width_bucket() function seems
more attractive than the other proposals.
 
Petr Jelinek, reviewed by Pavel Stehule

That's a nice new helper function.

It has 3 versions:

  • width_bucket(operand dp, b1 dp, b2 dp, count int)
  • width_bucket(operand numeric, b1 numeric, b2 numeric, count int)
  • width_bucket(operand anyelement, thresholds anyarray)

The first two versions are the same, just operate on different datatypes – float8/double-precision and numeric.

How does it work?

Well, it assumes that range from b1 to b2 is divided into count buckets, each with the same width. And then checks into which bucket would given operand match.

Example:

Let's assume we have simple range – 0 – 10 into 20 buckets. This means that bucket “width" is 0.5.

So, value 0.11 should go into first bucket, and 7.23 into 15th.

Let's see:

$ SELECT width_bucket( 0.11, 0, 10, 20);
 width_bucket 
--------------
            1
(1 ROW)
 
$ SELECT width_bucket( 7.23, 0, 10, 20);
 width_bucket 
--------------
           15
(1 ROW)

If the value is given outside of range:

$ SELECT width_bucket( -1, 0, 10, 20);
 width_bucket 
--------------
            0
(1 ROW)
 
$ SELECT width_bucket( 100, 0, 10, 20);
 width_bucket 
--------------
           21
(1 ROW)

Values below lower bound go to bucket 0, and values above upper bound go to bucket # (count+1).

Sane.

There is also 3rd version – one that works with any datatype. You given it value to match, and sorted array of lower-bound values, and it picks which bucket to put the value in.

Let's see on an example, based also on numbers:

$ SELECT width_bucket( 15, array[0, 2, 5, 10, 20, 40] );
 width_bucket 
--------------
            4
(1 ROW)

Simple, and to the point.

The logic that it does is not superbly complex, but it will definitely save some error-prone typing in my code. Cool stuff, thanks.