Human/version sort in PostgreSQL

Ever been in situation where you had to sort data that is partially text, and partially numerical?

Like invoice numbers: prefix-9, prefix-10, prefix-11, other-5, other-20 ? Normally you can't do order by as you will get them in wrong order: other-20 ⇒ other-5 ⇒ prefix-10 ⇒ prefix-11 ⇒ prefix-9. Can something be done with it?

Continue reading Human/version sort in PostgreSQL

Waiting for 9.5 – Use abbreviated keys for faster sorting of text datums.

On 19th of January, Robert Haas committed patch:

Use abbreviated keys for faster sorting of text datums.
 
This commit extends the SortSupport infrastructure to allow operator
classes the option to provide abbreviated representations of Datums;
in the case of text, we abbreviate by taking the first few characters
of the strxfrm() blob.  If the abbreviated comparison is insufficent
to resolve the comparison, we fall back on the normal comparator.
This can be much faster than the old way of doing sorting if the
first few bytes of the string are usually sufficient to resolve the
comparison.
 
There is the potential for a performance regression if all of the
strings to be sorted are identical for the first 8+ characters and
differ only in later positions; therefore, the SortSupport machinery
now provides an infrastructure to abort the use of abbreviation if
it appears that abbreviation is producing comparatively few distinct
keys.  HyperLogLog, a streaming cardinality estimator, is included in
this commit and used to make that determination for text.
 
Peter Geoghegan, reviewed by me.

Continue reading Waiting for 9.5 – Use abbreviated keys for faster sorting of text datums.

Explaining the unexplainable – part 3

In previous post in the series I wrote about how to interpret single line in explain analyze output, it's structure, and later on described all basic data-getting operations (nodes in explain tree).

Today, we'll move towards more complicated operations.

Continue reading Explaining the unexplainable – part 3

Why is “depesz” between “de luca” and “de vil”?

Every so often someone asks why sorting behaves irrational. Like here:

$ SELECT string FROM test ORDER BY string;
  string
----------
 dean
 deer
 de luca
 depesz
 de vil
 dyslexia
(6 ROWS)

Why aren't “de luca" and “de vil" together?

Continue reading Why is “depesz" between “de luca" and “de vil"?

Waiting for 9.1 – KNNGIST

On 4th of December, Tom Lane committed really cool patch:

KNNGIST, otherwise known as order-by-operator support for GIST.

This commit represents a rather heavily editorialized version of
Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1
patches.  I redid the opclass API to add a separate Distance method
instead of turning the Consistent method into an illogical mess,
fixed some bit-rot in the rbtree interfaces, and generally worked over
the code style and comments.
 
There's still no non-code documentation to speak of, but I'll work on
that separately.  Some contrib-module changes are also yet to come
(right now, point <-> point is the only KNN-ified operator).
 
Teodor Sigaev and Tom Lane

Continue reading Waiting for 9.1 – KNNGIST

Set operations in shell

I had this interesting case at work. We have imports of objects. Each object in import file has its “ID" (which can be any string). Same “ID" is in database.

So the idea is pretty simple – we can/should check how many of IDs from import were in database. Unfortunately – we'd rather not really do the comparison in DB, as it is pretty loaded.

Continue reading Set operations in shell