Waiting for PostgreSQL 12 – Report progress of CREATE INDEX operations

On 2nd of April 2019, Alvaro Herrera committed patch:

Report progress of CREATE INDEX operations
 
 
This uses the progress reporting infrastructure added by ,
adding support for CREATE INDEX and CREATE INDEX CONCURRENTLY.
 
There are two pieces to this: one is index-AM-agnostic, and the other is
AM-specific.  The latter is fairly elaborate for btrees, including
reportage for parallel index builds and the separate phases that btree
index creation uses; other index AMs, which are much simpler in their
building procedures, have simplistic reporting only, but that seems
sufficient, at least for non-concurrent builds.
 
The index-AM-agnostic part is fairly complete, providing insight into
the CONCURRENTLY wait phases as well as block-based progress during the
index validation table scan.  (The index validation index scan requires
patching each AM, which has not been included here.)
 
Reviewers: Rahila Syed, Pavan Deolasee, Tatsuro Yamada
Discussion: https://postgr.es/m/20181220220022.mg63bhk26zdpvmcj@alvherre.pgsql

Continue reading Waiting for PostgreSQL 12 – Report progress of CREATE INDEX operations

Tips N’ Tricks – Generating readable reports with plain SQL

Let's say you imported some data, but it contains duplicates. You will have to handle them in some way, but to make sensible choice on how to handle it, you need more information.

So, let's start. We have table:

# \d users
                                    Table "public.users"
   Column   |           Type           |                     Modifiers
------------+--------------------------+----------------------------------------------------
 id         | integer                  | not null default nextval('users_id_seq'::regclass)
 username   | text                     |
 registered | timestamp with time zone |
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)

Continue reading Tips N’ Tricks – Generating readable reports with plain SQL

Getting list of most common domains

Today, on #postgresql on IRC, guy (can't contact him now to get his permission to name him), said:

I have a table called problematic_hostnames. It contains a list of banned hostnames in column “hostname" (varchar). I would like to display the top 10 troll ISPs based on this. Does PG have a way of spotting a “pattern"? Some ISPs are example.net while others are foo.bar.example.net, so you can't just regexp the last X.Y (since that would cause “.co.uk" to be one of the top troll ISPs).

Continue reading Getting list of most common domains