Waiting for 9.4 – Allow time delayed standbys and recovery

On 12th of December, Simon Riggs committed patch:

Allow time delayed standbys and recovery
 
Set min_recovery_apply_delay to force a delay in recovery apply for commit and
restore point WAL records. Other records are replayed immediately. Delay is
measured between WAL record time and local standby time.
 
Robert Haas, Fabrízio de Royes Mello and Simon Riggs
Detailed review by Mitsumasa Kondo

Continue reading Waiting for 9.4 – Allow time delayed standbys and recovery

OmniPITR 1.3.1

Right after releasing 1.3.0 I realized that I forgot about one thing.

If you're using ext3 (and possibly other, not sure) file system, removal of large file can cause problems due to heavy IO traffic.

We did hit this problem earlier at one of client sites, and devised a way to remove large files by truncating them, bit after bit, and getting them to small enough size to be removed in one go. I wrote about it earlier, of course.

Unfortunately – I forgot about this when releasing 1.3.0, but as soon as I tried to deploy at the client site, I noticed the missing functionality.

So, today I released 1.3.1, which adds two options to omnipitr-backup-cleanup:

  • –truncate
  • –sleep

If truncate is specified, and is more than 0, it will cause omnipitr-backup-slave to remove large files (larger than truncate value) in steps.

In pseudocode:

if param('truncate') {
  file_size = file_to_be_removed.size()
  while ( file_size > param('truncate') ) {
    file_size = file_size - param('truncate')
    file_to_be_removed.truncate_to( file_size )
    sleep( param('sleep') )
  }
}
file_to_be_removed.unlink()

So, for example, specifying –truncate=1000000, will remove the file truncating it first by 1MB blocks.

–sleep parameter is used to delay removal of next part of the file (it's used only in truncating loop, so has no meaning when truncate-loop is not used). It's value is in milliseconds, and defaults to 500 (0.5 second).

Hope you'll find it useful.

Waiting for 9.4 – Add new wal_level, logical, sufficient for logical decoding.

On 11th of December, Robert Haas committed patch:

Add new wal_level, logical, sufficient for logical decoding.
 
When wal_level=logical, we'll log columns from the old tuple as
configured by the REPLICA IDENTITY facility added in commit
<a class="text" href="/gitweb/?p=postgresql.git;a=object;h=07cacba983ef79be4a84fcd0e0ca3b5fcb85dd65">07cacba983ef79be4a84fcd0e0ca3b5fcb85dd65</a>.  This makes it possible
a properly-configured logical replication solution to correctly
follow table updates even if they change the chosen key columns,
or, with REPLICA IDENTITY FULL, even if the table has no key at
all.  Note that updates which do not modify the replica identity
column won't log anything extra, making the choice of a good key
(i.e. one that will rarely be changed) important to performance
when wal_level=logical is configured.
 
Each insert, update, or delete to a catalog table will also log
the CMIN and/or CMAX values of stamped by the current transaction.
This is necessary because logical decoding will require access to
historical snapshots of the catalog in order to decode some data
types, and the CMIN/CMAX values that we may need in order to judge
row visibility may have been overwritten by the time we need them.
 
Andres Freund, reviewed in various versions by myself, Heikki
Linnakangas, KONDO Mitsumasa, and many others.

Continue reading Waiting for 9.4 – Add new wal_level, logical, sufficient for logical decoding.

What does “Fix VACUUM’s tests to see whether it can update relfrozenxid” really mean?

In release notes to latest release you can find:

Fix VACUUM's tests to see whether it can update relfrozenxid (Andres Freund)
 
In some cases VACUUM (either manual or autovacuum) could incorrectly advance
a table's relfrozenxid value, allowing tuples to escape freezing, causing
those rows to become invisible once 2^31 transactions have elapsed. The
probability of data loss is fairly low since multiple incorrect advancements
would need to happen before actual loss occurs, but it's not zero. In 9.2.0
and later, the probability of loss is higher, and it's also possible to get
"could not access status of transaction" errors as a consequence of this
bug. Users upgrading from releases 9.0.4 or 8.4.8 or earlier are not
affected, but all later versions contain the bug.
 
The issue can be ameliorated by, after upgrading, vacuuming all tables in
all databases while having vacuum_freeze_table_age set to zero. This will
fix any latent corruption but will not be able to fix all pre-existing data
errors. However, an installation can be presumed safe after performing this
vacuuming if it has executed fewer than 2^31 update transactions in its
lifetime (check this with SELECT txid_current() < 2^31).

What does it really mean?

Continue reading What does “Fix VACUUM's tests to see whether it can update relfrozenxid" really mean?