October 22nd, 2007 by depesz

nice machine with 2 GB of RAM, 800 MB in 2 logfiles, a single word as the search phrase, polish utf-8 locale (pl_PL.UTF-8), GNU grep 2.5.1. results?

=> time grep -in reloading postgresql-2007-10-22_000000.log postgresql-2007-10-22_120909.log
postgresql-2007-10-22_000000.log:40001:2007-10-22 10:50:13.528 CEST @ 24681 LOG: received SIGHUP, reloading configuration files
postgresql-2007-10-22_120909.log:1215696:2007-10-22 12:15:21.769 CEST @ 24681 LOG: received SIGHUP, reloading configuration files
real 1m21.212s
user 1m20.909s
sys 0m0.284s

same check, without -i:

=> time grep -n reloading postgresql-2007-10-22_000000.log postgresql-2007-10-22_120909.log
postgresql-2007-10-22_000000.log:40001:2007-10-22 10:50:13.528 CEST @ 24681 LOG: received SIGHUP, reloading configuration files
postgresql-2007-10-22_120909.log:1215696:2007-10-22 12:15:21.769 CEST @ 24681 LOG: received SIGHUP, reloading configuration files
real 0m1.147s
user 0m0.868s
sys 0m0.268s

same check with -i again, after setting locale to C:

=> time grep -in reloading postgresql-2007-10-22_000000.log postgresql-2007-10-22_120909.log
postgresql-2007-10-22_000000.log:40001:2007-10-22 10:50:13.528 CEST @ 24681 LOG: received SIGHUP, reloading configuration files
postgresql-2007-10-22_120909.log:1215696:2007-10-22 12:15:21.769 CEST @ 24681 LOG: received SIGHUP, reloading configuration files
real 0m1.209s
user 0m0.896s
sys 0m0.316s
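
the post does not say how the locale was switched. one common way, shown here only as a sketch, is to prefix the command with LC_ALL=C, which overrides LANG and the other LC_* variables for that single command. the sample file and its contents below are made up for illustration; the real logfiles are not part of this post.

```shell
#!/bin/sh
# hypothetical sample file standing in for the real postgresql logs
log=$(mktemp)
printf '%s\n' \
  'LOG: received SIGHUP, reloading configuration files' \
  'LOG: checkpoint starting' \
  'LOG: Reloading done' > "$log"

# LC_ALL=C applies to this one grep only; the shell's own locale is unchanged
matches=$(LC_ALL=C grep -in reloading "$log")
printf '%s\n' "$matches"

rm -f "$log"
```

the override costs nothing to try, but in the C locale -i only folds ASCII letters, so it is a safe shortcut only when the search phrase is plain ASCII, as it is here.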

all tests were repeated many times, both to get all data into memory and to check for extreme values.
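
a rough sketch of such a repeated run, not the exact procedure used for the numbers above: the helper name repeat_search, the generated file and the repetition count are all assumptions. it measures with date +%s, which only has one-second resolution and is not strictly POSIX; the timings in this post came from the shell's time instead.

```shell
#!/bin/sh
# hypothetical stand-in for the real logfiles
log=$(mktemp)
i=0
while [ "$i" -lt 2000 ]; do
  echo "2007-10-22 10:50:13 CEST LOG: checkpoint starting"
  i=$((i + 1))
done > "$log"
echo "2007-10-22 10:50:14 CEST LOG: received SIGHUP, reloading configuration files" >> "$log"

# run the same case-insensitive search several times under one locale
# $1 = locale name, $2 = number of repetitions
repeat_search() {
  n=0
  while [ "$n" -lt "$2" ]; do
    start=$(date +%s)
    count=$(LC_ALL="$1" grep -ic reloading "$log")
    end=$(date +%s)
    echo "locale=$1 run=$n matches=$count seconds=$((end - start))"
    n=$((n + 1))
  done
}

# first the locale from the environment (falls back to C if LANG is unset), then C
current_runs=$(repeat_search "${LANG:-C}" 3)
c_runs=$(repeat_search C 3)
printf '%s\n' "$current_runs" "$c_runs"

rm -f "$log"
```

the first run in each series mostly warms the cache; the later runs are the ones worth comparing. on a file this small every run will report 0 seconds, so it only becomes interesting with logs of the size used above.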

does anybody need more proof that the locale “thing” is broken? of course it might be that only the locale handling in grep is bad, but anyway – it's still a locale issue.

4 comments

  1. Jan 19, 2008

    This is a consequence of the non-trivial case folding algorithm of UTF-8 in comparison to ASCII. Not surprising.

  2. Jan 19, 2008

    @Markus Bertheau:
    i know. i just didn’t expect it to have *that* big an influence on timing.

  3. Marti Raudsepp
    Mar 5, 2012

    Just a note for anyone still reading this: this bug was fixed in GNU grep 2.7.

  4. Mar 5, 2012

    Marti: thanks for info, good to know.
