Why don't you just get the CentOS SRPM for grep and compare its compile options with yours? I'd suggest that this is far more effective than having the whole StackOverflow community poke around blindly in the dark until someone hits on something.
EDIT: Are you using a multibyte-encoded locale? (Note: if you don't know what this means, then the answer is probably "Yes", since UTF-8 has been the default on most Linux distributions for several years, and indeed RedHat (and therefore CentOS) was among the first to switch.)
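If you are not sure which encoding your locale uses, a quick check along these lines may help. This is only a sketch: `locale charmap` is the standard way to print the current codeset, while the helper `is_utf8_charmap` is a name made up for this example, and it only recognises UTF-8, not other multibyte encodings.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): succeeds if the given
# charmap name is some spelling of UTF-8.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# "locale charmap" prints the codeset of the locale currently in effect.
charmap=$(locale charmap)

if is_utf8_charmap "$charmap"; then
    echo "charmap is $charmap: multibyte locale, the slowdown may apply"
else
    echo "charmap is $charmap: not UTF-8 (other multibyte encodings are not checked here)"
fi
```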
In that case, GNU grep is dog slow. And this is true not only of GNU grep, but of all GNU tools that do some kind of text processing. The FSF refuses to accept any patches that improve multibyte performance unless those patches do not slow down the fixed-width encoding case. However, since any patch that improves performance for multibyte encodings must contain at least one if statement, it is actually impossible to write a patch that does not slow down the fixed-width case by at least the overhead of that if. Thus, the UTF-8 performance of GNU tools will continue to suck until the end of time.
In any case, most Linux distributions don't give a rat's ass about what the FSF thinks and patch GNU grep themselves. The Fedora Rawhide SRPM contains a patch called grep-2.5.3-egf-speedup.patch, which speeds up GNU grep on UTF-8 by several orders of magnitude. (Since this patch dates from 2005, I assume that it is also used in CentOS.) This patch is also used in Mac OS X, Debian, Ubuntu, ... GNU grep as distributed by GNU is almost never used. Multibyte-encoded text processing will never be as fast as fixed-width encoding, but it should be at least comparable, not 50x (or even 1500x, as some people say) slower.
There is also another patch called dfa-optional, which makes grep simply use the GNU libc regex engine instead of its own. That engine is not only much faster when working with UTF-8, but also has far fewer bugs.
So, you could re-run your tests with export LC_ALL=POSIX. If this fixes your problem, you need to apply one of the two patches mentioned above.
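A rough way to make that comparison is sketched below. The function names `time_grep` and `compare_locales` are invented for this example, and `date +%s` only gives whole seconds, so it assumes a test file large enough for the difference to show up at that resolution.

```shell
#!/bin/sh
# Hypothetical helper: run one grep and print the elapsed whole seconds.
# Arguments: pattern, file. The match count itself is discarded.
time_grep() {
    tg_start=$(date +%s)
    grep -c -- "$1" "$2" >/dev/null
    tg_end=$(date +%s)
    echo $((tg_end - tg_start))
}

# Run the same search under the current locale and under LC_ALL=POSIX.
# The export happens inside a command substitution (a subshell), so the
# calling shell's locale is left untouched.
compare_locales() {
    echo "current locale: $(time_grep "$1" "$2")s"
    echo "POSIX locale:   $(export LC_ALL=POSIX; time_grep "$1" "$2")s"
}

# When run as a script with exactly two arguments, do the comparison.
if [ "$#" -eq 2 ]; then
    compare_locales "$1" "$2"
fi
```

Save it under any name and call it with your own pattern and file as the two arguments. If the POSIX run is dramatically faster, the multibyte code path is the likely culprit.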
Additional information is also available in the following two RedHat reports:
The moral of the story: contrary to popular belief, Linux distributions do know what they are doing, at least sometimes. Don't second-guess them.