Encoding-aware grep replacement?

Question


I am disappointed that grep does not find a word like “hello” in my UTF-16 docs.

Can anyone recommend a version of grep that tries to guess the encoding of the file and then handle it correctly?


grep character-encoding

fish Mar 05 '09 at 0:11


2 answers

popcnt · Answer 1 · 2009-03-05T03:16:54+0000

ack: a grep replacement written in Perl

You will definitely want to check out ack.

It supports Unicode and works mostly like grep, but better.

Or match Unicode text with grep itself

If you are on Linux, Unix, etc., you may want to change your LANG setting to match your documents' encoding.

Check your locale first. Here is the default on my MacBook Pro:

  $ locale
  LANG="en_US.UTF-8"
  LC_COLLATE="en_US.UTF-8"
  LC_CTYPE="en_US.UTF-8"
  LC_MESSAGES="en_US.UTF-8"
  LC_MONETARY="en_US.UTF-8"
  LC_NUMERIC="en_US.UTF-8"
  LC_TIME="en_US.UTF-8"
  LC_ALL=

Then, for example, under bash:

 $ LANG="foo" grep 'gotta be found now' file.name

Or, for something a little more persistent (be careful with this):

 $ export LANG="foo"
 $ grep 'bar' mitz.vah
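For UTF-16 specifically, LANG may not be enough: as far as I know, common locales use ASCII-compatible encodings such as UTF-8, and I'm not aware of a UTF-16 locale. A common workaround is to decode the file first with iconv. The sketch below also does the "guess the encoding" part of the question with a simple heuristic: it looks for a UTF-16 byte-order mark (BOM) at the start of each file. The name `bomgrep` is hypothetical (not an existing tool), and it assumes `iconv`, `head -c` and `od` are available, as they usually are on Linux and macOS.

```shell
# Sketch only; bomgrep is a hypothetical helper name, not an existing tool.
# Usage: bomgrep PATTERN FILE...
# Heuristic: a file starting with the UTF-16 byte-order mark (BOM),
# FF FE (little-endian) or FE FF (big-endian), is decoded to UTF-8 with
# iconv before grepping; any other file goes to grep unchanged.
# Caveat: UTF-32LE files also start with FF FE and would be misread.
bomgrep() {
  local pattern=$1 f bom
  shift
  for f in "$@"; do
    # First two bytes as hex, e.g. "fffe" (od -An omits the offset column)
    bom=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
    case $bom in
      fffe|feff)
        # iconv's generic "UTF-16" reads the BOM to pick the byte order
        iconv -f UTF-16 -t UTF-8 "$f" | grep -e "$pattern" ;;
      *)
        grep -e "$pattern" "$f" ;;
    esac
  done
}
```

Usage would be something like `bomgrep hello *.txt`. Files without a BOM are passed through untouched, so UTF-16 files saved without one will still be missed.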

Mecki · Answer 2 · 2009-03-05T00:26:49+0000

Perl has a better (much more powerful) regex syntax than grep, and it supports UTF-8 and UTF-16. I'm not sure how good it is at guessing the encoding, but if you tell it which encoding to use, it can read these files without any problems and run regular expressions over them. You have to write yourself a tiny Perl program for this (your own micro-grep implementation in Perl, so to speak), but it's not that difficult. Perl exists for all major operating systems.
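The answer describes a Perl program; the sketch below is not that program. It swaps in iconv plus grep to stay with the shell commands used elsewhere in this thread, but follows the same idea: you name the encoding, each file is decoded, and the regex runs over the decoded text. `encgrep` is a hypothetical name, and `-H`/`--label` assume GNU grep (they print the original file name instead of "(standard input)").

```shell
# Sketch only; encgrep is a hypothetical "micro-grep", not an existing tool.
# Usage: encgrep ENCODING PATTERN FILE...
# Decodes each FILE from ENCODING to UTF-8 with iconv, then searches it.
# -H and --label are GNU grep options that label matches with the file name.
encgrep() {
  local enc=$1 pattern=$2 f status=1   # grep convention: 1 = no match
  shift 2
  for f in "$@"; do
    if iconv -f "$enc" -t UTF-8 "$f" | grep -H --label="$f" -e "$pattern"; then
      status=0                         # at least one file matched
    fi
  done
  return "$status"
}
```

For example, `encgrep UTF-16 hello notes.txt` or `encgrep ISO-8859-1 'caf' menu.txt`; `iconv -l` lists the encoding names your iconv accepts.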
