Encoding-aware grep replacement?

I am disappointed that grep does not find a word like “hello” in my UTF-16 docs.

Can anyone recommend a version of grep that tries to guess the encoding of a file and then handles it correctly?

grep character-encoding

ack: a Perl-based replacement for grep

You will definitely want to check out ack.

It supports Unicode encodings and works mostly like grep, but better.

Try grep with a matching Unicode locale

If you are on Linux, Unix, etc., you may want to set LANG to an encoding that matches your documents.

Check your locale first. Here are the defaults on my MacBook Pro:

  $ locale
  LANG="en_US.UTF-8"
  LC_COLLATE="en_US.UTF-8"
  LC_CTYPE="en_US.UTF-8"
  LC_MESSAGES="en_US.UTF-8"
  LC_MONETARY="en_US.UTF-8"
  LC_NUMERIC="en_US.UTF-8"
  LC_TIME="en_US.UTF-8"
  LC_ALL=

For a single command, say under bash:

 $ LANG="foo" grep 'gotta be found now' file.name 

Something a little more persistent, which affects every later command in the current shell (be careful with this):

 $ export LANG="foo"
 $ grep 'bar' mitz.vah
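One hedged caveat: as far as I know, locales are built on ASCII-compatible encodings and typical systems ship no UTF-16 locale, so changing LANG alone may not make grep match inside UTF-16 files. A common workaround is to transcode to UTF-8 with `iconv` and grep the result. The sketch below also makes a crude version of the encoding guess the question asks for: it only checks for a UTF-16 byte-order mark (BOM). The function name `grep16` is hypothetical, and it assumes a UTF-8 locale, a `head` that supports `-c`, and one file per call.

```shell
# grep16 PATTERN FILE: hypothetical helper, not a real grep option.
# Peeks at the first two bytes for a UTF-16 byte-order mark (BOM).
# With a BOM, the file is transcoded to UTF-8 by iconv before grep
# sees it; without one, the file is handed to grep unchanged.
grep16() {
    pattern=$1
    file=$2
    # First two bytes as hex, e.g. "fffe" (LE BOM) or "feff" (BE BOM).
    bom=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
    case $bom in
        fffe|feff)
            # iconv reads the byte order from the BOM and drops it.
            iconv -f UTF-16 -t UTF-8 "$file" | grep -e "$pattern"
            ;;
        *)
            # No BOM: assume the file already matches the locale.
            grep -e "$pattern" "$file"
            ;;
    esac
}
```

For example, `grep16 hello notes.txt` (file name hypothetical). The exit status is grep's, so it still works in `if` tests. UTF-16 files without a BOM fall through to plain grep and would need an explicit `iconv -f UTF-16LE` (or `UTF-16BE`) instead.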

Perl has a better (much more powerful) regex syntax than grep, and it supports UTF-8 and UTF-16. I'm not sure how well it can guess an encoding on its own, but if you tell it which encoding to use, it can read these files without any problems and run regular expressions over them. You have to write yourself a tiny Perl program for this (your own micro-grep in Perl, so to speak), but it's not that difficult. Perl is available for all major operating systems.

