How to remove non-ascii char using sed

I want to remove non-ASCII characters from a file. Could you help me find the right way to do this?

I have already tried several regular expressions:

 sed -e 's/[\d00-\d128]//g'

This one does not work. I also tried

 cat /bin/mkdir| sed -e 's/[\x00-\x7F]//g' >/tmp/aa

but the resulting aa file still contains some non-ASCII characters:

 [root@asssdsada ~]$ hexdump /tmp/aa|more
           00 01 02 03 04 05 06 07 - 08 09 0A 0B 0C 0D 0E 0F  0123456789ABCDEF
 00000000  45 4C 46 B0 F0 73 38 C0 - C0 BC BC FF FF 61 61 61  ELF..s8......aaa
 00000010  A0 A0 50 E5 74 64 50 57 - 50 57 50 57 D4 D4 51 E5  ..P.tdPWPWPW..Q.
 00000020  74 64 6C 69 62 36 34 6C - 64 6C 69 6E 75 78 78 38  tdlib64ldlinuxx8
 00000030  36 36 34 73 6F 32 47 4E - 55 42 C8 C0 80 70 69 42  664so2GNUB...piB
 00000040  44 47 BA E3 92 43 45 D5 - EC 46 E4 DE D8 71 58 B9  DG...CE..F...qX.
 00000050  8D F1 EA D3 EF 4B 86 FC - A9 DA 79 ED 63 B5 51 92  .....K....y.c.Q.
 00000060  BA 6C FC D1 69 78 30 ED - 74 F1 73 95 CC 85 D2 46  .l..ix0.t.s....F
 00000070  A5 B4 6C 67 DA 4A E9 9A - 4B 58 77 A4 37 80 C0 4F  ..lg.J..KXw.7..O
 00000080  F3 E9 B2 77 65 97 74 F9 - A2 C0 F2 CC 4A 9C 58 A1  ...we.t.....J.X.
4 answers

This does not work with sed. Perhaps tr will do the job:

 tr -d '\200-\377' 

Or with the complement (-c), which deletes everything outside octal 000-177 and so keeps only ASCII bytes:

 tr -cd '\000-\177' 
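A minimal sketch wrapping both tr variants as reusable shell functions. The names `drop_high_bytes` and `keep_ascii_only` are hypothetical, chosen only for illustration. Both work on raw bytes, so a multi-byte UTF-8 character such as `é` (bytes `\303\251`) disappears completely. `LC_ALL=C` is added as a precaution, on the assumption that some tr implementations (for example BSD tr) are locale-aware and may reject invalid byte sequences in a UTF-8 locale.

```shell
#!/bin/sh
# Hypothetical helper: delete every byte in octal 200-377 (0x80-0xFF).
drop_high_bytes() {
    LC_ALL=C tr -d '\200-\377'
}

# Hypothetical helper: -c complements octal 000-177, -d deletes that
# complement, so only 7-bit ASCII bytes (0x00-0x7F) survive.
keep_ascii_only() {
    LC_ALL=C tr -cd '\000-\177'
}

# Both read stdin and write stdout, so files are redirected, e.g.:
#   drop_high_bytes < /bin/mkdir > /tmp/aa
printf 'caf\303\251 au lait\n' | drop_high_bytes   # prints "caf au lait"
printf 'caf\303\251 au lait\n' | keep_ascii_only   # prints "caf au lait"
```

Because tr operates byte by byte, the two functions produce identical output; the complement form simply states the intent ("keep ASCII") more directly.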

Have you tried

 cat /bin/mkdir | tr -cd "[:print:]" 

I think this should solve the problem.

If you are only interested in the textual content, you can also use strings, which prints only runs of at least four printable characters:

 cat /bin/mkdir | strings 
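One caveat about the `tr -cd "[:print:]"` command in this answer: newline and tab are not in the `[:print:]` class, so they are deleted too and all lines get glued together. A minimal sketch that keeps them, assuming you want the line structure preserved; the function name `keep_printable_lines` is hypothetical, and `LC_ALL=C` is used so that `[:print:]` means exactly the ASCII printable range 0x20-0x7E.

```shell
#!/bin/sh
# Hypothetical helper: keep printable ASCII (0x20-0x7E) plus newline and tab.
# -c complements the set and -d deletes everything in that complement.
# LC_ALL=C pins [:print:] to the ASCII printable range.
keep_printable_lines() {
    LC_ALL=C tr -cd '[:print:]\n\t'
}

# Plain [:print:] deletes the newline as well, so the two lines are joined:
printf 'line1\n\303\251line2\n' | LC_ALL=C tr -cd '[:print:]'
echo
# The variant keeps the line break and drops the two bytes of the "é":
printf 'line1\n\303\251line2\n' | keep_printable_lines
```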

Do you know which encoding this file uses? If so, you can convert it with iconv, a utility for converting text from one character encoding to another. For example, if the source file is UTF-8 and you want ASCII, you can use:

 iconv -f utf8 -t ascii inputfile > outputfile
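A caveat on this iconv command: with glibc's iconv, a plain conversion to ASCII stops with an error at the first character that has no ASCII equivalent. A minimal sketch of two workarounds, assuming GNU/glibc iconv (other implementations may behave differently): `-c` silently drops unconvertible characters, and the `//TRANSLIT` suffix asks iconv to approximate them. The function names are hypothetical, and `inputfile`/`outputfile` are placeholders.

```shell
#!/bin/sh
# Assumes GNU/glibc iconv; function names are hypothetical.

# -c: omit characters that cannot be represented in ASCII.
# glibc iconv may still exit non-zero when it had to omit something,
# so check the output rather than relying on the exit status alone.
strip_to_ascii() {
    iconv -c -f UTF-8 -t ASCII
}

# //TRANSLIT: replace characters with close ASCII approximations where
# possible; the exact result depends on the current locale (an accented
# letter may become its plain letter, or '?' in the C locale).
translit_to_ascii() {
    iconv -f UTF-8 -t ASCII//TRANSLIT
}

# Usage:
#   strip_to_ascii < inputfile > outputfile
#   translit_to_ascii < inputfile > outputfile
```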

Running the file command on the input file can tell you its current encoding.

There is also a tool called enca that does its best to detect a file's character encoding, provided you know the language of the file's contents.

This other question may be the answer.


Try sed -i, for example:

 LC_ALL=C sed -i 's/[\d128-\d255]//g' MYFILE.txt

This deletes every non-ASCII byte from the file, editing it in place.
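A minimal sketch of the same approach, assuming GNU sed: the `\dNNN` decimal escapes are a GNU extension. Running under `LC_ALL=C` makes sed match single bytes; in a UTF-8 locale a byte range such as 128-255 may be rejected or fail to match as intended. The function name is hypothetical, and `-i.bak` keeps a backup copy of the original file.

```shell
#!/bin/sh
# Assumes GNU sed: \d128 and \d255 are decimal byte escapes.
# LC_ALL=C forces byte-wise matching so the range covers 0x80-0xFF.
# -i.bak edits the file in place and keeps the original as FILE.bak.
strip_non_ascii_in_place() {
    LC_ALL=C sed -i.bak 's/[\d128-\d255]//g' "$1"
}

# Usage (MYFILE.txt is the placeholder name from the answer):
#   strip_non_ascii_in_place MYFILE.txt
```

Keeping the `.bak` copy is a safety choice; plain `-i` skips the backup once you trust the result.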

