Identify and remove null characters on UNIX

I have a text file containing unwanted null characters (ASCII NUL, \0). When I try to view it in vi, I see ^@ characters interspersed in the normal text. How can I:

  • Determine which lines in the file contain null characters? I tried grepping for \0 and \x0, but that didn't work.

  • Remove the null characters? Running strings on the file cleaned it up, but I'm wondering whether that is the best way.

+61
null unix shell special-characters
Mar 07 '10 at 23:12
7 answers

I'd use tr:

 tr < file-with-nulls -d '\000' > file-without-nulls 

If you are wondering whether input redirection works in the middle of the command arguments, it does. Most shells recognize and handle I/O redirection (<, >, ...) anywhere on the command line.
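
To make the effect visible, here is a small sketch that builds a throwaway sample file (the file contents and the workdir variable are made up for illustration), strips the NULs with tr, and checks that the position of the redirection does not change the result:

```shell
#!/bin/sh
# Hypothetical sample: three letters separated by NUL bytes, plus a newline.
workdir=$(mktemp -d)
printf 'a\000b\000c\n' > "$workdir/file-with-nulls"

# Redirection before the arguments, as in the command above.
tr < "$workdir/file-with-nulls" -d '\000' > "$workdir/file-without-nulls"

# Same command with the redirections at the end.
tr -d '\000' < "$workdir/file-with-nulls" > "$workdir/same-result"

# Compare sizes and contents (tr -d ' ' strips padding some wc versions add).
before=$(wc -c < "$workdir/file-with-nulls" | tr -d ' ')
after=$(wc -c < "$workdir/file-without-nulls" | tr -d ' ')
if cmp -s "$workdir/file-without-nulls" "$workdir/same-result"; then
    identical=yes
else
    identical=no
fi
echo "bytes before: $before, after: $after, identical: $identical"

rm -rf "$workdir"
```

The byte count should drop by exactly the number of NULs removed, which is a quick sanity check before overwriting a real file.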

+77
Mar 07 '10 at 23:14

Use the following sed command to remove null characters in a file.

 sed -i 's/\x0//g' null.txt 

This solution edits the file in place, which matters if the file is still in use. Passing -i'ext' keeps a backup of the original file with the suffix ext appended to its name.
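
A minimal sketch of the backup variant, assuming GNU sed (the \x00 escape is a GNU feature and may not work in other sed implementations); the sample file and the .bak suffix are chosen only for illustration:

```shell
#!/bin/sh
# Hypothetical sample file with NUL bytes between the letters.
workdir=$(mktemp -d)
printf 'x\000y\000z\n' > "$workdir/null.txt"

# Edit in place and keep the original as null.txt.bak (GNU sed assumed).
sed -i.bak 's/\x00//g' "$workdir/null.txt"

cleaned=$(cat "$workdir/null.txt")
if [ -f "$workdir/null.txt.bak" ]; then backup=present; else backup=missing; fi
echo "cleaned: $cleaned, backup: $backup"

rm -rf "$workdir"
```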

+42
Mar 08 '10 at 7:13

A large number of unwanted NUL characters, say one in every other byte, suggests that the file is encoded in UTF-16 and that you should use iconv to convert it to UTF-8.
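
As a rough sketch of that conversion, assuming the data is little-endian UTF-16 without a byte order mark (the sample bytes and file names are made up; for a file that starts with a BOM, -f UTF-16 is the more usual choice):

```shell
#!/bin/sh
# Hypothetical sample: the text "hi" plus a newline, encoded as UTF-16LE.
workdir=$(mktemp -d)
printf 'h\000i\000\n\000' > "$workdir/utf16.txt"

# Convert to UTF-8; the NUL high bytes disappear as part of the re-encoding.
iconv -f UTF-16LE -t UTF-8 "$workdir/utf16.txt" > "$workdir/utf8.txt"

converted=$(cat "$workdir/utf8.txt")
size=$(wc -c < "$workdir/utf8.txt" | tr -d ' ')
echo "converted text: $converted ($size bytes)"

rm -rf "$workdir"
```

Unlike deleting the NULs with tr or sed, this also handles non-ASCII characters correctly, because in UTF-16 their second byte is not zero.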

+14
Mar 07 '10 at 23:16

I found the following, which prints the lines, if any, that contain null characters:

 perl -ne '/\000/ and print;' file-with-nulls 

Additionally, an octal dump can tell you whether the file contains any nulls:

 od file-with-nulls | grep ' 000' 
+5
Mar 08 '10 at 8:08

If the lines in the file end with \r\n\000, then what works is to delete the \n and \000 characters and then replace each \r with \n.

 tr -d '\n\000' <infile | tr '\r' '\n' >outfile 
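
A short illustration of what the pipeline does to such input, using an invented two-line sample:

```shell
#!/bin/sh
# Hypothetical sample: two lines, each terminated by \r\n\000.
workdir=$(mktemp -d)
printf 'one\r\n\000two\r\n\000' > "$workdir/infile"

# Step 1 drops \n and NUL, leaving "one\rtwo\r"; step 2 turns each \r into \n.
tr -d '\n\000' < "$workdir/infile" | tr '\r' '\n' > "$workdir/outfile"

result=$(cat "$workdir/outfile")
size=$(wc -c < "$workdir/outfile" | tr -d ' ')
echo "$result"

rm -rf "$workdir"
```
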
+5
Nov 24 '15 at 10:41

The following is an example of removing NUL characters using ex (in place):

 ex -s +"%s/\%x00//g" -cwq nulls.txt 

and for several files:

 ex -s +'bufdo!%s/\%x00//g' -cxa *.txt 

For recursion, you can use the globbing pattern **/*.txt (if your shell supports it).

This is useful for scripts, since sed's -i option is a non-standard extension.

See also: How to check if a file is a binary file and read all files that are not?

+2
May 29 '15 at 23:01

I used:

 recode UTF-16..UTF-8 <filename> 

to get rid of the nulls in the file.

+1
Jun 22 '15 at 10:04


