How to check if a file is binary?

I wrote the following method to check whether a particular file contains only ASCII text characters or something else as well. Could you take a look at this code, suggest improvements and point out any oversights?

The logic is: “If the first 500 bytes of a file contain 5 or more control characters, report it as a binary file”

Thanks.

public boolean isAsciiText(String fileName) throws IOException {

    InputStream in = new FileInputStream(fileName);
    byte[] bytes = new byte[500];

    in.read(bytes, 0, bytes.length);
    int x = 0;
    short bin = 0;

    for (byte thisByte : bytes) {
        char it = (char) thisByte;
        if (!Character.isWhitespace(it) && Character.isISOControl(it)) {

            bin++;
        }
        if (bin >= 5) {
            return false;
        }
        x++;
    }
    in.close();
    return true;
}
6 answers

Since you call this method isAsciiText, you know exactly what you are looking for. In other words, this is not "isTextInCurrentLocaleEncoding". So you can be more precise:

if (thisByte < 32 || thisByte > 127) bin++;
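One caveat with the line above: Java's byte is signed (-128 to 127), so thisByte > 127 is never true. Bytes 0x80 to 0xFF arrive as negative numbers and are caught by thisByte < 32 instead. The line also counts tab, LF and CR, so ordinary text with five or more line breaks in the first 500 bytes would be flagged. Below is a minimal sketch of a per-byte test that masks first; the class and method names are hypothetical, and exempting tab, LF and CR is an assumption borrowed from the question's isWhitespace check.

```java
/** Hypothetical helper: classifies single bytes for a strict 7-bit ASCII text check. */
final class AsciiBytes {

    private AsciiBytes() {}

    /**
     * Returns true if the byte is not printable 7-bit ASCII.
     * Assumption: tab, line feed and carriage return count as text.
     */
    static boolean isSuspect(byte b) {
        int value = b & 0xFF; // Java bytes are signed; map to the range 0..255
        if (value == '\t' || value == '\n' || value == '\r') {
            return false;
        }
        // Below 32: control characters; 127 is DEL; 128..255 are outside 7-bit ASCII
        return value < 32 || value > 126;
    }
}
```

In the question's loop, `if (AsciiBytes.isSuspect(thisByte)) bin++;` would take the place of the existing if statement.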


The variable x is never used.



  • char it = (char) thisByte; is conceptually wrong: a byte is not a character. A Java char is a UTF-16 code unit (Unicode), not an ASCII byte.

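To see why the cast matters: because Java's byte is signed, the question's (char) thisByte turns every byte from 0x80 to 0xFF into a character between U+FF80 and U+FFFF. None of those are ISO control characters, so the question's check never counts them. A small sketch (class and method names hypothetical) contrasting the plain cast with masking by 0xFF first:

```java
/** Hypothetical demo: what (char) does to a negative byte. */
final class ByteToCharDemo {

    private ByteToCharDemo() {}

    /** Same cast as the question: the byte is sign-extended to int, then narrowed to char. */
    static char castLikeQuestion(byte b) {
        return (char) b;
    }

    /** Masking first keeps the unsigned byte value 0..255. */
    static char castUnsigned(byte b) {
        return (char) (b & 0xFF);
    }
}
```

For the byte 0x80, castLikeQuestion yields U+FF80 (not an ISO control character), while castUnsigned yields U+0080 (an ISO control character).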


You should close the stream in a finally block; as written, it is never closed when the method returns false.
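A minimal sketch of the finally-block fix, applied to the question's own method (the class name is hypothetical). Beyond what the answer asks for, it also loops only over the bytes that read() actually returned and masks each byte with 0xFF before the cast; those two changes are additions, not part of the answer.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical class: the question's heuristic with the stream closed on every path. */
final class AsciiTextCheck {

    private AsciiTextCheck() {}

    static boolean isAsciiText(String fileName) throws IOException {
        InputStream in = new FileInputStream(fileName);
        try {
            byte[] bytes = new byte[500];
            int n = in.read(bytes, 0, bytes.length); // -1 for an empty file, else 1..500
            int bin = 0;
            for (int i = 0; i < n; i++) {            // only look at bytes actually read
                char it = (char) (bytes[i] & 0xFF);  // mask to avoid sign extension
                if (!Character.isWhitespace(it) && Character.isISOControl(it)) {
                    bin++;
                    if (bin >= 5) {
                        return false;                // the finally block still runs
                    }
                }
            }
            return true;
        } finally {
            in.close();                              // runs on every exit path
        }
    }
}
```

On Java 7 and later, try-with-resources (`try (InputStream in = new FileInputStream(fileName)) { ... }`) gives the same guarantee with less code.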


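  • You ignore the return value of read(). What if the file is shorter than 500 bytes?
  • The stream is not closed when you return false.
  • Casting to char does not restrict the check to 7-bit ASCII.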
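On the read() point: a single in.read(bytes, 0, bytes.length) may return fewer bytes than requested, and it returns -1 for an empty file. In the question's method the unfilled slots of the new byte[500] stay 0, and Character.isISOControl is true for the NUL character, so any file of 495 bytes or fewer is reported as binary. Below is a minimal sketch of a hypothetical helper that keeps reading until the buffer is full or the stream ends; on Java 9 and later, InputStream.readNBytes(byte[], int, int) does the same job.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: fill a buffer from a stream until it is full or EOF is reached. */
final class StreamPrefix {

    private StreamPrefix() {}

    /**
     * Calls read() repeatedly, because one call may return fewer bytes than requested.
     * Returns how many bytes were actually read (0 for an empty stream).
     */
    static int readPrefix(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream: the input is shorter than the buffer
            }
            total += n;
        }
        return total;
    }
}
```

In the question's method, `int n = StreamPrefix.readPrefix(in, bytes);` would replace the single read call, and the scan would then cover only bytes[0] through bytes[n - 1].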


Why not check the MIME type using a library such as jMimeMagic (http://sourceforge.net/projects/jmimemagic/) and decide how to process the file based on its MIME type?

