How to read a text file without knowing the encoding

When I read a text file that was created outside my application, its encoding is unknown. My application uses NSUnicodeStringEncoding (which is equivalent to NSUTF16StringEncoding), so reading files that are not UTF-16 encoded causes problems.

Is there any way to guess the file encoding? My priority is to be able to read UTF-8 files first, and then all other files. Is iterating over the available encodings and checking whether the resulting string's length is greater than zero really a good approach?

Thanks in advance.

Ignacio

+7
2 answers

Apple's documentation gives some recommendations on how to proceed, in the String Programming Guide under "Reading data with an unknown encoding":

If you are forced to guess the encoding (and note that in the absence of explicit information, it is a guess):

  1. Try stringWithContentsOfFile:usedEncoding:error: or initWithContentsOfFile:usedEncoding:error: (or the URL-based equivalents). These methods try to determine the encoding of the resource and, if successful, return by reference the encoding used.

  2. If (1) fails, try reading the resource, specifying UTF-8 as the encoding.

  3. If (2) fails, try an appropriate legacy encoding. "Appropriate" here depends somewhat on the circumstances; it could be the default C string encoding, it could be ISO or Windows Latin 1, or something else, depending on where your data comes from.
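The order of the steps above matters: a UTF-8 decode fails on malformed byte sequences, whereas a single-byte legacy encoding such as ISO Latin 1 accepts almost any bytes, so trying UTF-8 first gives a meaningful signal. The following is a minimal sketch of that idea in plain C, not the Cocoa API itself. The names is_valid_utf8, guess_after_detection_failed and the encoding_guess labels are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for this sketch; these are not Cocoa constants. */
typedef enum { GUESS_UTF8, GUESS_LEGACY } encoding_guess;

/* Returns true if buf[0..len) is well-formed UTF-8. Rejects overlong
   forms, UTF-16 surrogates, and code points above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t extra;       /* number of continuation bytes expected */
        unsigned long cp;   /* code point being assembled */
        unsigned long min;  /* smallest code point allowed at this length */

        if (b < 0x80) { i++; continue; }                 /* plain ASCII */
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;  /* stray continuation byte or invalid lead byte */

        if (len - i <= extra) return false;              /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80) return false;        /* not a continuation */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}

/* Mirrors steps (2) and (3): prefer UTF-8 when the bytes are valid UTF-8,
   otherwise fall back to whichever legacy encoding suits the data source. */
static encoding_guess guess_after_detection_failed(const unsigned char *buf,
                                                   size_t len)
{
    return is_valid_utf8(buf, len) ? GUESS_UTF8 : GUESS_LEGACY;
}
```

Note that pure ASCII input passes the UTF-8 check, which is harmless because ASCII decodes identically under UTF-8 and the common Latin encodings. Which legacy encoding to use in the fallback branch is still a guess that depends on where the file came from.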

+8

If the file is properly formed, you can read the first four bytes and check whether they contain a BOM (Byte Order Mark):

http://en.wikipedia.org/wiki/Byte-order_mark
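As a rough illustration of the BOM check, here is a minimal sketch in plain C that inspects up to the first four bytes of a buffer. The byte patterns are the standard Unicode BOM signatures; the function name detect_bom and the bom_kind enum are hypothetical names chosen for this example. Many files, UTF-8 ones in particular, carry no BOM at all, so a BOM_NONE result says nothing about the actual encoding.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    BOM_NONE,
    BOM_UTF8,
    BOM_UTF16_BE,
    BOM_UTF16_LE,
    BOM_UTF32_BE,
    BOM_UTF32_LE
} bom_kind;

/* Inspects the first bytes of a file's contents and reports which
   Unicode byte order mark, if any, they form. */
static bom_kind detect_bom(const unsigned char *b, size_t len)
{
    /* UTF-32 is checked before UTF-16 because the UTF-32 LE mark
       (FF FE 00 00) begins with the UTF-16 LE mark (FF FE). */
    if (len >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
        return BOM_UTF32_BE;
    if (len >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
        return BOM_UTF32_LE;
    if (len >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return BOM_UTF16_BE;
    if (len >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return BOM_UTF16_LE;
    return BOM_NONE;
}
```

One ambiguity remains: the bytes FF FE 00 00 could also be a UTF-16 LE file whose first character is U+0000. This sketch treats that pattern as UTF-32 LE, which is the usual convention but still an assumption.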

+1
