MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence when parsing XML with Hebrew characters

I am trying to parse an XML file that contains Hebrew characters. I know the file is otherwise correct, because when I export it (from another program) without Hebrew characters, it parses perfectly.

I have tried many things, but I always get this error:

MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. 
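The message means the parser is decoding the input as UTF-8 (what XML assumes when there is no encoding declaration, byte-order mark, or external hint) and has reached a byte that cannot start a UTF-8 sequence. That is what happens when the Hebrew letters were saved in a single-byte code page. A minimal reproduction, assuming the exporting program used windows-1255, the Windows Hebrew code page (the question does not say which encoding it uses); `failsWithoutEncodingHint` is a hypothetical helper:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class ReproduceUtf8Error {
    // "shalom" in Hebrew, written as Unicode escapes so the source file encoding does not matter
    static final String HEBREW = "\u05E9\u05DC\u05D5\u05DD";

    /** Returns true if parsing the raw bytes, with no encoding hint at all, fails. */
    static boolean failsWithoutEncodingHint(byte[] xmlBytes) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // No XML declaration and no external encoding: the parser decodes as UTF-8
            db.parse(new ByteArrayInputStream(xmlBytes));
            return false;
        } catch (Exception e) {
            System.out.println("Parse failed: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        // Hypothetical file content, saved in windows-1255 instead of UTF-8
        byte[] cp1255Bytes = ("<root>" + HEBREW + "</root>").getBytes(Charset.forName("windows-1255"));
        System.out.println(failsWithoutEncodingHint(cp1255Bytes));
    }
}
```

The same text encoded as UTF-8 parses without error, which fits the observation that the file is fine once the Hebrew is removed.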

My last attempt was to open it with a FileInputStream and specify the encoding:

    DocumentBuilder db = dbf.newDocumentBuilder();
    document = db.parse(new FileInputStream(new File(xmlFileName)), "Cp1252");

(Cp1252 is the encoding that worked for me in another application.) But I got the same result.
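A likely reason this attempt changed nothing: in `DocumentBuilder.parse(InputStream, String)` the second argument is a system ID (a base for resolving relative URIs such as DTD paths), not a character encoding. Cp1252 is also a Western European code page with no Hebrew letters; windows-1255 is the Windows Hebrew one. One way to tell the parser how to decode a byte stream is `InputSource.setEncoding`. The sketch below assumes the file is windows-1255 and carries no encoding declaration of its own; `parseBytesAs` is a hypothetical helper name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EncodingViaInputSource {
    // "shalom" in Hebrew, as Unicode escapes
    static final String HEBREW = "\u05E9\u05DC\u05D5\u05DD";

    /** Parses a byte stream, telling the parser which encoding the bytes are in. */
    static Document parseBytesAs(InputStream in, String encoding) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource source = new InputSource(in); // still a byte stream, not a Reader
        source.setEncoding(encoding);              // used instead of autodetecting UTF-8
        return db.parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real file: Hebrew text saved as windows-1255 (an assumption)
        byte[] bytes = ("<root>" + HEBREW + "</root>").getBytes(Charset.forName("windows-1255"));
        Document doc = parseBytesAs(new ByteArrayInputStream(bytes), "windows-1255");
        System.out.println(doc.getDocumentElement().getTextContent().equals(HEBREW));
    }
}
```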

I also tried going through a byte array; nothing worked.

Any suggestions?

2 answers

If you know the file's actual encoding and it is not UTF-8, you can either declare it in the XML header:

 <?xml version="1.0" encoding="[correct encoding here]" ?> 

or parse it through a Reader:

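    // DocumentBuilder.parse() has no Reader overload, so wrap the Reader in an org.xml.sax.InputSource
    Reader reader = new InputStreamReader(new FileInputStream(new File(xmlFileName)), "[correct encoding here]");
    db.parse(new InputSource(reader));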
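Both options can be sketched end to end. This assumes, again hypothetically, that the real encoding is windows-1255, and uses a temporary file in place of `xmlFileName`; the class and method names are made up for illustration. Note that `DocumentBuilder.parse` accepts an `InputSource` but not a bare `Reader`, and when it is given a character stream it reads the already-decoded characters and disregards any encoding declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class HebrewXmlOptions {
    // "shalom" in Hebrew, as Unicode escapes
    static final String HEBREW = "\u05E9\u05DC\u05D5\u05DD";

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    /** Option 1: the header declares the encoding, so a plain byte stream is enough. */
    static String parseDeclared(byte[] xmlBytes) throws Exception {
        return newBuilder().parse(new ByteArrayInputStream(xmlBytes))
                .getDocumentElement().getTextContent();
    }

    /** Option 2: decode the file ourselves and hand the parser characters. */
    static String parseWithReader(Path file, Charset charset) throws Exception {
        // try-with-resources also closes the underlying FileInputStream
        try (Reader reader = new InputStreamReader(new FileInputStream(file.toFile()), charset)) {
            return newBuilder().parse(new InputSource(reader))
                    .getDocumentElement().getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        Charset cp1255 = Charset.forName("windows-1255"); // assumed file encoding

        String declared = "<?xml version=\"1.0\" encoding=\"windows-1255\"?><root>" + HEBREW + "</root>";
        System.out.println(parseDeclared(declared.getBytes(cp1255)).equals(HEBREW));

        // Temporary file standing in for xmlFileName; this one has no encoding declaration
        Path file = Files.createTempFile("hebrew", ".xml");
        try {
            Files.write(file, ("<root>" + HEBREW + "</root>").getBytes(cp1255));
            System.out.println(parseWithReader(file, cp1255).equals(HEBREW));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The declaration approach keeps the file self-describing; the Reader approach works even when you cannot change the file, but the charset then lives in your code.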

The solution is quite simple: read the content as UTF-8 through a Reader and give the SAX parser an InputSource built from that Reader.

    File file = new File("c:\\file-utf.xml");
    InputStream inputStream = new FileInputStream(file);
    Reader reader = new InputStreamReader(inputStream, "UTF-8");
    InputSource is = new InputSource(reader);
    // is.setEncoding("UTF-8"); -> This line causes error! Content is not allowed in prolog
    saxParser.parse(is, handler);

The full example is here: http://www.mkyong.com/java/how-to-read-utf-8-xml-file-in-java-sax-parser/
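A self-contained version of the same SAX approach, with a hypothetical `TextCollector` handler standing in for the article's `handler`. It also shows a caveat worth knowing: `InputStreamReader` replaces bytes that are invalid in the chosen charset with U+FFFD instead of throwing. A UTF-8 Reader over a file that is really in a Hebrew code page (windows-1255 here, as an assumption) will therefore parse without an exception but lose the Hebrew text, so the Reader's charset has to match the file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxWithReader {
    // "shalom" in Hebrew, as Unicode escapes
    static final String HEBREW = "\u05E9\u05DC\u05D5\u05DD";

    /** Hypothetical handler that accumulates all character data. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Decodes the bytes with the given charset, then lets SAX parse the characters. */
    static String parseText(byte[] bytes, Charset charset) throws Exception {
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset);
        InputSource is = new InputSource(reader); // no setEncoding: a Reader is already decoded
        saxParser.parse(is, handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>" + HEBREW + "</root>";
        Charset utf8 = Charset.forName("UTF-8");
        Charset cp1255 = Charset.forName("windows-1255");

        // Reader charset matches the bytes: the Hebrew survives
        System.out.println(parseText(xml.getBytes(utf8), utf8).equals(HEBREW));
        // Mismatch: no exception, but the letters become U+FFFD replacement characters
        System.out.println(parseText(xml.getBytes(cp1255), utf8).indexOf('\uFFFD') >= 0);
    }
}
```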
