MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence when parsing XML with Hebrew characters

Question

I am trying to parse an XML file that contains Hebrew characters. I know the file is otherwise well-formed, because when I export it (from other software) without the Hebrew characters, it parses perfectly.

I tried a lot of things but always get this error:

MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.

My last attempt was to open it with a FileInputStream and specify the encoding:

 DocumentBuilder db = dbf.newDocumentBuilder();
 document = db.parse(new FileInputStream(new File(xmlFileName)), "Cp1252");

(Cp1252 is the encoding that worked for me in another application.) But I got the same result.

I also tried using a ByteArray, but nothing worked.

Any suggestions?

java xml encoding character-encoding

La bla bla Dec 14 '12 at 14:49

2 answers

The solution is quite simple: read the content as UTF-8 through a Reader and hand it to the SAX parser as an InputSource.

 File file = new File("c:\\file-utf.xml");
 InputStream inputStream = new FileInputStream(file);
 Reader reader = new InputStreamReader(inputStream, "UTF-8");
 InputSource is = new InputSource(reader);
 // is.setEncoding("UTF-8"); -> This line causes error! Content is not allowed in prolog
 saxParser.parse(is, handler);

Here you can read the full example - http://www.mkyong.com/java/how-to-read-utf-8-xml-file-in-java-sax-parser/

Raaam Aug 17 '15 at 6:11

jtahlborn · Accepted Answer · 2012-12-14T15:28:25+0000

If you know the file's actual encoding and it is not UTF-8, you can either declare it in the XML header:

 <?xml version="1.0" encoding="[correct encoding here]" ?>

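or parse it through a Reader wrapped in an InputSource (DocumentBuilder.parse has no overload that takes a Reader directly):

 db.parse(new InputSource(new InputStreamReader(new FileInputStream(new File(xmlFileName)), "[correct encoding here]")));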
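For illustration, here is a minimal self-contained sketch of the Reader approach. It assumes the file is encoded as windows-1255, a common legacy Hebrew code page; the question never states the real encoding, so substitute whatever your file actually uses. The class name ReaderBasedXmlParse and the helper parseWithEncoding are made up for this example. Note also that in the question's call db.parse(InputStream, String), the second argument is a system ID used to resolve relative URIs, not an encoding, which is why passing "Cp1252" there had no effect.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; the name is not from the original post.
class ReaderBasedXmlParse {

    // Decode the bytes ourselves with the given charset, then hand the
    // parser characters instead of raw bytes.
    static Document parseWithEncoding(InputStream in, String charsetName) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Reader reader = new InputStreamReader(in, charsetName)) {
            // DocumentBuilder.parse has no Reader overload, so wrap the
            // Reader in an InputSource.
            return db.parse(new InputSource(reader));
        }
    }

    public static void main(String[] args) throws Exception {
        // A Hebrew word written with Unicode escapes so the source file's
        // own encoding does not matter.
        String xml = "<greeting>\u05E9\u05DC\u05D5\u05DD</greeting>";

        // Assumption: windows-1255 stands in for the file's real encoding.
        byte[] bytes = xml.getBytes("windows-1255");

        Document doc = parseWithEncoding(new ByteArrayInputStream(bytes), "windows-1255");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

To read from disk, pass a FileInputStream instead of the ByteArrayInputStream. Because the parser receives characters that are already decoded, it does not need to detect the byte encoding itself.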
