Parsing invalid ampersands with Android's XmlPullParser

I am writing a small screen-scraping application that consumes some XHTML. It goes without saying that the XHTML is not valid: ampersands are not escaped as &amp;.

I am using Android's XmlPullParser, and it throws the following error on the incorrectly encoded value:

 org.xmlpull.v1.XmlPullParserException: unterminated entity ref (position:START_TAG <a href='/Fahrinfo/bin/query.bin/dox?ld=0.1&n=3&i=9c.0323581.1266265347&rt=0&vcra'> @55:134 in java.io.InputStreamReader@43b1ef70) 

How do I get around this? I thought of the following solutions:

  • Wrapping the InputStream in another one that replaces bare ampersands with entity references
  • Configuring the parser so that it magically accepts the incorrect markup

Which one is more likely to succeed?

+7
android xml-parsing
2 answers

I would go with your first option: replacing the ampersands seems a more suitable solution than the other. The second option looks more like a hack that gets things working by accepting the wrong markup.
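A minimal sketch of what such a wrapper could look like, assuming it is acceptable to wrap a Reader (for example an InputStreamReader around the stream) rather than the raw InputStream. The class name AmpersandEscapingReader, the MAX_LOOKAHEAD limit and the escapeAll demo helper are hypothetical names introduced here for illustration. The entity check is deliberately loose: an '&' is left alone when a ';' follows within a few characters and only letters, digits or '#' sit in between.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper: rewrites bare '&' as "&amp;" while the text is being read. */
class AmpersandEscapingReader extends Reader {
    // Assumption: entity references are at most this many characters after the '&'.
    private static final int MAX_LOOKAHEAD = 11;

    private final PushbackReader in;
    // Characters that must be returned before anything else is read ("amp;").
    private final StringBuilder pending = new StringBuilder();

    AmpersandEscapingReader(Reader source) {
        this.in = new PushbackReader(source, MAX_LOOKAHEAD);
    }

    @Override
    public int read() throws IOException {
        if (pending.length() > 0) {
            char c = pending.charAt(0);
            pending.deleteCharAt(0);
            return c;
        }
        int c = in.read();
        if (c == '&' && !looksLikeEntity()) {
            pending.append("amp;"); // the '&' itself is returned right now
        }
        return c;
    }

    /** Peeks ahead, then pushes everything back so no input is lost. */
    private boolean looksLikeEntity() throws IOException {
        char[] buf = new char[MAX_LOOKAHEAD];
        int n = 0;
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r <= 0) break; // end of input
            n += r;
        }
        in.unread(buf, 0, n);
        for (int i = 0; i < n; i++) {
            char ch = buf[i];
            if (ch == ';') return i > 0;
            if (!Character.isLetterOrDigit(ch) && ch != '#') return false;
        }
        return false;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int count = 0;
        while (count < len) {
            int c = read();
            if (c < 0) break;
            cbuf[off + count++] = (char) c;
        }
        return (count == 0 && len > 0) ? -1 : count;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Demo helper: runs a whole string through the reader. */
    static String escapeAll(String text) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new AmpersandEscapingReader(new StringReader(text))) {
            int c;
            while ((c = r.read()) != -1) out.append((char) c);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

With this sketch, escapeAll("a&b &amp; x?y=1&n=3") yields "a&amp;b &amp; x?y=1&amp;n=3". On Android the reader would presumably be handed to the parser via parser.setInput(new AmpersandEscapingReader(new InputStreamReader(stream, "UTF-8"))). Note that it reads one character at a time, so it favours simplicity over speed.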

+2

I was stuck on this for about an hour before finding out that, in my case, it was the "&" that XmlPullParser could not resolve. Here is a piece of code that fixes it:

    void ParsingActivity(String r) {
        try {
            parserCreator = XmlPullParserFactory.newInstance();
            parser = parserCreator.newPullParser();

            // Escape the ampersands, then hand the string to the
            // parser as a Reader.
            parser.setInput(new StringReader(r.replaceAll("&", "&amp;")));

            // Unlike a SAX parser, a pull parser does not fire callbacks;
            // we ask it for one event at a time.
            int parserEvent = parser.getEventType();

            // Loop over all events in the XML until we have
            // reached the end of the document.
            while (parserEvent != XmlPullParser.END_DOCUMENT) {
                switch (parserEvent) {
                    // We have reached the start of a tag.
                    case XmlPullParser.START_TAG:
                        // Get the name of the tag.
                        String tag = parser.getName();
                        // Handle the tag here.
                        break;
                }
                // Advance to the next event, otherwise the loop never ends.
                parserEvent = parser.next();
            }
        } catch (XmlPullParserException | IOException e) {
            e.printStackTrace();
        }
    }

Basically, all I am doing is replacing & with &amp;, since I was dealing with URLs. Hope this helps.
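One caveat with a blanket replacement: if the input already contains correctly escaped references, replacing every & turns &amp; into &amp;amp;. A possible refinement, sketched here as an assumption rather than as part of the answer above, is a regular expression with a negative lookahead that skips anything that already looks like an entity or character reference. The class name BareAmpersands and the method escape are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: escapes only ampersands that do not start a reference. */
class BareAmpersands {
    // '&' NOT followed by "name;", "#123;" or "#x1F;".
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    static String escape(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }
}
```

For example, BareAmpersands.escape("x?ld=0.1&n=3 &amp; more") gives "x?ld=0.1&amp;n=3 &amp; more", whereas a plain replace would produce "&amp;amp;" for the already escaped part. The pattern only checks the shape of the reference, so a query parameter that happens to look like one (say &copy;) is left untouched, and text inside CDATA sections or comments is rewritten too.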

+6