KXmlParser throws an "Unexpected token" exception at the very start of an RSS feed

I am trying to parse the RSS feed from Monster on Android v.17 using this URL:

http://rss.jobsearch.monster.com/rssquery.ashx?q=java

To get the content, I use HttpURLConnection as follows:

    this.conn = (HttpURLConnection) url.openConnection();
    this.conn.setConnectTimeout(5000);
    this.conn.setReadTimeout(10000);
    this.conn.setUseCaches(true);
    conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
    is = new InputStreamReader(url.openStream());

What comes back is, as far as I can tell (and I have verified it), legitimate RSS. These are the response headers:

    Cache-Control: private
    Connection: Keep-Alive
    Content-Encoding: gzip
    Content-Length: 5958
    Content-Type: text/xml
    Date: Wed, 06 Mar 2013 17:15:20 GMT
    P3P: CP=CAO DSP COR CURa ADMa DEVa IVAo IVDo CONo HISa TELo PSAo PSDo DELa PUBi BUS LEG PHY ONL UNI PUR COM NAV INT DEM CNT STA HEA PRE GOV OTC
    Server: Microsoft-IIS/7.5
    Vary: Accept-Encoding
    X-AspNet-Version: 2.0.50727
    X-Powered-By: ASP.NET

The body starts as follows (open the URL above to see the full XML file):

    <?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel>
        <title>Monster Job Search Results java</title>
        <description>RSS Feed for Monster Job Search</description>
        <link>http://rss.jobsearch.monster.com/rssquery.ashx?q=java</link>

But when I try to parse it:

    final XmlPullParser xpp = getPullParser();
    xpp.setInput(is);
    for (int type = xpp.getEventType(); type != XmlPullParser.END_DOCUMENT; type = xpp.next()) {
        /* parsing goes here */
    }

The code immediately chokes on type = xpp.next() with the following exception:

 03-06 09:27:27.796: E/AbsXmlResultParser(13363): org.xmlpull.v1.XmlPullParserException: Unexpected token (position:TEXT @1:2 in java.io.InputStreamReader@414b4538 ) 

This means that it cannot process the second character of line 1, <?xml version="1.0" encoding="utf-8"?>.

Here are the offending lines in KXmlParser.java (lines 425-426); type == TEXT evaluates to true:

    if (depth == 0 && (type == ENTITY_REF || type == TEXT || type == CDSECT)) {
        throw new XmlPullParserException("Unexpected token", this, null);
    }

Any help? I tried setting the parser feature XmlPullParser.FEATURE_PROCESS_DOCDECL to false, but that didn't help.

I researched this online and here, and could not find anything that helps.

1 answer

You get the error because the XML file does not actually start with <?xml version="1.0" encoding="utf-8"?>. It starts with the three bytes EF BB BF, which are the UTF-8 byte order mark (BOM).

[Image: hex representation of the start of the response]
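A small stand-alone Java sketch of the cause, using hand-written sample bytes rather than the real Monster feed: decoding EF BB BF with a UTF-8 InputStreamReader produces the character U+FEFF instead of dropping it, so the first thing the parser receives is a text character that comes before <?xml. The class name BomDemo, the WITH_BOM array and the firstChar method are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Hypothetical sample: the first bytes of a UTF-8 document that starts with a BOM.
    static final byte[] WITH_BOM = {
        (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, // UTF-8 byte order mark
        '<', '?', 'x', 'm', 'l'
    };

    // Returns the first character an InputStreamReader decodes from the bytes.
    static int firstChar(byte[] bytes) {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            return r.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The UTF-8 decoder keeps the BOM as U+FEFF instead of discarding it,
        // so a parser reading from this Reader sees a character before "<?xml".
        System.out.printf("U+%04X%n", firstChar(WITH_BOM)); // prints U+FEFF
    }
}
```

Because that character arrives at depth 0 and is not markup, KXmlParser reports it as TEXT, which is exactly the check in the lines quoted from KXmlParser.java.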

InputStreamReader does not strip these bytes automatically (it decodes them as the character U+FEFF), so you have to handle them yourself. The easiest way is to use BOMInputStream, available in the Apache Commons IO library:

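    import org.apache.commons.io.ByteOrderMark;
    import org.apache.commons.io.input.BOMInputStream;

    this.conn = (HttpURLConnection) url.openConnection();
    this.conn.setConnectTimeout(5000);
    this.conn.setReadTimeout(10000);
    this.conn.setUseCaches(true);
    conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
    // include = false: a leading UTF-8 BOM is consumed and not passed on to the reader
    is = new InputStreamReader(new BOMInputStream(conn.getInputStream(), false, ByteOrderMark.UTF_8));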

I checked the code above and it works well for me.
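If you would rather not add Commons IO, a minimal sketch of the manual approach is to peek at the first three bytes with java.io.PushbackInputStream and push them back when they are not a BOM. The names BomSkipper, skipUtf8Bom and decode are hypothetical, and this sketch only recognizes the UTF-8 BOM, whereas BOMInputStream can be told to look for other byte order marks as well.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BomSkipper {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns a stream positioned after a leading UTF-8 BOM, if one is present.
    // Bytes that turn out not to be a BOM are pushed back unchanged.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF
        while (n < head.length) {
            int r = pb.read(head, n, head.length - n);
            if (r == -1) break;
            n += r;
        }
        boolean isBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!isBom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: restore the bytes we consumed
        }
        return pb;
    }

    // Decodes the whole byte array as UTF-8 text with any leading BOM removed.
    static String decode(byte[] bytes) {
        try (Reader r = new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(bytes)), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'r', 's', 's', '>'};
        System.out.println(decode(withBom)); // prints <rss>
    }
}
```

In the question's code this would replace the BOMInputStream wrapper, for example is = new InputStreamReader(BomSkipper.skipUtf8Bom(conn.getInputStream()), "UTF-8"); the explicit charset avoids relying on the platform default.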
