KXmlParser throws an Unexpected Token exception at the beginning of the RSS layout

Question


I am trying to parse the Monster RSS feed on Android v.17 using this URL:

http://rss.jobsearch.monster.com/rssquery.ashx?q=java

To get the content, I use HttpURLConnection as follows:

    this.conn = (HttpURLConnection) url.openConnection();
    this.conn.setConnectTimeout(5000);
    this.conn.setReadTimeout(10000);
    this.conn.setUseCaches(true);
    conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
    is = new InputStreamReader(url.openStream());

What comes back is, as far as I can tell (and I verified it), legitimate RSS. The response headers:

    Cache-Control: private
    Connection: Keep-Alive
    Content-Encoding: gzip
    Content-Length: 5958
    Content-Type: text/xml
    Date: Wed, 06 Mar 2013 17:15:20 GMT
    P3P: CP=CAO DSP COR CURa ADMa DEVa IVAo IVDo CONo HISa TELo PSAo PSDo DELa PUBi BUS LEG PHY ONL UNI PUR COM NAV INT DEM CNT STA HEA PRE GOV OTC
    Server: Microsoft-IIS/7.5
    Vary: Accept-Encoding
    X-AspNet-Version: 2.0.50727
    X-Powered-By: ASP.NET

It starts as follows (click the URL above if you want to see the full XML file):

    <?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel>
        <title>Monster Job Search Results java</title>
        <description>RSS Feed for Monster Job Search</description>
        <link>http://rss.jobsearch.monster.com/rssquery.ashx?q=java</link>

But when I try to parse it:

    final XmlPullParser xpp = getPullParser();
    xpp.setInput(is);
    for (int type = xpp.getEventType(); type != XmlPullParser.END_DOCUMENT; type = xpp.next()) {
        /* parsing goes here */
    }

The code immediately chokes on type = xpp.next() with the following exception:

 03-06 09:27:27.796: E/AbsXmlResultParser(13363): org.xmlpull.v1.XmlPullParserException: Unexpected token (position:TEXT @1:2 in java.io.InputStreamReader@414b4538 )

This means the parser cannot process the second character of line 1, <?xml version="1.0" encoding="utf-8"?>.

Here are the offending lines in KXmlParser.java (425-426); type == TEXT evaluates to true:

    if (depth == 0 && (type == ENTITY_REF || type == TEXT || type == CDSECT)) {
        throw new XmlPullParserException("Unexpected token", this, null);
    }

Any help? I tried setting XmlPullParser.FEATURE_PROCESS_DOCDECL to false on the parser, but that didn't help.

I searched the Internet and this site and could not find anything that helps.

+4

android rss xmlpullparser

Bostone Mar 6 '13 at 17:31


1 answer

Vladimir Mironov · Accepted Answer · 2013-03-10T05:59:42+0000

You get the error because the XML file does not actually start with <?xml version="1.0" encoding="utf-8"?>. It starts with the three bytes EF BB BF, which are the UTF-8 byte order mark (BOM).
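To make the byte order mark concrete, here is a minimal sketch in plain Java. The class and method names (BomDemo, startsWithUtf8Bom, firstDecodedChar) are made up for illustration and are not part of any library. It checks a byte array for the EF BB BF prefix and shows that decoding such data as UTF-8 hands the BOM through as the character U+FEFF, so the reader's first character is not '<'.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
final class BomDemo {
    /** Returns true if the data begins with the UTF-8 byte order mark EF BB BF. */
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /** Decodes the data as UTF-8 and returns the first character read, or -1 if empty. */
    static int firstDecodedChar(byte[] data) throws IOException {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), Charset.forName("UTF-8"));
        return reader.read();
    }

    public static void main(String[] args) throws IOException {
        // Simulated start of a feed that carries a BOM before the XML declaration.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?', 'x', 'm', 'l'};
        System.out.println("starts with BOM: " + startsWithUtf8Bom(withBom));
        System.out.printf("first decoded char: U+%04X%n", firstDecodedChar(withBom));
    }
}
```

Dumping the first few bytes of the response in this way is a quick check for whether a feed carries a BOM before the XML declaration.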

InputStreamReader does not strip these bytes automatically, so you must handle them yourself. The easiest way is to use BOMInputStream, available in the Apache Commons IO library:

    this.conn = (HttpURLConnection) url.openConnection();
    this.conn.setConnectTimeout(5000);
    this.conn.setReadTimeout(10000);
    this.conn.setUseCaches(true);
    conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
    is = new InputStreamReader(new BOMInputStream(conn.getInputStream(), false, ByteOrderMark.UTF_8));

I checked the code above and it works well for me.
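If adding Commons IO is not an option, the same idea can be sketched with only the standard library. This is an untested-on-device alternative, not the code from the answer above: the class Utf8BomSkipper and its method skipUtf8Bom are hypothetical names. It reads up to three bytes, discards them if they are EF BB BF, and otherwise pushes them back so nothing is lost.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper class, for illustration only.
final class Utf8BomSkipper {
    /** Returns a stream positioned after a leading UTF-8 BOM, if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            // Not a BOM: put the bytes back so the caller sees the stream unchanged.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

With this helper, the last line of the connection code would become something like is = new InputStreamReader(Utf8BomSkipper.skipUtf8Bom(conn.getInputStream()), "UTF-8");. Unlike BOMInputStream, this sketch handles only the UTF-8 mark, not the UTF-16 or UTF-32 variants.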
