Python (newbie): parse XML from an API call

I have been looking through tutorials, other Stack Overflow questions, and the documentation, and I still can't figure it out. Ugh!

I am making an API request and parsing the XML response (I would also like to assign the values to variables, but that is a bonus to this question). This is what I am trying to do. Why can't I get the title and link of each item?

    #!/usr/bin/python
    # Screen Scraper for Subs
    import urllib
    from xml.etree import ElementTree as ET

    show = 'heroes'
    season = '4'
    language = 'en'
    limit = '1'

    requestURL = 'http://api.allsubs.org/index.php?' \
        + 'search=' + show \
        + '+season+' + season \
        + '&language=' + language \
        + '&limit=' + limit

    root = ET.parse(urllib.urlopen(requestURL)).getroot()
    print root
    print '\n'

    items = root.findall('items')
    for item in items:
        item.find('title').text  # should print: <![CDATA[Heroes Season 4 Subtitles]]>
        item.find('link').text  # Should print: http://www.allsubs.org/subs-download/heroes+season+4/1223435/

XML response

    <AllSubsAPI>
      <title>AllSubs API: Subtitles Search</title>
      <link>http://www.allsubs.org</link>
      <description><![CDATA[Subtitles Search for Heroes Season 4]]></description>
      <language>en-us</language>
      <results>1</results>
      <found_results>24</found_results>
      <items>
        <item>
          <title><![CDATA[Heroes Season 4 Subtitles]]></title>
          <link>http://www.allsubs.org/subs-download/heroes+season+4/1223435/</link>
          <filename>heroes-season-4-english-heroes-season-4-en.zip</filename>
          <files_in_archive>Heroes - 4x01-02 - Orientation.HDTV.FQM.en.srt|Heroes - 4x17 - The Art of Deception.HDTV.2HD.en.srt|Heroes - 4x07 - Strange Attractors.HDTV.LOL.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.HDTV.2HD.en.srt|Heroes - 4x07 - Strange Attractors.720p HDTV.DIMENSION.en.srt|Heroes - 4x05 - Hysterical Blindness.720p HDTV.X264.en.srt|Heroes - 4x09 - Shadowboxing.HDTV.LOL.en.srt|Heroes - 4x16 - Pass Fail.HDTV.LOL.en.srt|Heroes - 4x04 - Acceptance.HDTV.en.srt|Heroes - 4x01-02 - Orientation.720p HDTV.DIMENSION.en.srt|Heroes - 4x06 - Tabula Rasa.HDTV.NoTV.en.srt|Heroes - 4x10 - Brother Keeper.HDTV.FQM.en.srt|Heroes - 4x04 - Acceptance.HDTV.FQM.en.srt|Heroes - 4x14 - Let It Bleed.720p HDTV.DIMENSION.en.srt|Heroes - 4x06 - Tabula Rasa.720p HDTV.SiTV.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.HDTV.NoTV.en.srt|Heroes - 4x12 - The Fifth Stage.HDTV.LOL.en.srt|Heroes - 4x19 - Brave New World.HDTV.LOL.en.srt|Heroes - 4x15 - Close to You.720p HDTV.DIMENSION.en.srt|Heroes - 4x03 - Ink.720p HDTV.DIMENSION.en.srt|Heroes - 4x11 - Thanksgiving.720p HDTV.DIMENSION.en.srt|Heroes - 4x13 - Upon This Rock.720p HDTV.DIMENSION.en.srt|Heroes - 4x13 - Upon This Rock.HDTV.LOL.en.srt|Heroes - 4x14 - Let It Bleed.HDTV.LOL.en.srt|Heroes - 4x15 - Close to You.HDTV.LOL.en.srt|Heroes - 4x12 - The Fifth Stage.720p HDTV.DIMENSION.en.srt|Heroes - 4x18 - The Wall.HDTV.LOL.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.720p HDTV.CTU.en.srt|Heroes - 4x17 - The Art of Deception.HDTV.CTU.en.srt|Heroes - 4x09 - Shadowboxing.720p HDTV.DIMENSION.en.srt|Heroes - 4x10 - Brother Keeper.720p HDTV.DIMENSION.en.srt|Heroes - 4x04 - Acceptance.720p HDTV.CTU.en.srt|Heroes - 4x11 - Thanksgiving.HDTV.FQM.en.srt|Heroes - 4x03 - Ink.HDTV.FQM.en.srt|Heroes - 4x05 - Hysterical Blindness.HDTV.XII.en.srt|</files_in_archive>
          <languages>en</languages>
          <added_on>2010-02-16</added_on>
        </item>
      </items>
    </AllSubsAPI>

UPDATE:

It worked. Thanks for the help and for pointing out my typo.

    items = root.findall('items/item')
    for item in items:
        print item.find('title').text
        print item.find('link').text
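For the "bonus" part of the question (assigning the values to variables), here is a minimal sketch in Python 3 syntax. It parses a trimmed copy of the response above from a string instead of calling the API, and the names `SAMPLE` and `subs` are only illustrative. Note that the parser hands back the CDATA content as plain text, so the title comes out without the `<![CDATA[...]]>` wrapper.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the response shown in the question (one item, fewer fields).
SAMPLE = """<AllSubsAPI>
  <title>AllSubs API: Subtitles Search</title>
  <items>
    <item>
      <title><![CDATA[Heroes Season 4 Subtitles]]></title>
      <link>http://www.allsubs.org/subs-download/heroes+season+4/1223435/</link>
      <languages>en</languages>
      <added_on>2010-02-16</added_on>
    </item>
  </items>
</AllSubsAPI>"""

root = ET.fromstring(SAMPLE)

# One dict per <item>. findtext() returns None when the tag is missing,
# instead of failing the way find(...).text does.
subs = [
    {"title": item.findtext("title"), "link": item.findtext("link")}
    for item in root.findall("items/item")
]

# Pull the first result into plain variables.
first = subs[0]
title, link = first["title"], first["link"]
print(title)
print(link)
```

Keeping one dict per item means the same loop still works if `limit` is raised and more than one `<item>` comes back.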
3 answers
The line

    items = root.findall('items')

matches only the <items> container, which has no title or link children of its own. It should be

    items = root.findall('items/item')
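To see the difference between the two paths, here is a small sketch in Python 3 syntax that runs against a cut-down, hand-written version of the response (the `SAMPLE` string is an assumption for illustration, not the real API output):

```python
from xml.etree import ElementTree as ET

# Cut-down stand-in for the API response.
SAMPLE = """<AllSubsAPI>
  <items>
    <item><title>Heroes Season 4 Subtitles</title></item>
  </items>
</AllSubsAPI>"""

root = ET.fromstring(SAMPLE)

containers = root.findall("items")     # the single <items> wrapper
entries = root.findall("items/item")   # each <item> inside the wrapper

container_tags = [el.tag for el in containers]
entry_tags = [el.tag for el in entries]

# <items> has no direct <title> child, so find() returns None here.
# Calling .text on that None raises AttributeError.
title_on_container = containers[0].find("title")

# On an <item>, the same lookup succeeds.
title_on_entry = entries[0].find("title").text
```

So looping over `findall('items')` visits the wrapper once, and the `title` lookup on it comes back empty.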

This works for me. Note that I am using urllib2 for the proxy:

    import urllib2
    from xml.etree import ElementTree as ET

    show = 'heroes'
    season = '4'
    language = 'en'
    limit = '1'

    requestURL = 'http://api.allsubs.org/index.php?' \
        + 'search=' + show \
        + '+season+' + season \
        + '&language=' + language \
        + '&limit=' + limit

    root = ET.parse(urllib2.urlopen(requestURL)).getroot()
    print root
    print '\n'

    items = root.findall('items')[0].findall('item')
    for item in items:
        print item.find('title').text  # should print: <![CDATA[Heroes Season 4 Subtitles]]>
        print item.find('link').text  # Should print: http://www.allsubs.org/subs-download/heroes+season+4/1223435/

Note that findall('items') finds the "items" tag; what you want to loop over (I think) is the "item" tags inside it, so we call findall() again to get those. In addition, you need to use print to get any output from Python.

Also, if I run this with limit = 2, I get a traceback:
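For anyone on Python 3, where `urllib2` and the `print` statement no longer exist, the same flow could look roughly like the sketch below. The helper names `build_url`, `parse_items` and `fetch_items` are made up for this example, and I am assuming the API still accepts the same query parameters as in the question; `fetch_items` is only defined here, not called, so nothing touches the network.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
from xml.etree import ElementTree as ET

API_URL = "http://api.allsubs.org/index.php"


def build_url(show, season, language="en", limit=1):
    """Build the search URL; urlencode turns the spaces into '+'."""
    query = {
        "search": "%s season %s" % (show, season),
        "language": language,
        "limit": limit,
    }
    return API_URL + "?" + urlencode(query)


def parse_items(xml_text):
    """Return a list of (title, link) tuples, one per <item>."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.findall("items/item")
    ]


def fetch_items(show, season, language="en", limit=1):
    """Call the API and parse the response (needs network access)."""
    with urlopen(build_url(show, season, language, limit)) as response:
        return parse_items(response.read())
```

Usage would be something like `for title, link in fetch_items('heroes', '4'): print(title, link)`. Splitting the parsing from the download also makes it easy to try `parse_items` on a saved copy of the response.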

    Traceback (most recent call last):
      File "heros.py", line 18, in <module>
        root = ET.parse(urllib2.urlopen(requestURL)).getroot()
      File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 862, in parse
        tree.parse(source, parser)
      File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 586, in parse
        parser.feed(data)
      File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 1245, in feed
        self._parser.Parse(data, 0)
    xml.parsers.expat.ExpatError: not well-formed (invalid token): line 24, column 95

I'm not sure that the XML returned by this API is well-formed: the parser reports an invalid token, and there is no XML declaration at the beginning, for starters. I would not trust this ...
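If the response cannot be trusted, it may be worth catching the parse failure rather than letting the script die. A rough sketch in Python 3 syntax, where ElementTree raises `ParseError` for this kind of problem: the `BROKEN` string is a made-up payload with a bare `&` (I do not know what the real offending token at line 24 was), and `parse_or_report` is a hypothetical helper name.

```python
from xml.etree import ElementTree as ET

# Hypothetical malformed payload: a bare '&' is not allowed in XML text.
BROKEN = "<items>\n<item><title>Brother & Keeper</title></item>\n</items>"


def parse_or_report(xml_text):
    """Return (root, None) on success, or (None, message) if parsing fails."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the problem.
        line, column = err.position
        return None, "line %d, column %d: %s" % (line, column, err)


root, problem = parse_or_report(BROKEN)
if problem:
    print("Could not parse response:", problem)
```

The reported line and column can then be checked against the raw response text to find out what the API actually sent.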


You are not iterating over the item elements; you are actually iterating over the items element.

I think it should be:

    items = root.find('items')  # find() returns the single <items> element; findall() would return a list
    childItems = items.findall('item')
    for childItem in childItems:
        print childItem.find('title').text  # prints: Heroes Season 4 Subtitles
        print childItem.find('link').text  # prints: http://www.allsubs.org/subs-download/heroes+season+4/1223435/
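One thing to watch with this two-step approach is which call returns what: `find()` gives back a single element (or None), while `findall()` always gives back a list, and a list has no `findall()` of its own. A small sketch in Python 3 syntax, using a hand-written two-item `SAMPLE` as a stand-in for the real response:

```python
from xml.etree import ElementTree as ET

# Hand-written stand-in with two items.
SAMPLE = """<AllSubsAPI>
  <items>
    <item><title>First</title></item>
    <item><title>Second</title></item>
  </items>
</AllSubsAPI>"""

root = ET.fromstring(SAMPLE)

container = root.find("items")     # a single Element (None if absent)
matches = root.findall("items")    # always a list, here with one Element

# Two-step lookup: container first, then its <item> children.
titles_via_find = [i.findtext("title") for i in container.findall("item")]

# Alternative: iter() walks all descendants with the given tag.
titles_via_iter = [i.findtext("title") for i in root.iter("item")]
```

Both lists end up the same here; `iter('item')` would also pick up `<item>` tags nested elsewhere in the document, which may or may not be what you want.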