I tried two methods, each timed with a simple timing wrapper.
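The wrapper itself is not shown here. As a rough sketch of what such a `timing` decorator might look like (the `elapsed_ms` list and the `work()` function are hypothetical stand-ins, not the code actually used for the measurements below):

```python
import functools
import time

elapsed_ms = []  # hypothetical store: one entry per timed call


def timing(func):
    """Record how long each call to func takes, in milliseconds."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        elapsed_ms.append((time.time() - start) * 1000.0)
        return result
    return wrapper


@timing
def work():
    # Hypothetical stand-in for the XML-parsing functions
    time.sleep(0.01)


for n in range(5):
    work()

average = sum(elapsed_ms) / len(elapsed_ms)
print("Average over %d runs = %.4f ms" % (len(elapsed_ms), average))
```

Collecting the individual timings in a list makes it easy to compute the average after the loop rather than inside the decorator.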
Method 1 - Parsing XML Directly from a Link
from lxml import etree as ET

@timing
def parseXMLFromLink():
    parsed = ET.parse(url_link)
    print parsed.getroot()

for n in range(0,100):
    parseXMLFromLink()

Average over 100 runs = 98.4035 ms
Method 2 - Parsing XML from a string returned by urllib2
from lxml import etree as ET
import urllib2

@timing
def parseXMLFromString():
    xml_string = urllib2.urlopen(url_link).read()
    parsed = ET.fromstring(xml_string)
    print parsed

for n in range(0,100):
    parseXMLFromString()

Average over 100 runs = 286.9630 ms
So it seems that using lxml to parse directly from the link is the faster method. It is not clear whether it would be faster to download large XML documents to the hard drive first and then parse them from there. Presumably, though, unless the document is huge and the parsing task more intensive, parseXMLFromLink() will remain faster, since it is urllib2 that seems to slow the second function down.
I ran this several times and the results remained the same.
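One way to check the guess that the download, not the parsing, dominates the second method is to time the two phases separately. The sketch below is only an illustration: `time_phases` and `fake_fetch` are hypothetical names, the fetch is simulated with a short sleep instead of a real `urllib2.urlopen(url_link).read()` call, and the standard library's `xml.etree.ElementTree` stands in for lxml so that the snippet runs without extra dependencies.

```python
import time
import xml.etree.ElementTree as ET  # stdlib stand-in; lxml.etree also provides fromstring()


def time_phases(fetch, parse):
    """Return (fetch_ms, parse_ms) for one fetch-then-parse round."""
    t0 = time.time()
    data = fetch()
    t1 = time.time()
    parse(data)
    t2 = time.time()
    return (t1 - t0) * 1000.0, (t2 - t1) * 1000.0


def fake_fetch():
    # Hypothetical stand-in for urllib2.urlopen(url_link).read()
    time.sleep(0.05)
    return b"<root><item>1</item></root>"


fetch_ms, parse_ms = time_phases(fake_fetch, ET.fromstring)
print("fetch: %.1f ms, parse: %.1f ms" % (fetch_ms, parse_ms))
```

With a real URL in place of `fake_fetch`, comparing the two numbers would show how much of the total time is spent on the network rather than in the parser.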
Sam p