Python lxml.etree - is parsing XML from a string or directly from a link more efficient?

Question

With the lxml.etree Python library, is it more efficient to parse XML directly from a link to an online XML file, or is it better to use a different mechanism (e.g. urllib2) to fetch the content as a string and then parse that? Or does it make no difference at all?

Method 1 - Parsing directly from the link

    from lxml import etree as ET

    parsed = ET.parse(url_link)

Method 2 - Parsing from a string

    from lxml import etree as ET
    import urllib2

    xml_string = urllib2.urlopen(url_link).read()
    parsed = ET.fromstring(xml_string)
    # note: I do not have access to Python at the moment,
    # so I am not sure whether the .fromstring() call is correct

Or is there a more efficient method than either of these, for example saving the XML to a .xml file on disk and then parsing it?

python xml parsing lxml urllib2

Sam p Apr 1 '14 at 18:22

2 answers

If by "effective" you mean "efficient", I am fairly sure you will not see any difference between the two at all (unless ET.parse(link) is badly implemented).

The reason is that network time will dominate when parsing an XML file from the Internet: it takes much longer than storing the file on disk or in memory, and much longer than parsing it.
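To check this claim for a particular document, the download and the parse can be timed separately. The following is a minimal sketch and not part of the original answer: `time_phases`, `fetch` and `fake_fetch` are hypothetical names, the download is simulated with a short sleep, and the fallback to the standard library's `xml.etree.ElementTree` is only there so the sketch runs where lxml is not installed. It is written for Python 3.

```python
import time

try:
    from lxml import etree as ET  # the library discussed in this thread
except ImportError:
    # Assumption: stdlib fallback, which offers the same fromstring() call.
    import xml.etree.ElementTree as ET


def time_phases(fetch):
    """Return (root, fetch_seconds, parse_seconds) for one download-and-parse cycle.

    `fetch` is any zero-argument callable that returns the XML document as bytes.
    """
    t0 = time.perf_counter()
    xml_bytes = fetch()
    t1 = time.perf_counter()
    root = ET.fromstring(xml_bytes)
    t2 = time.perf_counter()
    return root, t1 - t0, t2 - t1


def fake_fetch():
    """Stand-in for a network download: sleeps briefly, then returns a tiny document."""
    time.sleep(0.05)
    return b"<feed><item>1</item><item>2</item></feed>"


if __name__ == "__main__":
    root, fetch_s, parse_s = time_phases(fake_fetch)
    print("fetch: %.1f ms, parse: %.1f ms" % (fetch_s * 1000, parse_s * 1000))
```

With a real URL, `fetch` could be something like `lambda: urllib.request.urlopen(url_link).read()` in Python 3 (`urllib2.urlopen` in Python 2), where `url_link` is the variable used in the question.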

zmbq Apr 1 '14 at 18:24

Sam p · Accepted Answer · 2014-04-01T22:56:34+0000

I timed the two methods with a simple timing wrapper.
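The wrapper itself is not shown in this answer. A minimal sketch of what such a `@timing` decorator might look like follows; the `timings_ms` attribute and the `average_ms` helper are hypothetical additions for computing an average afterwards, and `time.perf_counter` assumes Python 3 (under Python 2, as used in this answer, `time.time` would be the usual substitute).

```python
import functools
import time


def timing(func):
    """Print how long each call to `func` takes, in milliseconds, and record it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        wrapper.timings_ms.append(elapsed_ms)
        print("%s took %.4f ms" % (func.__name__, elapsed_ms))
        return result

    # One entry per call, so an average can be computed afterwards.
    wrapper.timings_ms = []
    return wrapper


def average_ms(timed_func):
    """Average of all recorded call times for a function decorated with @timing."""
    return sum(timed_func.timings_ms) / len(timed_func.timings_ms)
```

After calling a decorated function in a loop, `average_ms(that_function)` gives a figure comparable to the averages reported here.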

Method 1 - Parsing XML Directly from a Link

    from lxml import etree as ET

    @timing
    def parseXMLFromLink():
        parsed = ET.parse(url_link)
        print parsed.getroot()

    for n in range(0, 100):
        parseXMLFromLink()

Average over 100 runs = 98.4035 ms

Method 2 - Parsing XML from a string returned by Urllib2

    from lxml import etree as ET
    import urllib2

    @timing
    def parseXMLFromString():
        xml_string = urllib2.urlopen(url_link).read()
        parsed = ET.fromstring(xml_string)
        print parsed

    for n in range(0, 100):
        parseXMLFromString()

Average over 100 runs = 286.9630 ms

Thus, it seems that using lxml to parse directly from a link is the faster method. It is unclear whether it would be faster to download large XML documents to the hard drive first and then parse them, but presumably, unless the document is huge and the parsing task more intensive, parseXMLFromLink() will remain faster, since it is urllib2 that seems to slow the second function down.

I ran this several times and the results remained the same.
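A third variant that was not measured here is to hand an open file-like object to ET.parse(), which accepts those as well as file names and URLs, so the document is never read into an intermediate string. The sketch below only illustrates that the two entry points produce the same tree; it makes no claim about speed. `parse_from_stream`, `parse_from_bytes` and `SAMPLE` are hypothetical names, `io.BytesIO` stands in for a network response, and the standard-library fallback is there only so the sketch runs without lxml.

```python
import io

try:
    from lxml import etree as ET  # the library discussed in this thread
except ImportError:
    # Assumption: stdlib fallback with compatible parse() and fromstring() calls.
    import xml.etree.ElementTree as ET

# Tiny made-up document used in place of a real download.
SAMPLE = b"<catalog><book id='1'/><book id='2'/></catalog>"


def parse_from_stream(stream):
    """Parse XML from any binary file-like object and return the root element."""
    return ET.parse(stream).getroot()


def parse_from_bytes(xml_bytes):
    """Parse XML that has already been read fully into memory."""
    return ET.fromstring(xml_bytes)


if __name__ == "__main__":
    root_a = parse_from_stream(io.BytesIO(SAMPLE))
    root_b = parse_from_bytes(SAMPLE)
    print(root_a.tag, len(root_a), root_b.tag, len(root_b))
```

In practice the stream would be the response object itself, for example the return value of `urllib2.urlopen(url_link)` in Python 2 or `urllib.request.urlopen(url_link)` in Python 3, and it could be timed with the same wrapper as the two methods above.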
