Parse XML with Python Etree and Return Specified Tag regardless of namespace

I am working with XML data in which the namespace is redefined at various points within each file. I am trying to pull all tags of a certain type from a document, regardless of which namespace is active at the point where the tag appears in the XML.

I use findall('.//{namespace}Tag') to search for the elements I'm looking for. But since I never know what {namespace} will be at any given point in the file, it is hit or miss whether I get back all of the requested tags.

Is there a way to return all Tag elements regardless of the {namespace} they fall into? Is there something along the lines of findall('.//{wildcard}Tag') ?

1 answer

The xpath() method in lxml supports the XPath local-name() function, which returns an element's tag name without its namespace!

Here is an example in Python 3:

    import io
    from lxml import etree

    xmlstring = '''<root xmlns:m="http://www.w3.org/html4/" xmlns:n="http://www.w3.org/html5/">
    <m:table>
      <m:tr>
        <m:name>Sometext</m:name>
      </m:tr>
    </m:table>
    <n:table>
      <n:name>Othertext</n:name>
    </n:table>
    </root>'''

    root = etree.parse(io.StringIO(xmlstring))
    names = root.xpath("//*[local-name() = 'name']")
    for name in names:
        print(name.text)
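If lxml is not available, something close to the findall('.//{wildcard}Tag') idea from the question can be sketched with the standard library's xml.etree.ElementTree. This assumes Python 3.8 or newer, where findall() accepts `{*}` as a namespace wildcard. The `local_name` helper and the variable names are only illustrative; the helper shows a fallback for older versions that compares tag names by hand. The XML is the same sample as in the lxml example above.

```python
import xml.etree.ElementTree as ET

xmlstring = '''<root xmlns:m="http://www.w3.org/html4/" xmlns:n="http://www.w3.org/html5/">
<m:table>
  <m:tr>
    <m:name>Sometext</m:name>
  </m:tr>
</m:table>
<n:table>
  <n:name>Othertext</n:name>
</n:table>
</root>'''

root = ET.fromstring(xmlstring)

# Python 3.8+: "{*}name" matches "name" in any namespace (or in none).
wildcard_texts = [el.text for el in root.findall(".//{*}name")]


def local_name(tag):
    """Illustrative helper: strip a leading '{namespace}' from a tag string."""
    return tag.rsplit("}", 1)[-1]


# Fallback for older versions: walk every element and compare local names.
fallback_texts = [el.text for el in root.iter() if local_name(el.tag) == "name"]

print(wildcard_texts)
print(fallback_texts)
```

Both lists should contain the text of the two name elements in document order. The wildcard form is shorter, while the manual comparison does not depend on the Python version.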

Your question may have been asked before; see "xmlparser lxml etree namespace problem".
