You need to use the full QName in your XPath because the stdlib ElementTree has no way to register a prefix. I usually use a helper function to build QNames:
    def qname(prefix, element, nsmap={'xml': 'http://www.w3.org/XML/1998/namespace'}):
        # Build the expanded name, e.g. "{http://www.w3.org/XML/1998/namespace}lang"
        return "{{{}}}{}".format(nsmap[prefix], element)
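As a quick illustration of how such a helper can be used with the standard library, here is a minimal sketch that selects elements by their xml:lang value. The sample document and its element names are made up for the example.

```python
import xml.etree.ElementTree as ET

XML_NS = 'http://www.w3.org/XML/1998/namespace'

def qname(prefix, element, nsmap={'xml': XML_NS}):
    # Build the expanded name, e.g. "{http://www.w3.org/XML/1998/namespace}lang"
    return "{{{}}}{}".format(nsmap[prefix], element)

# Hypothetical sample document, not taken from the question.
doc = ET.fromstring(
    '<root>'
    '<alt xml:lang="fr">bonjour</alt>'
    '<alt xml:lang="en">hello</alt>'
    '<alt>untagged</alt>'
    '</root>'
)

xml_lang = qname('xml', 'lang')

# The attribute is stored under its expanded name, so the predicate uses it too.
french = [el.text for el in doc.findall(".//alt[@{}='fr']".format(xml_lang))]
print(french)
```

Note that this only matches elements that carry the attribute themselves; inherited languages are not taken into account.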
The ElementTree implementation in the standard library does not support enough of XPath to do what you want easily. However, the spec for xml:lang says that the value of this attribute is inherited by everything the element contains, much like xml:base or xmlns namespace declarations. So we can make the language setting explicit on all elements:
    xml_lang = qname('xml', 'lang')

    def set_xml_lang(root, defaultlang=''):
        # Copy the inherited xml:lang value onto every descendant of root.
        for item in root:
            try:
                lang = item.attrib[xml_lang]
            except KeyError:
                item.set(xml_lang, defaultlang)
                lang = defaultlang
            set_xml_lang(item, lang)

    # Start from the root's own language, if it declares one.
    set_xml_lang(root, root.get(xml_lang, ''))

    namespaces = {'xml': 'http://www.w3.org/XML/1998/namespace'}
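To show the effect, here is a small self-contained sketch of the same idea run against a made-up document in which only the section elements declare a language. The element names are assumptions for the example; once the language is explicit everywhere, a plain attribute predicate is enough.

```python
import xml.etree.ElementTree as ET

XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'

def set_xml_lang(root, defaultlang=''):
    # Push the inherited language down onto every descendant of root.
    for item in root:
        lang = item.get(XML_LANG)
        if lang is None:
            item.set(XML_LANG, defaultlang)
            lang = defaultlang
        set_xml_lang(item, lang)

# Hypothetical sample: <alt> elements inherit from their <section>, if any.
root = ET.fromstring(
    '<doc>'
    '<section xml:lang="fr"><alt>bonjour</alt></section>'
    '<section xml:lang="en"><alt>hello</alt></section>'
    '<alt>untagged</alt>'
    '</doc>'
)
set_xml_lang(root, root.get(XML_LANG, ''))

# Every <alt> now carries its own xml:lang, so a simple predicate works.
french = [el.text for el in root.findall(".//alt[@{}='fr']".format(XML_LANG))]
print(french)
```

Elements with no language anywhere above them end up with an empty xml:lang value, which is how the spec expresses "no language".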
If you are willing to use lxml, language matching can be much more robust, since lxml implements the full XPath 1.0 spec. In particular, you can use the lang() function:
    import lxml.etree as ET

    root = ET.fromstring(xml)
    print(root.xpath('//alt[lang("fr")]'))
As a bonus, it has the correct lang() semantics, such as case insensitivity and language-subtag matching (for example, lang('en') is true for xml:lang="en-US").
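For reference, the matching rule can be sketched in plain Python. This is only an approximation of the rule as XPath 1.0 describes it, not lxml's implementation, and the function name lang_matches is made up for the example.

```python
def lang_matches(declared, wanted):
    # Compare case-insensitively; a declared value also matches when it is
    # the wanted language followed by a "-" and further subtags.
    declared = declared.lower()
    wanted = wanted.lower()
    return declared == wanted or declared.startswith(wanted + '-')

print(lang_matches('en-US', 'en'))    # region subtag is ignored
print(lang_matches('EN', 'en'))       # case is ignored
print(lang_matches('english', 'en'))  # not a subtag match
```

This could be combined with a stdlib tree walk when lxml is not available.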
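Unfortunately, you cannot use lang() to determine the language of a node. You need to find the nearest xml:lang attribute on the node or its ancestors and use that. Because a parenthesized node-set is numbered in document order, the nearest one is the last, not the first (the result is a list that is empty when no language is set):

    mylang = node.xpath('(ancestor-or-self::*/@xml:lang)[last()]')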
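If lxml is not an option, the same lookup can be approximated with the standard library. The sketch below is one possible approach, not part of either library's API: since stdlib elements have no parent pointer, it builds a child-to-parent map and walks upward until it finds a declaration. The helper name effective_lang and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET

XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'

def effective_lang(root, node):
    # ElementTree elements do not know their parent, so map child -> parent.
    parents = {child: parent for parent in root.iter() for child in parent}
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang  # nearest declaration wins
        node = parents.get(node)
    return None  # no xml:lang on the node or any ancestor

# Hypothetical sample document.
root = ET.fromstring(
    '<doc xml:lang="en">'
    '<section xml:lang="fr"><alt>bonjour</alt></section>'
    '<alt>hello</alt>'
    '</doc>'
)
inner = root.find('section/alt')
outer = root.find('alt')
print(effective_lang(root, inner), effective_lang(root, outer))
```

Rebuilding the parent map on every call is wasteful for large trees; in real code it would make sense to build it once and reuse it.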
Putting it all together, to match nodes that don't have a language:

    tree.xpath('//alt[not((ancestor-or-self::*/@xml:lang)[1])]')