How to stop Python's XML parser from returning whitespace as child nodes

I have a simple XML document that I am trying to read with the Python DOM (see below):

XML file:

    <?xml version="1.0" encoding="utf-8"?>
    <HeaderLookup>
        <Header>
            <Reserved>2</Reserved>
            <CPU>1</CPU>
            <Flag>1</Flag>
            <VQI>12</VQI>
            <Group_ID>16</Group_ID>
            <DI>2</DI>
            <DE>1</DE>
            <ACOSS>5</ACOSS>
            <RGH>8</RGH>
        </Header>
    </HeaderLookup>

Python Code:

    from xml.dom import minidom

    xml_file = open("test.xml")
    xmlroot = minidom.parse(xml_file).documentElement
    xml_file.close()

    for item in xmlroot.getElementsByTagName("Header")[0].childNodes:
        print item

Result:

    <DOM Text node "u'\n\t\t'">
    <DOM Element: Reserved at 0x28d2828>
    <DOM Text node "u'\n\t\t'">
    <DOM Element: CPU at 0x28d28c8>
    <DOM Text node "u'\n\t\t'">
    <DOM Element: Flag at 0x28d2968>
    <DOM Text node "u'\n\t\t'">
    <DOM Element: VQI at 0x28d2a08>
    <DOM Text node "u'\n\t\t'">
    <DOM Element: Group_ID at 0x28d2ad0>
    <DOM Text node "u'\n\t\t'">
    <DOM Element: DI at 0x28d2b70>
    <DOM Text node "u'\n\t\t'">
    <DOM Element: DE at 0x28d2c10>
    <DOM Text node "u'\n\t\t'">
    <DOM Element: ACOSS at 0x28d2cb0>
    <DOM Text node "u'\n\t\t'">
    <DOM Element: RGH at 0x28d2d50>
    <DOM Text node "u'\n\t'">

There should be 9 child nodes (Reserved, CPU, Flag, VQI, Group_ID, DI, DE, ACOSS and RGH), but for some reason it returns a list of 19 nodes, 10 of which are whitespace (why is whitespace even considered a node in the first place?!). Can someone tell me if there is a way to make the XML parser not include whitespace?

1 answer

Whitespace is significant in XML, but take a look at ElementTree, which has a different XML-processing API from the DOM.

Example

    from xml.etree import ElementTree as et

    data = '''\
    <?xml version="1.0" encoding="utf-8"?>
    <HeaderLookup>
        <Header>
            <Reserved>2</Reserved>
            <CPU>1</CPU>
            <Flag>1</Flag>
            <VQI>12</VQI>
            <Group_ID>16</Group_ID>
            <DI>2</DI>
            <DE>1</DE>
            <ACOSS>5</ACOSS>
            <RGH>8</RGH>
        </Header>
    </HeaderLookup>
    '''

    tree = et.fromstring(data)

    for n in tree.find('Header'):
        print n.tag,'=',n.text

Output

    Reserved = 2
    CPU = 1
    Flag = 1
    VQI = 12
    Group_ID = 16
    DI = 2
    DE = 1
    ACOSS = 5
    RGH = 8
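
To check that iterating an ElementTree element really yields only the nine child elements, here is a small sketch written for Python 3 (print as a function). The names `header` and `values` are my own, and the XML declaration is left out only to keep the string short:

```python
from xml.etree import ElementTree as et

data = """<HeaderLookup>
    <Header>
        <Reserved>2</Reserved>
        <CPU>1</CPU>
        <Flag>1</Flag>
        <VQI>12</VQI>
        <Group_ID>16</Group_ID>
        <DI>2</DI>
        <DE>1</DE>
        <ACOSS>5</ACOSS>
        <RGH>8</RGH>
    </Header>
</HeaderLookup>"""

tree = et.fromstring(data)
header = tree.find("Header")

# len() of an Element counts its child elements only; whitespace is not a child.
print(len(header))

# Collect the children into a dict keyed by tag name (values remain strings).
values = {child.tag: child.text for child in header}
print(values["VQI"])
```

Note that the values are still strings, so convert them with `int()` if you need numbers.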

Example (extension of the previous code)

The whitespace is still there, but it is stored in the .tail attributes. tail is the text that follows an element (between the end tag of one element and the start of the next), and text is the text between the element's own start and end tags.

    def dump(e):
        print '<%s>' % e.tag
        print 'text =',repr(e.text)
        for n in e:
            dump(n)
        print '</%s>' % e.tag
        print 'tail =',repr(e.tail)

    dump(tree)

Output

    <HeaderLookup>
    text = '\n    '
    <Header>
    text = '\n        '
    <Reserved>
    text = '2'
    </Reserved>
    tail = '\n        '
    <CPU>
    text = '1'
    </CPU>
    tail = '\n        '
    <Flag>
    text = '1'
    </Flag>
    tail = '\n        '
    <VQI>
    text = '12'
    </VQI>
    tail = '\n        '
    <Group_ID>
    text = '16'
    </Group_ID>
    tail = '\n        '
    <DI>
    text = '2'
    </DI>
    tail = '\n        '
    <DE>
    text = '1'
    </DE>
    tail = '\n        '
    <ACOSS>
    text = '5'
    </ACOSS>
    tail = '\n        '
    <RGH>
    text = '8'
    </RGH>
    tail = '\n    '
    </Header>
    tail = '\n'
    </HeaderLookup>
    tail = None
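
If you would rather stay with minidom, one possible approach is to skip the text nodes yourself by checking each child's nodeType. This is only a minimal sketch, written for Python 3 and assuming the document from the question; `element_children` is a hypothetical helper name of mine, not part of minidom:

```python
from xml.dom import minidom

XML = """<HeaderLookup>
    <Header>
        <Reserved>2</Reserved>
        <CPU>1</CPU>
        <Flag>1</Flag>
        <VQI>12</VQI>
        <Group_ID>16</Group_ID>
        <DI>2</DI>
        <DE>1</DE>
        <ACOSS>5</ACOSS>
        <RGH>8</RGH>
    </Header>
</HeaderLookup>"""


def element_children(node):
    """Return only the element children of a DOM node, skipping text nodes."""
    return [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE]


root = minidom.parseString(XML).documentElement
header = root.getElementsByTagName("Header")[0]

for item in element_children(header):
    # Each of these elements holds a single text node with its value.
    print(item.tagName, "=", item.firstChild.data)
```

The parser still creates the whitespace text nodes; this filter just ignores them when walking the children, so `header.childNodes` itself is unchanged.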