Parsing an XML string in Python

I have this XML result and need to get the values between the tags, but the XML is held in a plain string:

```python
final = " <Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table> <Table><Claimable>false</Claimable><MinorRev>71115</MinorRev><Operation>530600 ION MILL</Operation><Experiment>6794</Experiment><HTNum>162</HTNum><WaferEC>71105</WaferEC><HolderType>HACARR</HolderType><Job>16799006</Job></Table> "
```

This is my code:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(final)
print(root)
```

And this is the error I get:

```
xml.parsers.expat.ExpatError: The markup in the document following the root element must be well-formed.
```

I've tried using ET.fromstring, but with no luck.

2 answers

Your XML is not well-formed: an XML document must have exactly one top-level element, and your string has two sibling <Table> elements. From Wikipedia:

Each XML document has exactly one single root element. It encloses all the other elements and is, therefore, the sole parent element to all the other elements. Root elements are also called document elements.
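To see the rule in action, here is a short sketch in Python 3 using trimmed copies of the question's <Table> element (the trimming is only for brevity). A single element parses fine; two siblings make ET.fromstring raise xml.etree.ElementTree.ParseError, whose exact message varies between Python versions.

```python
import xml.etree.ElementTree as ET

# One <Table> element is a complete document: it is the single root.
single = "<Table><HTNum>162</HTNum></Table>"
root = ET.fromstring(single)
print(root.tag)  # Table

# Two sibling <Table> elements have no common parent, so the parser
# rejects everything after the first one.
two_roots = "<Table><HTNum>162</HTNum></Table><Table><HTNum>162</HTNum></Table>"
try:
    ET.fromstring(two_roots)
    failed = False
except ET.ParseError as err:
    failed = True
    print("ParseError:", err)  # wording depends on the Python version
```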

Wrap it in an extra element (e.g. <Tables>) and parse it with ET:

```python
import xml.etree.ElementTree as ET

# The same data, wrapped in a single <Tables> root element.
xmlData = '''<Tables> <Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table> <Table><Claimable>false</Claimable><MinorRev>71115</MinorRev><Operation>530600 ION MILL</Operation><Experiment>6794</Experiment><HTNum>162</HTNum><WaferEC>71105</WaferEC><HolderType>HACARR</HolderType><Job>16799006</Job></Table> </Tables> '''

xml = ET.fromstring(xmlData)
# getiterator() is deprecated and was removed in Python 3.9; newer code uses iter().
for table in xml.getiterator('Table'):
    for child in table:
        print(child.tag, child.text)
```

getiterator() has been deprecated since Python 2.7 and was removed in Python 3.9, so replace getiterator('Table') with iter('Table'):

```python
for table in xml.iter('Table'):
    for child in table:
        print(child.tag, child.text)
```

This gives:

```
Claimable false
MinorRev 80601
Operation 530600 ION MILL
HTNum 162
WaferEC 80318
HolderType HACARR
Job 167187008
Claimable false
MinorRev 71115
Operation 530600 ION MILL
Experiment 6794
HTNum 162
WaferEC 71105
HolderType HACARR
Job 16799006
```
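Rather than editing the string by hand, the wrapper can also be added in code, and the children of each <Table> collected into a dict so a value can be looked up by tag name. This is a minimal sketch in Python 3; the wrapper name "Tables" is arbitrary, and it assumes the string carries no <?xml ...?> declaration (which would have to stay at the very start of the document).

```python
import xml.etree.ElementTree as ET

# The original string from the question: sibling <Table> elements, no single root.
final = " <Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table> <Table><Claimable>false</Claimable><MinorRev>71115</MinorRev><Operation>530600 ION MILL</Operation><Experiment>6794</Experiment><HTNum>162</HTNum><WaferEC>71105</WaferEC><HolderType>HACARR</HolderType><Job>16799006</Job></Table> "

# Add a synthetic root element around the fragments, then parse.
root = ET.fromstring("<Tables>" + final + "</Tables>")

# One dict per <Table>, mapping each child's tag name to its text.
records = [{child.tag: child.text for child in table} for table in root.iter("Table")]

for record in records:
    # Not every <Table> has every tag (only the second one has <Experiment>),
    # so .get() is used for the optional field.
    print(record["Job"], record.get("Experiment"))
```

Using .get() (or table.findtext("Experiment") on the element itself) returns None instead of raising KeyError when a tag is missing from one of the tables.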

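If you tried node.attrib, use node.text instead: attrib is a dict of the element's attributes (these elements have none), while text is the string between the start and end tags (see Element.text in the xml.etree.ElementTree documentation):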

```python
import xml.etree.ElementTree as ET

# A single <Table> element is a well-formed document on its own.
xml_string = "<Table><Claimable>false</Claimable><MinorRev>80601</MinorRev><Operation>530600 ION MILL</Operation><HTNum>162</HTNum><WaferEC>80318</WaferEC><HolderType>HACARR</HolderType><Job>167187008</Job></Table>"
root = ET.fromstring(xml_string)
for child in root:
    print(child.tag, child.text)
```

That should give you:

```
Claimable false
MinorRev 80601
Operation 530600 ION MILL
HTNum 162
WaferEC 80318
HolderType HACARR
Job 167187008
```
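The difference between attrib and text, and the fact that text always comes back as a string, can be shown with a trimmed copy of the first <Table> (trimmed only for brevity). This is a sketch in Python 3; the boolean conversion assumes the values are always the lowercase strings "true" or "false", which the question's data suggests but does not guarantee.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the first <Table> from the question.
xml_string = "<Table><Claimable>false</Claimable><HTNum>162</HTNum><Job>167187008</Job></Table>"
root = ET.fromstring(xml_string)

job = root.find("Job")
print(job.attrib)  # {} because <Job> carries no attributes
print(job.text)    # 167187008, the text between <Job> and </Job>

# .text is a string (or None for an empty element), so convert it yourself.
job_number = int(root.findtext("Job"))
ht_num = int(root.findtext("HTNum"))
# Assumption: booleans are spelled exactly "true" / "false".
claimable = root.findtext("Claimable") == "true"
print(job_number, ht_num, claimable)  # 167187008 162 False
```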
