Python XML: ParseError: junk after document element

I am trying to parse an XML file with ElementTree:

    >>> import xml.etree.cElementTree as ET
    >>> tree = ET.ElementTree(file='D:\Temp\Slikvideo\JPEG\SV_4_1_mask\index.xml')

I get the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Program Files\Anaconda2\lib\xml\etree\ElementTree.py", line 611, in __init__
        self.parse(file)
      File "<string>", line 38, in parse
    ParseError: junk after document element: line 3, column 0

The XML file starts like this:

    <?xml version="1.0" encoding="UTF-8" ?>
    <Version Writer="E:\d\src\Modules\SceneSerialization\src\mitkSceneIO.cpp" Revision="$Revision: 17055 $" FileVersion="1" />
    <node UID="OBJECT_2016080819041580480127">
        <source UID="OBJECT_2016080819041550469454" />
        <data type="LabelSetImage" file="hfbaaa_Bolus.nrrd" />
        <properties file="sicaaa" />
    </node>
    <node UID="OBJECT_2016080819041512769572">
        <source UID="OBJECT_2016080819041598947781" />
        <data type="LabelSetImage" file="ifbaaa_Bolus.nrrd" />
        <properties file="ticaaa" />
    </node>

Many other nodes follow.

I do not see any junk at line 3, column 0, so I suppose there must be another reason for the error.

The XML file is generated by the external MITK software, so I assumed that everything should be in order.

I work on Windows 7 (64-bit) with VS2015 and Anaconda.

3 answers

As @Matthias Wiehl said, ElementTree expects exactly one root node. Your file has several top-level elements, so it is not well-formed XML, and that should really be fixed at the source. As a workaround, you can add a fake root node to the document.

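    import re
    import xml.etree.cElementTree as ET  # on Python 3, use xml.etree.ElementTree

    with open("index.xml") as f:
        xml = f.read()

    # Insert <root> right after the XML declaration and append </root> at the end.
    # Note: fromstring() returns the root Element, not an ElementTree.
    tree = ET.fromstring(re.sub(r"(<\?xml[^>]+\?>)", r"\1<root>", xml) + "</root>")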
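To illustrate how the wrapped document can then be used, here is a minimal sketch. The `SAMPLE` string is a shortened stand-in for the `index.xml` in the question (attribute values abbreviated), and `parse_with_fake_root` is a hypothetical helper name, not part of ElementTree. It assumes the XML declaration, if present, sits at the very start of the text.

```python
import re
import xml.etree.ElementTree as ET

# Shortened stand-in for the index.xml from the question (assumed shape).
SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<Version Writer="mitkSceneIO.cpp" FileVersion="1" />
<node UID="OBJECT_1">
    <data type="LabelSetImage" file="hfbaaa_Bolus.nrrd" />
</node>
<node UID="OBJECT_2">
    <data type="LabelSetImage" file="ifbaaa_Bolus.nrrd" />
</node>
"""


def parse_with_fake_root(text, root_tag="root"):
    """Wrap everything after the XML declaration in a synthetic root element."""
    match = re.match(r"<\?xml[^>]*\?>", text)
    end = match.end() if match else 0  # no declaration: wrap the whole text
    wrapped = text[:end] + "<%s>" % root_tag + text[end:] + "</%s>" % root_tag
    return ET.fromstring(wrapped)


root = parse_with_fake_root(SAMPLE)

# <Version> and every <node> are now children of the fake root.
data_files = [node.find("data").get("file") for node in root.findall("node")]
print(data_files)
```

Because the fake root is only a container, code that consumes the result should look for `Version` and `node` as its children rather than treating `Version` as the document root.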

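The root node of your document (Version) is opened and closed on line 2, because the tag ends with "/>". The parser does not expect any further nodes after the root node, so the first node element on line 3 is reported as junk. The solution is to remove the closing slash from the Version tag and close the element with </Version> at the end of the document.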
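The behaviour can be reproduced with a two-line document. This is only a minimal sketch with made-up attribute values; it uses `xml.etree.ElementTree` so that it also runs on Python 3.

```python
import xml.etree.ElementTree as ET

# <Version ... /> is self-closed, so the document element ends on line 1
# and the <node> on line 2 is a second top-level element.
two_roots = '<Version FileVersion="1" />\n<node UID="a" />'

try:
    ET.fromstring(two_roots)
    error_message = None
except ET.ParseError as exc:
    error_message = str(exc)

print(error_message)  # message starts with "junk after document element"

# With Version left open and closed at the end, there is a single root.
one_root = '<Version FileVersion="1">\n<node UID="a" />\n</Version>'
root = ET.fromstring(one_root)
print(root.tag, len(root))
```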


Try repairing the document: leave the Version element open on line 2 and close it at the end of the file.

    <?xml version="1.0" encoding="UTF-8" ?>
    <Version Writer="E:\d\src\Modules\SceneSerialization\src\mitkSceneIO.cpp" Revision="$Revision: 17055 $" FileVersion="1">
    <node UID="OBJECT_2016080819041580480127">
        <source UID="OBJECT_2016080819041550469454" />
        <data type="LabelSetImage" file="hfbaaa_Bolus.nrrd" />
        <properties file="sicaaa" />
    </node>
    <node UID="OBJECT_2016080819041512769572">
        <source UID="OBJECT_2016080819041598947781" />
        <data type="LabelSetImage" file="ifbaaa_Bolus.nrrd" />
        <properties file="ticaaa" />
    </node>
    </Version>
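If editing the file by hand is not practical, the same repair could be applied in memory before parsing. The following is a rough sketch, not a general XML fixer: `reopen_version_root` is a hypothetical helper, the sample text is a shortened stand-in for the real file, and the regular expression assumes that no attribute value of `Version` contains a `>` character.

```python
import re
import xml.etree.ElementTree as ET

# Shortened stand-in for the index.xml from the question (assumed shape).
SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<Version Writer="mitkSceneIO.cpp" FileVersion="1" />
<node UID="OBJECT_1">
    <data type="LabelSetImage" file="hfbaaa_Bolus.nrrd" />
</node>
<node UID="OBJECT_2">
    <data type="LabelSetImage" file="ifbaaa_Bolus.nrrd" />
</node>
"""


def reopen_version_root(text):
    """Turn the self-closed <Version ... /> into an open tag and close it at the end."""
    # Assumption: attribute values of <Version> never contain '>'.
    repaired, count = re.subn(
        r"<Version\b([^>]*?)\s*/>", r"<Version\1>", text, count=1
    )
    if count != 1:
        raise ValueError("no self-closed <Version ... /> element found")
    return repaired + "</Version>"


root = ET.fromstring(reopen_version_root(SAMPLE))
node_uids = [node.get("UID") for node in root.findall("node")]
print(root.tag, node_uids)
```

Unlike the fake-root workaround, this keeps `Version` as the real document root, so the `node` elements become its direct children.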