Editing text in an XML file using Python
I have an XML file that contains some data.

    <?xml version="1.0" encoding="UTF-8" ?>
    <ParameterData>
      <CreationInfo date="10/28/2009 03:05:14 PM" user="manoj" />
      <ParameterList count="85">
        <Parameter name="Spec 2 Included" type="boolean" mode="both">
          <Value>n/a</Value>
          <Result>n/a</Result>
        </Parameter>
        <Parameter name="Spec 2 Label" type="string" mode="both">
          <Value>n/a</Value>
          <Result>n/a</Result>
        </Parameter>
        <Parameter name="Spec 3 Included" type="boolean" mode="both">
          <Value>n/a</Value>
          <Result>n/a</Result>
        </Parameter>
        <Parameter name="Spec 3 Label" type="string" mode="both">
          <Value>n/a</Value>
          <Result>n/a</Result>
        </Parameter>
      </ParameterList>
    </ParameterData>

I also have a text file with lines like:

    Spec 2 Included : TRUE
    Spec 2 Label: 19-Flat2-HS3
    Spec 3 Included : FALSE
    Spec 3 Label: 4-1-Bead1-HS3

Now I want to edit the XML text, i.e. replace the n/a fields with the corresponding values from the text file. This is how I want the file to look:

    <?xml version="1.0" encoding="UTF-8" ?>
    <ParameterData>
      <CreationInfo date="10/28/2009 03:05:14 PM" user="manoj" />
      <ParameterList count="85">
        <Parameter name="Spec 2 Included" type="boolean" mode="both">
          <Value>TRUE</Value>
          <Result>TRUE</Result>
        </Parameter>
        <Parameter name="Spec 2 Label" type="string" mode="both">
          <Value>19-Flat2-HS3</Value>
          <Result>19-Flat2-HS3</Result>
        </Parameter>
        <Parameter name="Spec 3 Included" type="boolean" mode="both">
          <Value>FALSE</Value>
          <Result>FALSE</Result>
        </Parameter>
        <Parameter name="Spec 3 Label" type="string" mode="both">
          <Value>4-1-Bead1-HS3</Value>
          <Result>4-1-Bead1-HS3</Result>
        </Parameter>
      </ParameterList>
    </ParameterData>

I am new to Python XML coding and have no idea how to edit text fields in an XML file. I am trying to use the elementtree.ElementTree module, but I don't know which modules need to be imported to read the lines of the XML file and extract the attributes.
Please help.
Thanks and regards.

You can convert the text data to a Python dictionary with a regular expression:

    import re

    data = """Spec 2 Included : TRUE
    Spec 2 Label: 19-Flat2-HS3
    Spec 3 Included : FALSE
    Spec 3 Label: 4-1-Bead1-HS3"""
    # data = open("data.txt").read()

    # Each match is a (name, value) pair; dict() turns the pairs into a mapping.
    data = dict(re.findall(r'(Spec \d+ (?:Included|Label))\s*:\s*(\S+)', data))

data will then be as follows:

    {'Spec 3 Included': 'FALSE', 'Spec 2 Included': 'TRUE', 'Spec 3 Label': '4-1-Bead1-HS3', 'Spec 2 Label': '19-Flat2-HS3'}

Then you can apply it to the XML using any XML parser you like; I will use minidom here.

    from xml.dom import minidom

    # xml_text holds the XML document as a string
    dom = minidom.parseString(xml_text)
    params = dom.getElementsByTagName("Parameter")
    for param in params:
        name = param.getAttribute("name")
        if name in data:
            # "*" matches both child elements; use "Result" or "Value" to update only one
            for item in param.getElementsByTagName("*"):
                item.firstChild.replaceWholeText(data[name])

    print(dom.toxml())

    # write to file
    with open("output.xml", "w") as out_file:
        out_file.write(dom.toxml())

Result:

    <?xml version="1.0" ?><ParameterData>
      <CreationInfo date="10/28/2009 03:05:14 PM" user="manoj"/>
      <ParameterList count="85">
        <Parameter mode="both" name="Spec 2 Included" type="boolean">
          <Value>TRUE</Value>
          <Result>TRUE</Result>
        </Parameter>
        <Parameter mode="both" name="Spec 2 Label" type="string">
          <Value>19-Flat2-HS3</Value>
          <Result>19-Flat2-HS3</Result>
        </Parameter>
        <Parameter mode="both" name="Spec 3 Included" type="boolean">
          <Value>FALSE</Value>
          <Result>FALSE</Result>
        </Parameter>
        <Parameter mode="both" name="Spec 3 Label" type="string">
          <Value>4-1-Bead1-HS3</Value>
          <Result>4-1-Bead1-HS3</Result>
        </Parameter>
      </ParameterList>
    </ParameterData>

Well, you can start with

    import xml.etree.ElementTree as ET
    tree = ET.parse("blah.xml")

Then find the elements you want to change.
To replace the text content of an element, simply assign to its text attribute:

    element.text = "TRUE"

The import statement above works in Python 2.5 or later. If you have an older version of Python installed, you need to install ElementTree as an extension, and then the import statement is different: import elementtree.ElementTree as ET.
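
The version-dependent import can be handled with a fallback. The following is only a minimal sketch: the inline `xml_text` sample is made up for illustration and is not the asker's full file.

```python
# Prefer the standard-library module (Python 2.5+); fall back to the
# separately installed ElementTree package on older interpreters.
try:
    import xml.etree.ElementTree as ET
except ImportError:
    import elementtree.ElementTree as ET

# Illustrative sample document (an assumption, not the original file).
xml_text = (
    '<ParameterList>'
    '<Parameter name="Spec 2 Included"><Value>n/a</Value></Parameter>'
    '</ParameterList>'
)

root = ET.fromstring(xml_text)

# Locate the element by its path relative to the root, then replace its text.
value = root.find("Parameter/Value")
value.text = "TRUE"

# Serialize the modified tree back to a string.
new_xml = ET.tostring(root).decode("ascii")
print(new_xml)
```

The same assignment works on a tree loaded with ET.parse(); in that case call write() on the tree to save the result.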

Unfortunately, the XPath support in ElementTree is incomplete. Python 2.6 ships an older version of ElementTree, so searching for elements by attribute (as indicated here) does not work. Python's own documentation should therefore be your first stop: xml.etree.ElementTree
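
For comparison, ElementTree 1.3 (bundled with Python 2.7 and 3.x) does accept attribute predicates of the form [@name='...'] in find() and findall(). A sketch under that assumption, with an illustrative inline document; the helper name `find_parameter` is hypothetical and shows the loop-based lookup that also works on older versions:

```python
import xml.etree.ElementTree as ET

# Illustrative sample document (an assumption, not the original file).
xml_text = """<ParameterData><ParameterList count="2">
<Parameter name="Spec 2 Included" type="boolean" mode="both"><Value>n/a</Value><Result>n/a</Result></Parameter>
<Parameter name="Spec 2 Label" type="string" mode="both"><Value>n/a</Value><Result>n/a</Result></Parameter>
</ParameterList></ParameterData>"""


def find_parameter(root, name):
    """Return the first Parameter element whose name attribute matches, or None."""
    # Portable lookup: select by tag, then compare the attribute in Python.
    for parameter in root.findall(".//Parameter"):
        if parameter.get("name") == name:
            return parameter
    return None


root = ET.fromstring(xml_text)

# Attribute predicate: requires ElementTree 1.3 (Python 2.7 or later).
by_xpath = root.find(".//Parameter[@name='Spec 2 Label']")

# Loop-based lookup: no attribute predicate needed.
by_loop = find_parameter(root, "Spec 2 Label")

# Both lookups return the same element, so an edit through one is visible via the other.
by_xpath.find("Value").text = "19-Flat2-HS3"
print(by_loop.find("Value").text)
```

If the code has to run on Python 2.6 or earlier, only the loop-based lookup applies.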

    import xml.etree.ElementTree as ET

    original = ET.parse("original.xml")
    parameters = original.findall(".//Parameter")

    changes = {}
    # read changes
    with open("changes.txt", "r") as in_file:
        for change in in_file:
            change = change.rstrip()  # remove line endings
            if not change:
                continue  # skip blank lines
            name, value = change.split(":", 1)  # split on the first colon only
            changes[name.strip()] = value.strip()  # remove surrounding whitespace

    # find parameter elements and apply changes
    for parameter in parameters:
        parameter_name = parameter.get("name")
        if parameter_name in changes:
            value = parameter.find("./Value")
            value.text = changes[parameter_name]
            result = parameter.find("./Result")
            result.text = changes[parameter_name]

    original.write("new.xml")

Here's how you could do it using Amara:

    from amara import bindery

    # XML is the XML source and TEXT is the content of the text file
    doc = bindery.parse(XML)

    def cleanup_for_dict(key, value):
        return key.strip(), value.strip()

    params = dict(cleanup_for_dict(*line.split(':', 1))
                  for line in TEXT.splitlines())

    for param in doc.ParameterData.ParameterList.Parameter:
        if param.name in params:
            param.Value = params[param.name]
            param.Result = params[param.name]

    doc.xml_write()