Saving XML with ElementTree in Python does not preserve namespaces: it adds ns0, ns1 and removes the xmlns attributes

I see that there are similar questions, but none of them fully solved my problem. I also looked through the official documentation on namespaces, but couldn't find anything that really helps; maybe I'm too new to XML. Perhaps I need to create my own namespace dictionary? Anyway, here is my situation:

I get the result of an API call as XML, which is stored as a string in my Python application.

All I'm trying to do is take this XML, change a single value (the b:string value under ConditionValue/Default, though that is not relevant to this issue) and then save it as a string to send later in a REST POST call.

The source XML is as follows:

    <Context xmlns="http://Test.the.Sdk/2010/07" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
      <xmlns i:nil="true" xmlns="http://schema.test.org/2004/07/Test.Soa.Vocab" xmlns:a="http://schema.test.org/2004/07/System.Xml.Serialize"/>
      <Conditions xmlns:a="http://schema.test.org/2004/07/Test.Soa.Vocab">
        <a:Condition>
          <a:xmlns i:nil="true" xmlns:b="http://schema.test.org/2004/07/System.Xml.Serialize"/>
          <Identifier>a23aacaf-9b6b-424f-92bb-5ab71505e3bc</Identifier>
          <Name>Code</Name>
          <ParameterSelections/>
          <ParameterSetCollections/>
          <Parameters/>
          <Summary i:nil="true"/>
          <Instance>25486d6c-36ba-4ab2-9fa6-0dbafbcf0389</Instance>
          <ConditionValue>
            <ComplexValue i:nil="true"/>
            <Text i:nil="true" xmlns:b="http://schemas.microsoft.com/2003/10/Serialization/Arrays"/>
            <Default>
              <ComplexValue i:nil="true"/>
              <Text xmlns:b="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
                <b:string>NULLCODE</b:string>
              </Text>
            </Default>
          </ConditionValue>
          <TypeCode>String</TypeCode>
        </a:Condition>
        <a:Condition>
          <a:xmlns i:nil="true" xmlns:b="http://schema.test.org/2004/07/System.Xml.Serialize"/>
          <Identifier>0af860f6-5611-4a23-96dc-eb3863975529</Identifier>
          <Name>Content Type</Name>
          <ParameterSelections/>
          <ParameterSetCollections/>
          <Parameters/>
          <Summary i:nil="true"/>
          <Instance>6364ec20-306a-4cab-aabc-8ec65c0903c9</Instance>
          <ConditionValue>
            <ComplexValue i:nil="true"/>
            <Text i:nil="true" xmlns:b="http://schemas.microsoft.com/2003/10/Serialization/Arrays"/>
            <Default>
              <ComplexValue i:nil="true"/>
              <Text xmlns:b="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
                <b:string>Standard</b:string>
              </Text>
            </Default>
          </ConditionValue>
          <TypeCode>String</TypeCode>
        </a:Condition>
      </Conditions>
    </Context>

My task is to replace one of the values while preserving the entire source structure, and then send the result in a POST later in the application.

The problem I am facing is that when the tree is serialized to a string or a file, the namespaces are completely rewritten:

    <ns0:Context xmlns:ns0="http://Test.the.Sdk/2010/07" xmlns:ns1="http://schema.test.org/2004/07/Test.Soa.Vocab" xmlns:ns3="http://schemas.microsoft.com/2003/10/Serialization/Arrays" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <ns1:xmlns xsi:nil="true" />
      <ns0:Conditions>
        <ns1:Condition>
          <ns1:xmlns xsi:nil="true" />
          <ns0:Identifier>a23aacaf-9b6b-424f-92bb-5ab71505e3bc</ns0:Identifier>
          <ns0:Name>Code</ns0:Name>
          <ns0:ParameterSelections />
          <ns0:ParameterSetCollections />
          <ns0:Parameters />
          <ns0:Summary xsi:nil="true" />
          <ns0:Instance>25486d6c-36ba-4ab2-9fa6-0dbafbcf0389</ns0:Instance>
          <ns0:ConditionValue>
            <ns0:ComplexValue xsi:nil="true" />
            <ns0:Text xsi:nil="true" />
            <ns0:Default>
              <ns0:ComplexValue xsi:nil="true" />
              <ns0:Text>
                <ns3:string>NULLCODE</ns3:string>
              </ns0:Text>
            </ns0:Default>
          </ns0:ConditionValue>
          <ns0:TypeCode>String</ns0:TypeCode>
        </ns1:Condition>
        <ns1:Condition>
          <ns1:xmlns xsi:nil="true" />
          <ns0:Identifier>0af860f6-5611-4a23-96dc-eb3863975529</ns0:Identifier>
          <ns0:Name>Content Type</ns0:Name>
          <ns0:ParameterSelections />
          <ns0:ParameterSetCollections />
          <ns0:Parameters />
          <ns0:Summary xsi:nil="true" />
          <ns0:Instance>6364ec20-306a-4cab-aabc-8ec65c0903c9</ns0:Instance>
          <ns0:ConditionValue>
            <ns0:ComplexValue xsi:nil="true" />
            <ns0:Text xsi:nil="true" />
            <ns0:Default>
              <ns0:ComplexValue xsi:nil="true" />
              <ns0:Text>
                <ns3:string>Standard</ns3:string>
              </ns0:Text>
            </ns0:Default>
          </ns0:ConditionValue>
          <ns0:TypeCode>String</ns0:TypeCode>
        </ns1:Condition>
      </ns0:Conditions>
    </ns0:Context>

I narrowed the code down to the most basic form, and I still get the same results, so it has nothing to do with the way I usually manipulate the file:

    import xml.etree.ElementTree as ET
    import requests

    get_context_xml = 'http://localhost/testapi/returnxml'  # returns the first XML example above
    source_context_xml = requests.get(get_context_xml)

    # Parse the response body (not the Response object itself).
    Tree = ET.fromstring(source_context_xml.content)

    # Ensure the original namespaces are intact.
    for Conditions in Tree.iter('{http://schema.test.org/2004/07/Test.Soa.Vocab}Condition'):
        print("success")

    # ET.tostring() returns bytes on Python 3, so open the file in binary mode.
    with open('/home/memyself/output.xml', 'wb') as f:
        f.write(ET.tostring(Tree))
2 answers

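You need to register each prefix and namespace to avoid the auto-generated namespace prefixes (e.g. ns0, ns1, etc.). The registry is consulted when the tree is serialized, so the registration has to happen before tostring(); doing it up front, before fromstring() (reading the XML), is the simplest.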

For this you can use the ET.register_namespace() function, for example:

    ET.register_namespace('<prefix>', 'http://Test.the.Sdk/2010/07')
    ET.register_namespace('a', 'http://schema.test.org/2004/07/Test.Soa.Vocab')

Pass an empty string instead of <prefix> if you do not need a prefix; that namespace is then written as the default namespace (xmlns="...").


Example / demo:

    >>> r = ET.fromstring('<a xmlns="blah">a</a>')
    >>> ET.tostring(r)
    b'<ns0:a xmlns:ns0="blah">a</ns0:a>'
    >>> ET.register_namespace('','blah')
    >>> r = ET.fromstring('<a xmlns="blah">a</a>')
    >>> ET.tostring(r)
    b'<a xmlns="blah">a</a>'
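Applied to the XML from the question, this could look like the sketch below. It is only an illustration under a few assumptions: SOURCE_XML is a shortened stand-in for the API response, the prefixes are the ones used in the source document, and the replacement value NEWCODE is made up.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the API response in the question (assumption).
SOURCE_XML = (
    '<Context xmlns="http://Test.the.Sdk/2010/07" '
    'xmlns:i="http://www.w3.org/2001/XMLSchema-instance">'
    '<Conditions xmlns:a="http://schema.test.org/2004/07/Test.Soa.Vocab">'
    '<a:Condition>'
    '<Name>Code</Name>'
    '<Summary i:nil="true"/>'
    '<ConditionValue><Default>'
    '<Text xmlns:b="http://schemas.microsoft.com/2003/10/Serialization/Arrays">'
    '<b:string>NULLCODE</b:string>'
    '</Text>'
    '</Default></ConditionValue>'
    '</a:Condition>'
    '</Conditions>'
    '</Context>'
)

# Prefix -> namespace URI, mirroring the prefixes of the source document.
NAMESPACES = {
    '': 'http://Test.the.Sdk/2010/07',
    'a': 'http://schema.test.org/2004/07/Test.Soa.Vocab',
    'b': 'http://schemas.microsoft.com/2003/10/Serialization/Arrays',
    'i': 'http://www.w3.org/2001/XMLSchema-instance',
}

# The registry is global, so this affects every later serialization.
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

root = ET.fromstring(SOURCE_XML)

# Elements are addressed by "{namespace-uri}localname", not by prefix.
ARRAYS = '{http://schemas.microsoft.com/2003/10/Serialization/Arrays}'
for node in root.iter(ARRAYS + 'string'):
    if node.text == 'NULLCODE':
        node.text = 'NEWCODE'  # made-up replacement value

result = ET.tostring(root, encoding='unicode')
print(result)
```

Note that ElementTree writes all xmlns declarations on the root element instead of where they appeared in the source, so the output is equivalent XML rather than a byte-for-byte copy.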

First of all, welcome to Stack Overflow! Technically, @anand-s-kumar is correct. However, there was a slight misuse of the tostring function, and the namespaces may not always be known to the code in advance, or may differ between tags or XML files. In addition, inconsistencies between the lxml and xml.etree libraries, and between Python 2.x and 3.x, make processing difficult.

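This function iterates over every element of the XML tree that is passed in and edits each tag to remove its namespace. Please note that some data may be lost: the namespace information is discarded, so elements that differed only by namespace can no longer be told apart.

    import re

    def remove_namespaces(tree):
        # tree.iter() replaces getiterator(), which was removed in Python 3.9.
        for el in tree.iter():
            # Strip a leading "{namespace-uri}" from the tag, if present.
            match = re.match(r"^(?:\{.*?\})?(.*)$", el.tag)
            if match:
                el.tag = match.group(1)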
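A small usage sketch, to show what comes out. The input document here is made up for illustration; the function is repeated so the snippet runs on its own.

```python
import re
import xml.etree.ElementTree as ET


def remove_namespaces(tree):
    # Rewrite every tag in place, dropping a leading "{namespace-uri}".
    for el in tree.iter():
        match = re.match(r"^(?:\{.*?\})?(.*)$", el.tag)
        if match:
            el.tag = match.group(1)


# Made-up input with a default namespace and one prefixed namespace.
root = ET.fromstring('<a xmlns="blah" xmlns:x="other"><x:b>text</x:b></a>')
remove_namespaces(root)

result = ET.tostring(root, encoding='unicode')
print(result)  # <a><b>text</b></a>
```

Only element tags are rewritten; namespaced attributes (such as i:nil in the question's XML) keep their namespace and are still serialized with a prefix.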

I ran into this problem myself and hacked together a quick fix. I tested it on approximately 81,000 XML files (about 150 MB each on average) that had this problem, and all of them were fixed. Please note that this is not an optimal solution, but it is relatively efficient and works fine for me.

CREDIT: Idea and code structure from Jochen Kupperschmidt.
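If the namespaces are not known in advance and should be kept rather than stripped, one alternative is to collect the prefix declarations while parsing and register them. The following is only a sketch: parse_and_register is a hypothetical helper name, and it relies on the 'start-ns' events of ET.iterparse().

```python
import io
import xml.etree.ElementTree as ET


def parse_and_register(xml_text):
    """Parse xml_text and register every prefix it declares (hypothetical helper)."""
    if isinstance(xml_text, str):
        xml_text = xml_text.encode('utf-8')
    root = None
    events = ET.iterparse(io.BytesIO(xml_text), events=('start', 'start-ns'))
    for event, payload in events:
        if event == 'start-ns':
            prefix, uri = payload  # prefix is '' for a default namespace
            try:
                ET.register_namespace(prefix, uri)
            except ValueError:
                # Prefixes of the form ns<number> are reserved by ElementTree.
                pass
        elif root is None:
            root = payload  # the first 'start' event is the root element
    return root


root = parse_and_register('<a xmlns="blah" xmlns:x="other"><x:b>text</x:b></a>')
result = ET.tostring(root, encoding='unicode')
print(result)
```

The registry is global and maps each prefix to a single URI, so if a document binds the same prefix to different namespaces (as the question's XML does with a and b), the last declaration wins and the other namespace falls back to a generated nsN prefix.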
