Python XML Parsing

* Note: lxml will not work on my system, so I am hoping for a solution that does not rely on lxml.

I have already gone through some of the documentation, and I am having difficulty making it work as I would like. I would like to parse an XML file that looks like this:

    <dict>
        <key>1375</key>
        <dict>
            <key>Key 1</key><integer>1375</integer>
            <key>Key 2</key><string>Some String</string>
            <key>Key 3</key><string>Another string</string>
            <key>Key 4</key><string>Yet another string</string>
            <key>Key 5</key><string>Strings anyone?</string>
        </dict>
    </dict>

In the file I'm trying to manipulate, more "dict" elements follow this one. I would like to read the XML and write a text / dat file with one line per "dict", like this:

1375, "Some String", "Another string", "Yet another string", "Strings anyone?"

...

Eof

** Initially, I tried to use lxml, but after many failed attempts to get it working on my system, I moved on to the DOM. Most recently, I tried ElementTree. Please, for the love of all that is good, would someone help me with this? I am relatively new to Python and would like to understand how this works. Thank you in advance.

2 answers

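You can use xml.etree.ElementTree, which is included with Python. Python 2 also ships a C-implemented (and therefore much faster) companion, xml.etree.cElementTree; in Python 3.3 and later, xml.etree.ElementTree uses the C accelerator automatically, and cElementTree was removed in Python 3.9. lxml.etree offers a superset of this functionality, but you do not need it for what you want to do.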

The code provided by @Acorn works the same for me (Python 2.7, Windows 7) with each of the following imports:

    import xml.etree.ElementTree as et
    import xml.etree.cElementTree as et
    import lxml.etree as et
    ...
    tree = et.fromstring(xmltext)
    ...

What OS are you using, and what installation problems did you have with lxml?
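Since the same parsing calls work with each of these modules, one way to keep a script portable is a guarded import that prefers lxml when it is installed and otherwise falls back to the standard library. This is only a sketch; the sample XML string and the variable names are made up for illustration.

```python
# Prefer lxml when it is installed; otherwise use the standard library.
try:
    import lxml.etree as et             # optional third-party package
except ImportError:
    import xml.etree.ElementTree as et  # always available with Python

# Hypothetical sample input, just to show that the calls are the same.
root = et.fromstring("<dicts><dict><integer>1375</integer></dict></dicts>")

# find() accepts the same simple path syntax in both modules.
first_integer = root.find("dict/integer").text
print(et.__name__, first_integer)
```

Whichever module ends up bound to `et`, the rest of the script can stay unchanged as long as it sticks to the ElementTree-style calls used here.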

    import csv
    import xml.etree.ElementTree as et

    xmltext = """
    <dicts>
        <key>1375</key>
        <dict>
            <key>Key 1</key><integer>1375</integer>
            <key>Key 2</key><string>Some String</string>
            <key>Key 3</key><string>Another string</string>
            <key>Key 4</key><string>Yet another string</string>
            <key>Key 5</key><string>Strings anyone?</string>
        </dict>
    </dicts>
    """

    tree = et.fromstring(xmltext)

    # On Python 3, newline='' lets the csv module control line endings;
    # the with block closes the file even if an error occurs.
    with open('output.txt', 'w', newline='') as f:
        writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)

        # iterate over the dict elements
        for dict_el in tree.iterfind('dict'):
            data = []
            # get the text contents of each non-key element
            for el in dict_el:
                if el.tag == 'string':
                    data.append(el.text)
                # if it is an integer element, convert it to int so csv won't quote it
                elif el.tag == 'integer':
                    data.append(int(el.text))
            writer.writerow(data)
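The sample above wraps the records in a `<dicts>` root, whereas the XML in the question uses an outer `<dict>`. Here is a minimal sketch of the same idea applied to the question's layout, assuming each record is a direct child `<dict>` of the root and that `<key>` and value elements strictly alternate. The helper names `pairs_from_dict` and `rows_from_xml` are hypothetical, and the result is written to an in-memory buffer instead of a file so that it is easy to inspect.

```python
import csv
import io
import xml.etree.ElementTree as et

# The question's sample, with an outer <dict> as the root element.
XML_TEXT = """<dict>
  <key>1375</key>
  <dict>
    <key>Key 1</key><integer>1375</integer>
    <key>Key 2</key><string>Some String</string>
    <key>Key 3</key><string>Another string</string>
    <key>Key 4</key><string>Yet another string</string>
    <key>Key 5</key><string>Strings anyone?</string>
  </dict>
</dict>"""


def pairs_from_dict(dict_el):
    """Pair each <key> with the value element that follows it (assumed layout)."""
    children = list(dict_el)
    pairs = []
    for key_el, value_el in zip(children[0::2], children[1::2]):
        value = value_el.text
        if value_el.tag == "integer":
            value = int(value)  # ints stay unquoted with QUOTE_NONNUMERIC
        pairs.append((key_el.text, value))
    return pairs


def rows_from_xml(xml_text):
    """Return one list of values per record <dict> directly under the root."""
    root = et.fromstring(xml_text)
    return [[value for _, value in pairs_from_dict(d)]
            for d in root.findall("dict")]


rows = rows_from_xml(XML_TEXT)

# Write to an in-memory buffer; swap in open('output.txt', 'w', newline='')
# to produce a file instead.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC).writerows(rows)
print(buffer.getvalue(), end="")
```

Keeping the key names alongside the values makes it straightforward to select or reorder columns later, which the tag-only check cannot do.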