Reading Maven Pom xml in Python

I have a pom file with the following contents:

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>org.welsh</groupId>
      <artifactId>my-site</artifactId>
      <version>1.0.0</version>
      <packaging>pom</packaging>
      <profiles>
        <profile>
          <build>
            <plugins>
              <plugin>
                <groupId>org.welsh.utils</groupId>
                <artifactId>site-tool</artifactId>
                <version>1.0</version>
                <executions>
                  <execution>
                    <configuration>
                      <mappings>
                        <property>
                          <name>homepage</name>
                          <value>/content/homepage</value>
                        </property>
                        <property>
                          <name>assets</name>
                          <value>/content/assets</value>
                        </property>
                      </mappings>
                    </configuration>
                  </execution>
                </executions>
              </plugin>
            </plugins>
          </build>
        </profile>
      </profiles>
    </project>

I want to build a dictionary from the name and value elements of each property under the mappings element.

So what I'm trying to figure out is how to get all mappings elements (in case there are several build profiles), so that I can read every property element under them. From reading about the supported XPath syntax, I expected the following to print all name/value pairs:

    import logging
    import xml.etree.ElementTree as xml

    pomFile = xml.parse('pom.xml')
    root = pomFile.getroot()

    for mapping in root.findall('*/mappings'):
        for prop in mapping.findall('.//property'):
            logging.info(prop.find('name').text + " => " + prop.find('value').text)

That prints nothing. I tried printing just the mappings elements and got:

    >>> print root.findall('*/mappings')
    []

And when I print everything under root, I get:

    >>> print root.findall('*')
    [<Element '{http://maven.apache.org/POM/4.0.0}modelVersion' at 0x10b38bd50>,
     <Element '{http://maven.apache.org/POM/4.0.0}groupId' at 0x10b38bd90>,
     <Element '{http://maven.apache.org/POM/4.0.0}artifactId' at 0x10b38bf10>,
     <Element '{http://maven.apache.org/POM/4.0.0}version' at 0x10b3900d0>,
     <Element '{http://maven.apache.org/POM/4.0.0}packaging' at 0x10b390110>,
     <Element '{http://maven.apache.org/POM/4.0.0}name' at 0x10b390150>,
     <Element '{http://maven.apache.org/POM/4.0.0}properties' at 0x10b390190>,
     <Element '{http://maven.apache.org/POM/4.0.0}build' at 0x10b390310>,
     <Element '{http://maven.apache.org/POM/4.0.0}profiles' at 0x10b390390>]

Because of that, I tried printing:

    >>> print root.findall('*/{http://maven.apache.org/POM/4.0.0}mappings')
    []

But that doesn't work either.

Any suggestions would be great.

Thanks,

+8
python xml xml-parsing
2 answers

Well, it turns out that when I remove the Maven namespace attributes from the project element, so that it is just <project>, I can do this:

    for mapping in root.findall('*//mappings'):
        logging.info(mapping)
        for prop in mapping.findall('./property'):
            logging.info(prop.find('name').text + " => " + prop.find('value').text)

This produces:

    INFO:root:<Element 'mappings' at 0x10d72d350>
    INFO:root:homepage => /content/homepage
    INFO:root:assets => /content/assets

However, if I leave the Maven namespace declaration on the project element, I have to spell out the namespace in every tag:

    for mapping in root.findall('*//{http://maven.apache.org/POM/4.0.0}mappings'):
        logging.info(mapping)
        for prop in mapping.findall('./{http://maven.apache.org/POM/4.0.0}property'):
            logging.info(prop.find('{http://maven.apache.org/POM/4.0.0}name').text + " => " +
                         prop.find('{http://maven.apache.org/POM/4.0.0}value').text)

Result:

    INFO:root:<Element '{http://maven.apache.org/POM/4.0.0}mappings' at 0x10aa7f310>
    INFO:root:homepage => /content/homepage
    INFO:root:assets => /content/assets

However, I would like to know how to avoid having to account for the Maven namespace, since hard-coding it locks me into this one format.

EDIT:

Ok, I managed to get something more verbose:

    import xml.etree.ElementTree as xml

    def getMappingsNode(node, nodeName):
        # Depth-first search: return the first descendant whose tag contains
        # nodeName (the tag may carry a '{namespace}' prefix), or None.
        for n in node.findall('*'):
            if nodeName in n.tag:
                return n
            found = getMappingsNode(n, nodeName)
            if found is not None:
                return found
        return None

    def getMappings(rootNode):
        mappingsNode = getMappingsNode(rootNode, 'mappings')
        mapping = {}

        for prop in mappingsNode.findall('*'):
            key = ''
            val = ''

            # Pick out the name and value children of each property.
            for child in prop.findall('*'):
                if 'name' in child.tag:
                    key = child.text
                if 'value' in child.tag:
                    val = child.text

            if val and key:
                mapping[key] = val

        return mapping

    pomFile = xml.parse('pom.xml')
    root = pomFile.getroot()

    mappings = getMappings(root)
    print(mappings)
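
If Python 3.8 or newer is available, a shorter option may be the {*} namespace wildcard that ElementTree accepts in find, findall and findtext. The following is only a sketch under that version assumption; the inline POM string is a trimmed stand-in for the real pom.xml, and the names POM and mappings are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for pom.xml (assumption: same element names and namespace).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <profiles><profile><build><plugins><plugin><executions><execution>
    <configuration>
      <mappings>
        <property><name>homepage</name><value>/content/homepage</value></property>
        <property><name>assets</name><value>/content/assets</value></property>
      </mappings>
    </configuration>
  </execution></executions></plugin></plugins></build></profile></profiles>
</project>"""

root = ET.fromstring(POM)

# '{*}tag' matches 'tag' in any namespace or in none (Python 3.8+),
# so the Maven namespace URI never has to be written out.
mappings = {}
for prop in root.findall('.//{*}mappings/{*}property'):
    mappings[prop.findtext('{*}name')] = prop.findtext('{*}value')

print(mappings)
```

Because the wildcard also matches elements without a namespace, the same loop should work whether or not the xmlns attribute is present on the project element.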
+4

I have modified pom.xml files with Python. ElementTree doesn't seem to be documented very well. It took a while to get everything working, but it seems to be working now.


As you can see in the following snippet, Maven uses the namespace http://maven.apache.org/POM/4.0.0. The xmlns attribute on the root node defines the default namespace. The xmlns:xsi attribute also defines a namespace, but it is used only for xsi:schemaLocation.

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
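
To see what the default namespace does to the parsed tree, it can help to print a few tags. This is a small sketch using a cut-down document rather than the real pom.xml; the variable name uri is just for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<project xmlns="http://maven.apache.org/POM/4.0.0">'
    '<groupId>org.welsh</groupId>'
    '</project>'
)

# ElementTree stores each tag as '{namespace-uri}localname'.
print(root.tag)     # {http://maven.apache.org/POM/4.0.0}project
print(root[0].tag)  # {http://maven.apache.org/POM/4.0.0}groupId

# The namespace URI can be split off the root tag instead of hard-coding it.
uri = root.tag[1:].split('}', 1)[0] if root.tag.startswith('{') else ''
print(root.find('{%s}groupId' % uri).text)  # org.welsh
```

Every element inherits the default namespace, which is why a plain name such as mappings never matches.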

To match tags such as profile in methods like find, you must also specify the namespace. For example, you can write the following to find all profile tags.

    import xml.etree.ElementTree as xml

    pom = xml.parse('pom.xml')
    # './/' searches all descendants, at any depth below the root.
    for profile in pom.findall('.//{http://maven.apache.org/POM/4.0.0}profile'):
        print(repr(profile))

Another important thing here is the //, which ElementTree expects in its relative form .// (a path starting with a bare // triggers a warning). With your XML file above, */ would give the same result in this example, but for other tags, such as mappings, it would not work. Since * matches exactly one level, */child can expand to parent/child or xyz/child, but not to xyz/parent/child.
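
The difference between the two prefixes can be checked on a small tree. This is a sketch with a shortened document (only profiles, profile, build and mappings), not the full pom.xml; NS, direct, shallow and deep are names made up for the example.

```python
import xml.etree.ElementTree as ET

NS = '{http://maven.apache.org/POM/4.0.0}'
root = ET.fromstring(
    '<project xmlns="http://maven.apache.org/POM/4.0.0">'
    '<profiles><profile><build><mappings/></build></profile></profiles>'
    '</project>'
)

# '*/tag' only reaches grandchildren of the starting element.
direct = root.findall('*/' + NS + 'profile')     # profiles/profile
shallow = root.findall('*/' + NS + 'mappings')   # mappings sits deeper
# './/tag' reaches descendants at any depth.
deep = root.findall('.//' + NS + 'mappings')

print(len(direct), len(shallow), len(deep))  # prints: 1 0 1
```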


I believe these are the main problems in your code above. You must use // instead of */ to match descendants at any depth rather than only grandchildren, and you must specify the namespace. With both changes, you can find all the mappings like this:

    pom = xml.parse('pom.xml')
    map = {}
    for mapping in pom.findall('.//{http://maven.apache.org/POM/4.0.0}mappings'
                               '/{http://maven.apache.org/POM/4.0.0}property'):
        name = mapping.find('{http://maven.apache.org/POM/4.0.0}name').text
        value = mapping.find('{http://maven.apache.org/POM/4.0.0}value').text
        map[name] = value

But spelling out the full namespace in every path like this is hard to read. Instead, you can define a namespace map and pass it as the second argument to find and findall:

    # ...
    nsmap = {'m': 'http://maven.apache.org/POM/4.0.0'}
    for mapping in pom.findall('.//m:mappings/m:property', nsmap):
        name = mapping.find('m:name', nsmap).text
        value = mapping.find('m:value', nsmap).text
        map[name] = value
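
Put together, the namespace-map approach might look like the sketch below, which also covers the case of several build profiles from the question. The helper name read_mappings and the two-profile sample document are assumptions made for illustration; with a real file you would pass xml.parse('pom.xml').getroot() instead.

```python
import xml.etree.ElementTree as ET

NSMAP = {'m': 'http://maven.apache.org/POM/4.0.0'}

def read_mappings(root, nsmap=NSMAP):
    """Collect name -> value from every mappings/property below root."""
    result = {}
    for prop in root.findall('.//m:mappings/m:property', nsmap):
        name = prop.findtext('m:name', namespaces=nsmap)
        value = prop.findtext('m:value', namespaces=nsmap)
        # Skip properties that lack a name or a value.
        if name and value:
            result[name] = value
    return result

# Hypothetical POM with two profiles, each contributing one mapping.
SAMPLE = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <profiles>
    <profile><build><mappings>
      <property><name>homepage</name><value>/content/homepage</value></property>
    </mappings></build></profile>
    <profile><build><mappings>
      <property><name>assets</name><value>/content/assets</value></property>
    </mappings></build></profile>
  </profiles>
</project>"""

pom_mappings = read_mappings(ET.fromstring(SAMPLE))
print(pom_mappings)
```

If two profiles define the same name, the later one overwrites the earlier one in this sketch; keeping a separate dictionary per profile would be an alternative if that matters.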
+1