Force xmllint to ignore bad default xmlns

I am trying to process a large number of XML files (Maven POMs) using xmllint --xpath. After some trial and error, I realized that it does not work properly because of a bad declaration of the default namespace in these files, which looks like this:

 <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> 

A simple command such as the following does not work:

 $ echo $(xmllint --xpath '/project/modelVersion/text()' pom.xml)
 XPath set is empty

If I get rid of the xmlns attribute by replacing the root element as follows:

 <project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> 

The previous command gives the expected result:

 $ echo $(xmllint --xpath '/project/modelVersion/text()' pom.xml)
 4.0.0

Changing hundreds of pom files is not an option, especially since maven itself does not complain.

Is there a way to make xmllint process a file with a bad xmlns?

UPDATE

Thanks to Damien, I was able to make some progress:

 $ ( echo setns x=http://maven.apache.org/POM/4.0.0; echo 'xpath /x:project/x:modelVersion/text()'; ) | xmllint --shell pom.xml
 / > setns x=http://maven.apache.org/POM/4.0.0
 / > xpath /x:project/x:modelVersion/text()
 Object is a Node Set :
 Set contains 1 nodes:
 1 TEXT
 content=4.0.0

But this is not quite what I need. My subsequent questions are as follows:

  • Is there a way to print only the text? I would like the output to contain just 4.0.0 in the example above.

  • The result seems to be truncated after about 30 characters. Is it possible to get the full output? This does not happen with xmllint --xpath.

2 answers

Strip the namespace with sed.

Given this pom.xml:

 <?xml version="1.0" encoding="UTF-8"?>
 <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
 <modelVersion>4.0.0</modelVersion>
 </project>

 cat pom.xml | sed '2 s/xmlns=".*"//g' | xmllint --xpath '/project/modelVersion' - 

returns this:

 <modelVersion>4.0.0</modelVersion> 

If the file has unusual formatting (for example, the xmlns attributes are on their own lines), run it through the formatter first so that the root element ends up on line 2:

 cat pom.xml | xmllint --format - | sed '2 s/xmlns=".*"//g' | xmllint --xpath '/project/modelVersion' - 
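The sed commands above rely on the root element sitting on line 2. A minimal sketch of a variant that does not depend on the line number is shown below. The function name strip_default_ns is made up for illustration, and it assumes the xmlns attribute uses double quotes, is preceded by whitespace, and is not split across lines:

```shell
#!/bin/sh
# strip_default_ns (hypothetical helper name): read XML on stdin and remove
# the first default-namespace declaration found on each line.
# Assumptions: the attribute value is double-quoted, the attribute is
# preceded by whitespace, and name and value are on the same line.
strip_default_ns() {
    # [^"]* stops at the closing quote, so prefixed declarations such as
    # xmlns:xsi and attributes such as xsi:schemaLocation are left in place.
    sed 's/[[:space:]]xmlns="[^"]*"//'
}
```

It would be used in the same pipeline, for example: strip_default_ns < pom.xml | xmllint --xpath '/project/modelVersion' -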
 xmllint --xpath "/*[local-name() = 'project']/*[local-name() = 'parent']/*[local-name() = 'version']/text()" pom.xml 

This is not very pretty, but it avoids making assumptions about the formatting and/or reformatting the input pom.xml file.

If for some reason you need to strip "-SNAPSHOT", pipe the result above through | sed -e "s|-SNAPSHOT||" .
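Since the question involves hundreds of POMs and asks for text-only output, the local-name() expression can be wrapped in a small helper. This is only a sketch: the name pom_text is hypothetical, and it wraps the path in the XPath string() function so that xmllint prints just the text of the first match rather than a node dump.

```shell
#!/bin/sh
# pom_text FILE NAME... (hypothetical helper name)
# Follow the given element names from the document root, matching on
# local names only, and print the text of the first matching element.
pom_text() {
    file=$1
    shift
    path=
    for name in "$@"; do
        # local-name() ignores the namespace, so the default xmlns
        # declared on <project> does not get in the way.
        path="$path/*[local-name()='$name']"
    done
    # string(...) turns the first matching node into plain text.
    xmllint --xpath "string($path)" "$file"
}
```

For the pom.xml from the question, pom_text pom.xml project modelVersion should print 4.0.0. To cover many files, call it in a loop, for example: for f in */pom.xml; do printf '%s: %s\n' "$f" "$(pom_text "$f" project parent version)"; done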


Source: https://habr.com/ru/post/1213221/

