Jsoup: SelectorParseException when an XML tag contains a colon

An exception is thrown when an XML tag contains a colon.

The exception:

org.jsoup.select.Selector$SelectorParseException: Could not parse query 'w:r': unexpected token at ':r'

XML:

<w:r> <w:rPr> <w:rStyle w:val="jid"/> </w:rPr> <w:t>AN</w:t> </w:r> 

Java code:

  org.jsoup.nodes.Document doc = Jsoup.parse(documentXmlString); 

Here documentXmlString contains the XML shown above.

+8
java xml-parsing jsoup
4 answers

I used

  documentXmlString = documentXmlString.replaceAll("w:","w"); 
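
To make the effect of this workaround concrete, here is a minimal sketch using only the JDK. The class and method names (PrefixStripDemo, stripPrefix) are hypothetical, and it uses String.replace instead of replaceAll, which behaves the same for this literal pattern but does not interpret it as a regular expression. Note that the replacement also rewrites the attribute name w:val and any text content that happens to contain "w:".

```java
class PrefixStripDemo {

    // Hypothetical helper mirroring the one-line replacement from this answer.
    static String stripPrefix(String xml) {
        // Literal (non-regex) replacement of every "w:" occurrence,
        // in tag names, attribute names and text alike.
        return xml.replace("w:", "w");
    }

    public static void main(String[] args) {
        String xml = "<w:r> <w:rPr> <w:rStyle w:val=\"jid\"/> </w:rPr> <w:t>AN</w:t> </w:r>";
        // prints: <wr> <wrPr> <wrStyle wval="jid"/> </wrPr> <wt>AN</wt> </wr>
        System.out.println(stripPrefix(xml));
    }
}
```

After this rewrite the elements can be selected with plain names such as "wr", but the original tag names are lost, so it is only suitable when the XML does not need to be written back.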
+1

Just replace ":" with "|" in the selector:

 doc.select("w|r"); 

I am using Jsoup 1.5.2.

+17

Your workaround worked for you, but I would like to explain how namespaces work.

The w: in your XML is called a namespace prefix. To use a namespace prefix, it must be declared, typically on the root node (see note 1 below). Your source XML is missing that declaration, which is why the parser threw an error. The following shows how to declare a namespace in XML. I adjusted your own XML, and it should no longer produce an error:

 <w:r xmlns:w="http://www.w3.org/SomeNamespace"> <w:rPr> <w:rStyle w:val="jid"/> </w:rPr> <w:t>AN</w:t> </w:r> 

Additional Information:

A namespace declaration has its own scope. Consider the following example:

 <root> <w:r xmlns:w="http://www.w3.org/SomeNamespace"> <w:rPr> <w:rStyle w:val="jid"/> </w:rPr> <w:t>AN</w:t> </w:r> <someotherElement> <dummychild/> </someotherElement> </root> 

In the above example, you cannot use the namespace prefix on <someotherElement> or <dummychild/>, because the scope of the prefix w is limited to the <w:r> element and its descendants.


Note 1: A namespace is valid for the element on which it is declared and for that element's descendants. Declaring a namespace on the root element makes it available to all elements of the XML document.
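
The scoping rule can be checked with the namespace-aware DOM parser that ships with the JDK. This is a minimal sketch and does not use Jsoup. The class and method names (NamespaceScopeDemo, parsesWithNamespaces) and the <w:other/> element are hypothetical, and the namespace URI is the placeholder from the XML above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceScopeDemo {

    // Returns true if a namespace-aware parser accepts the XML,
    // false if it rejects it (for example, because a prefix is not bound).
    static boolean parsesWithNamespaces(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ns = "http://www.w3.org/SomeNamespace"; // placeholder URI from the answer
        // Prefix never declared: rejected.
        System.out.println(parsesWithNamespaces("<w:r><w:t>AN</w:t></w:r>"));
        // Prefix declared on the element that uses it: accepted.
        System.out.println(parsesWithNamespaces(
                "<w:r xmlns:w=\"" + ns + "\"><w:t>AN</w:t></w:r>"));
        // Prefix used outside the declaring element: rejected.
        System.out.println(parsesWithNamespaces(
                "<root><w:r xmlns:w=\"" + ns + "\"><w:t>AN</w:t></w:r><w:other/></root>"));
        // Prefix declared on the root: accepted everywhere.
        System.out.println(parsesWithNamespaces(
                "<root xmlns:w=\"" + ns + "\"><w:r><w:t>AN</w:t></w:r><w:other/></root>"));
    }
}
```

The four calls in main should print false, true, false, true in that order. A parser that is not namespace-aware (the DocumentBuilderFactory default) accepts all four inputs, because it treats w:r as an ordinary element name.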

+2

Jsoup is an HTML parser, not an XML parser. For XML, you can use JAXB, Saxon, or XStream.

-1
