Custom XPath expression with Tika

I am trying to write my own XPath ContentHandler for Tika that recognizes a more complex XPath expression, using the code in org/apache/tika/sax/BodyContentHandler.java as a starting point (I already use Tika for other things).

This XPath works:

/xhtml:html/xhtml:body/descendant:node() 

But this one does not:

 //xhtml:div[@id='someid']/descendant:node() 

I want to combine Tika's ContentHandler (because it repairs unbalanced HTML tags and invalid characters) with the XPath evaluator from javax.xml.xpath. What is the right way to do this? Is there any way to get the raw markup after Tika has parsed and corrected the HTML content?

1 answer

The XPath support included with Tika only covers a small subset of XPath (see XPathParser for the details). For more complex XPath queries, I recommend using something like javax.xml.xpath.
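As a rough illustration of the javax.xml.xpath route, here is a minimal sketch using only JDK classes. It assumes the cleaned-up XHTML is already available as a string (for example, serialized from a Tika parse with a handler such as ToXMLContentHandler; that step is an assumption and is not shown). The class name XhtmlXPathSketch and the method selectText are made up for this example. Note that the xhtml prefix has to be bound to the XHTML namespace by hand, otherwise the expression matches nothing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates an XPath expression against well-formed XHTML.
final class XhtmlXPathSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Returns the text content of every node selected by the expression.
    static List<String> selectText(String xhtml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for xhtml:-prefixed steps
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the "xhtml" prefix used in the expressions to the XHTML namespace.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "xhtml" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                            ? Collections.singletonList("xhtml").iterator()
                            : Collections.<String>emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for XHTML that Tika has already cleaned up (assumed input).
        String xhtml = "<html xmlns=\"" + XHTML_NS + "\"><body>"
                + "<div id=\"someid\"><p>Hello</p><p>world</p></div>"
                + "<div id=\"other\">skip</div></body></html>";
        System.out.println(selectText(xhtml, "//xhtml:div[@id='someid']/descendant::text()"));
    }
}
```

The sketch builds a full DOM in memory, which is the price for having the complete XPath language available; Tika's own matcher works on the SAX stream instead, which is presumably why it only handles simple paths.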

