How to generate an XPath query matching a specific item in Jsoup?

Hello, this is my web page:

<html>
    <head>
    </head>
    <body>
        <div> text div 1</div>
        <div>
            <span>text of first span </span>
            <span>text of second span </span>
        </div>
        <div> text div 3 </div>
    </body>
</html>

I parse it with jsoup, then iterate over all the elements in the page and build their paths:

Document doc = Jsoup.parse(new File("C:\\Users\\HC\\Desktop\\dataset\\index.html"), "UTF-8");
Elements elements = doc.body().select("*");
ArrayList<String> all = new ArrayList<>();
for (Element element : elements) {
    // only elements that carry text of their own are reported
    if (!element.ownText().isEmpty()) {

        StringBuilder path = new StringBuilder(element.nodeName());
        String value = element.ownText();
        Elements p_el = element.parents();

        // parents() lists the closest ancestor first, so prepend each one
        for (Element el : p_el) {
            path.insert(0, el.nodeName() + '/');
        }
        all.add(path + " = " + value + "\n");
        System.out.println(path + " = " + value);
    }
}

return all;

My code gives me this result:

html/body/div = text div 1
html/body/div/span = text of first span
html/body/div/span = text of second span
html/body/div = text div 3

What I actually want is the following result:

html/body/div[1] = text div 1
html/body/div[2]/span[1] = text of first span
html/body/div[2]/span[2] = text of second span
html/body/div[3] = text div 3

Could someone please give me an idea of how to achieve this result? Thank you in advance.

+4
3 answers

Since you asked for an idea, here is one, although I'm sure there are better ways to get the XPath of a given node. For example, you could use XSLT, as in the answer to "Create / get xpath from XML node java".

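Here is a possible solution, based on your current attempt.

For each element (and each of its parents), check whether its parent has more than one child element with the same name. Pseudocode: if (count(el.select('../' + el.nodeName())) > 1)
If true, count the preceding-sibling:: elements with the same name and add 1:
count(el.select('preceding-sibling::' + el.nodeName())) + 1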
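
The pseudocode above can be sketched as runnable Java. To keep the sketch free of external dependencies it uses the JDK's built-in `org.w3c.dom` parser instead of jsoup, which only works here because the sample page happens to be well-formed XML. The class name `IndexedPaths` and its methods are hypothetical names chosen for illustration. With jsoup, the same logic would presumably be written with `Element.parent()`, `previousElementSibling()`, `nextElementSibling()`, `tagName()` and `ownText()`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class IndexedPaths {

    /** 1-based position among same-name siblings, or 0 if the element has no such sibling. */
    static int position(Element el) {
        int before = 0;
        // count preceding siblings with the same tag name
        for (Node n = el.getPreviousSibling(); n != null; n = n.getPreviousSibling()) {
            if (n instanceof Element && n.getNodeName().equals(el.getNodeName())) {
                before++;
            }
        }
        // an index is only needed when at least one same-name sibling exists
        boolean hasTwin = before > 0;
        for (Node n = el.getNextSibling(); n != null && !hasTwin; n = n.getNextSibling()) {
            hasTwin = n instanceof Element && n.getNodeName().equals(el.getNodeName());
        }
        return hasTwin ? before + 1 : 0;
    }

    /** Walks from the element up to the root, prepending one path step per level. */
    static String pathOf(Element el) {
        StringBuilder path = new StringBuilder();
        for (Node n = el; n instanceof Element; n = n.getParentNode()) {
            int pos = position((Element) n);
            String step = n.getNodeName() + (pos > 0 ? "[" + pos + "]" : "");
            path.insert(0, path.length() == 0 ? step : step + "/");
        }
        return path.toString();
    }

    /** Text held directly by the element, ignoring text of child elements. */
    static String ownText(Element el) {
        StringBuilder sb = new StringBuilder();
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) {
                sb.append(c.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    /** Returns "path = text" for every element that has text of its own, in document order. */
    static List<String> collect(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        List<String> all = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("*");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element el = (Element) nodes.item(i);
            String text = ownText(el);
            if (!text.isEmpty()) {
                all.add(pathOf(el) + " = " + text);
            }
        }
        return all;
    }
}
```

Passing the page from the question to `IndexedPaths.collect(...)` should give the four requested lines, from `html/body/div[1] = text div 1` to `html/body/div[3] = text div 3`. Elements without a same-name sibling (`html`, `body`) get no index, which matches the desired output.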

+2

Here is one way to do it:

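// Builds the path of htmlElement's ancestors, from the root down.
StringBuilder absPath = new StringBuilder();
Elements parents = htmlElement.parents();

// parents() lists the closest ancestor first, so iterate in reverse.
for (int j = parents.size() - 1; j >= 0; j--) {
    Element element = parents.get(j);
    absPath.append("/");
    absPath.append(element.tagName());
    absPath.append("[");
    // Caution: siblingIndex() is the 0-based index among all sibling nodes
    // (text nodes included), not the 1-based XPath position among same-tag siblings.
    absPath.append(element.siblingIndex());
    absPath.append("]");
}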
+1

It would be easier to traverse the document from the root to the leaves rather than the other way around. That way you can group the child elements by tag name and add an index only where a tag occurs more than once. A recursive approach:

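import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// The fields and methods below belong inside a class.
private final List<String> path = new ArrayList<>();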
private final List<String> all = new ArrayList<>();

public List<String> getAll() {
    return Collections.unmodifiableList(all);
}

public void parse(Document doc) {
    path.clear();
    all.clear();
    parse(doc.children());
}

private void parse(List<Element> elements) {
    if (elements.isEmpty()) {
        return;
    }
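    // Group children by tag name; LinkedHashMap keeps the tags in order of first appearance.
    Map<String, List<Element>> grouped = elements.stream()
            .collect(Collectors.groupingBy(Element::tagName, LinkedHashMap::new, Collectors.toList()));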

    for (Map.Entry<String, List<Element>> entry : grouped.entrySet()) {
        List<Element> list = entry.getValue();
        String key = entry.getKey();
        if (list.size() > 1) {
            int index = 1;
            // use paths with index
            key += "[";
            for (Element e : list) {
                path.add(key + (index++) + "]");
                handleElement(e);
                path.remove(path.size() - 1);
            }
        } else {
            // use paths without index
            path.add(key);
            handleElement(list.get(0));
            path.remove(path.size() - 1);
        }
    }

}

private void handleElement(Element e) {
    String value = e.ownText();
    if (!value.isEmpty()) {
        // add entry
        all.add(path.stream().collect(Collectors.joining("/")) + " = " + value);
    }
    // process children of element
    parse(e.children());
}
0