I plan to use owasp-java-html-sanitizer to perform several tasks on a custom HTML file.
I would like to extract a list of URLs from an HTML string.
I would also like to make sure that all links have target set to "_blank", much like HtmlPolicyBuilder.requireRelNofollowOnLinks does for rel="nofollow". (Done)
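    import java.util.List;
    import org.owasp.html.ElementPolicy;
    import org.owasp.html.HtmlPolicyBuilder;
    import org.owasp.html.PolicyFactory;

    PolicyFactory linkRewrite = new HtmlPolicyBuilder()
        .allowAttributes("href").onElements("a")
        .requireRelNofollowOnLinks()
        .allowElements(new ElementPolicy() {
          public String apply(String elementName, List<String> attrs) {
            // attrs is a flat list of alternating names and values
            attrs.add("target");
            attrs.add("_blank");
            return "a";
          }
        }, "a")
        .toFactory();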
This adds target="_blank" to the links, but I am not sure whether this is the best way to accomplish it.
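The following variant of the element policy also collects the URLs, adding each href value to a urls collection declared outside the policy: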
    .allowElements(new ElementPolicy() {
      public String apply(String elementName, List<String> attrs) {
        // Attributes come in name/value pairs, so step by two.
        for (int i = 0, n = attrs.size(); i < n; i += 2) {
          if ("href".equals(attrs.get(i))) {
            urls.add(attrs.get(i + 1));
            break;
          }
        }
        attrs.add("target");
        attrs.add("_blank");
        return elementName;
      }
    }, "a")
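The attribute handling in the policy above can be tried in isolation, without the sanitizer on the classpath. What follows is only a minimal sketch, assuming the attribute list is the same flat name/value list that the policy iterates over. The class name AnchorAttrs and the method collectHrefAndForceBlank are made up for illustration and are not part of owasp-java-html-sanitizer. Unlike the snippet above, it overwrites an existing target value instead of appending a second one, which would only matter if the policy also allowed target through.

```java
import java.util.List;

// Hypothetical helper, not part of owasp-java-html-sanitizer.
final class AnchorAttrs {

    private AnchorAttrs() {}

    /**
     * Records every href value found in attrs into urls and makes sure
     * the list ends up with exactly one target attribute set to "_blank".
     *
     * attrs is assumed to be a flat list of alternating names and values,
     * for example ["href", "http://example.com/", "title", "Example"].
     */
    static void collectHrefAndForceBlank(List<String> attrs, List<String> urls) {
        boolean targetSet = false;
        // Step by two: even indexes hold names, odd indexes hold values.
        for (int i = 0, n = attrs.size(); i + 1 < n; i += 2) {
            String name = attrs.get(i);
            if ("href".equals(name)) {
                urls.add(attrs.get(i + 1));
            } else if ("target".equals(name)) {
                // Overwrite an existing target rather than adding a duplicate.
                attrs.set(i + 1, "_blank");
                targetSet = true;
            }
        }
        if (!targetSet) {
            attrs.add("target");
            attrs.add("_blank");
        }
    }
}
```

Inside an ElementPolicy, the body of apply could then presumably shrink to a call to AnchorAttrs.collectHrefAndForceBlank(attrs, urls) followed by return elementName, which keeps the policy itself short and lets the list handling be unit tested separately.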