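To find out why a document is rejected, you can create a custom whitelist with a reporting function: subclass Whitelist and override isSafeTag and isSafeAttribute so that every rejected tag or attribute is logged.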
MyReportEnabledWhitelist.java
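import java.util.HashSet;
import java.util.Set;

import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Whitelist;

public class MyReportEnabledWhitelist extends Whitelist {

    // Signatures of attributes that were already checked, so each one is reported at most once.
    private final Set<String> alreadyCheckedAttributeSignatures = new HashSet<>();

    @Override
    protected boolean isSafeTag(String tag) {
        boolean isSafe = super.isSafeTag(tag);
        if (!isSafe) {
            say("Disallowed tag: " + tag);
        }
        return isSafe;
    }

    @Override
    protected boolean isSafeAttribute(String tagName, Element el, Attribute attr) {
        boolean isSafe = super.isSafeAttribute(tagName, el, attr);
        String signature = el.hashCode() + "-" + attr.hashCode();
        // Set.add returns false if the signature was already recorded.
        if (alreadyCheckedAttributeSignatures.add(signature) && !isSafe) {
            say("Wrong attribute: " + attr.getKey() + " (" + attr.html() + ") in " + el.outerHtml());
        }
        return isSafe;
    }

    // Not shown in the original snippet; assumed here to print the report line to standard output.
    private void say(String message) {
        System.out.println(message);
    }
}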
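The signature set in isSafeAttribute is there so that each attribute is reported only once, even if the same attribute is checked more than once during a validation. Here is a minimal sketch of that "report once" pattern on its own, without Jsoup. The class OnceOnlyReporter and its methods are hypothetical names made up for this illustration, and it collects messages in a list instead of calling say(...).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Jsoup: remembers which keys were already
// reported so that repeated checks of the same item yield a single message.
class OnceOnlyReporter {
    private final Set<String> seen = new HashSet<>();
    private final List<String> messages = new ArrayList<>();

    // Records the message only the first time the key is seen.
    // Returns true if the message was recorded, false if the key was a repeat.
    boolean reportOnce(String key, String message) {
        // Set.add returns false when the key is already present.
        if (!seen.add(key)) {
            return false;
        }
        messages.add(message);
        return true;
    }

    List<String> messages() {
        return messages;
    }

    public static void main(String[] args) {
        OnceOnlyReporter reporter = new OnceOnlyReporter();
        reporter.reportOnce("a-href", "Wrong attribute: href");
        reporter.reportOnce("a-href", "Wrong attribute: href"); // repeat, ignored
        reporter.reportOnce("a-onfocus", "Wrong attribute: onfocus");
        reporter.messages().forEach(System.out::println);
    }
}
```

One caveat on the whitelist above: its key is built from hash codes, which are not guaranteed to be unique, so a collision could in principle suppress a report. For a diagnostic tool that is usually an acceptable trade-off.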
SAMPLE CODE
String html = "<p><a href='ftp://example.com/' onfocus='invalidLink()'>Link</a></p><a href='ftp://example2.com/'>Link 2</a>";

// * Custom whitelist
Whitelist myReportEnabledWhitelist = new MyReportEnabledWhitelist()
        // ** Basic whitelist (from Jsoup)
        .addTags("a", "b", "blockquote", "br", "cite", "code", "dd", "dl", "dt", "em", "i", "li", "ol", "p", "pre",
                "q", "small", "span", "strike", "strong", "sub", "sup", "u", "ul")
        .addAttributes("a", "href")
        .addAttributes("blockquote", "cite")
        .addAttributes("q", "cite")
        .addProtocols("a", "href", "ftp", "http", "https", "mailto")
        .addProtocols("blockquote", "cite", "http", "https")
        .addProtocols("cite", "cite", "http", "https")
        .addEnforcedAttribute("a", "rel", "nofollow")
        // ** Customizations
        .addTags("body")
        .addProtocols("a", "href", "tel", "device")
        .removeProtocols("a", "href", "ftp");

boolean safeCustom = Jsoup.isValid(html, myReportEnabledWhitelist);
System.out.println(safeCustom);
OUTPUT
Wrong attribute: href (href="ftp://example.com/") in <a href="ftp://example.com/" onfocus="invalidLink()">Link</a>
Wrong attribute: onfocus (onfocus="invalidLink()") in <a href="ftp://example.com/" onfocus="invalidLink()">Link</a>
Wrong attribute: href (href="ftp://example2.com/") in <a href="ftp://example2.com/">Link 2</a>
false
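Both href attributes are reported because the customization removes ftp from the protocols the basic list allows for a[href]. onfocus is reported because it was never added as an allowed attribute. To make the protocol part concrete, here is a simplified, Jsoup-free model of an allow-list that is built up and then trimmed in the same order as the sample. ProtocolAllowList is a hypothetical class for illustration only and is not how Jsoup implements the check internally.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Simplified, hypothetical model of a per-attribute protocol allow-list.
class ProtocolAllowList {
    private final Set<String> protocols = new HashSet<>();

    ProtocolAllowList add(String... names) {
        protocols.addAll(Arrays.asList(names));
        return this;
    }

    ProtocolAllowList remove(String... names) {
        protocols.removeAll(Arrays.asList(names));
        return this;
    }

    // A URL is allowed if it starts with one of the listed protocols followed by ':'.
    boolean allows(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        for (String protocol : protocols) {
            if (lower.startsWith(protocol + ":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ProtocolAllowList href = new ProtocolAllowList()
                .add("ftp", "http", "https", "mailto") // basic list
                .add("tel", "device")                  // customization
                .remove("ftp");                        // customization

        System.out.println(href.allows("ftp://example.com/"));   // false
        System.out.println(href.allows("https://example.com/")); // true
        System.out.println(href.allows("tel:+123456789"));       // true
    }
}
```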