How to find the cause of a Jsoup.isValid failure

I have the following code, which works, but I'd like to know whether Jsoup can tell me the exact cause of a validation failure.

The following prints true, as expected:

```java
private void validateProtocol() {
    String html = "<p><a href='https://example.com/'>Link</a></p>";
    Whitelist whiteList = Whitelist.basic();
    whiteList.addProtocols("a", "href", "tel");
    whiteList.removeProtocols("a", "href", "ftp");
    boolean safe = Jsoup.isValid(html, whiteList);
    System.out.println(safe);
}
```

Changing the html line as follows makes it print false, also as expected:

```java
String html = "<p><a href='ftp://example.com/'>Link</a></p>";
```
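A rough sketch of why the two URLs behave differently. In the jsoup versions I am familiar with, the protocol check resolves the attribute to an absolute URL, lower-cases it, and accepts it only if it starts with one of the allowed protocols followed by ':'. The standard-library model below leaves out the URL resolution step. ProtocolCheckSketch and hasAllowedProtocol are hypothetical names, not jsoup API.

```java
import java.util.Locale;
import java.util.Set;

public class ProtocolCheckSketch {

    // Hypothetical helper modelling the whitelist's protocol test: the value
    // passes only if, lower-cased, it starts with an allowed protocol plus ':'.
    static boolean hasAllowedProtocol(String url, Set<String> allowedProtocols) {
        String value = url.toLowerCase(Locale.ROOT);
        for (String protocol : allowedProtocols) {
            if (value.startsWith(protocol + ":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Protocols for <a href> after Whitelist.basic() (ftp, http, https, mailto)
        // plus addProtocols("a", "href", "tel") and minus removeProtocols("a", "href", "ftp").
        Set<String> aHref = Set.of("http", "https", "mailto", "tel");
        System.out.println(hasAllowedProtocol("https://example.com/", aHref)); // true
        System.out.println(hasAllowedProtocol("ftp://example.com/", aHref));   // false
    }
}
```

Since removeProtocols drops ftp from the set, the ftp:// link has no matching prefix and the whole document is reported as invalid.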

The next version has two problems: the link uses the ftp protocol, which was removed from the whitelist, and it carries an onfocus attribute, which the whitelist does not allow.

```java
private void validateProtocol() {
    String html = "<p><a href='ftp://example.com/' onfocus='invalidLink()'>Link</a></p>";
    Whitelist whiteList = Whitelist.basic();
    whiteList.addProtocols("a", "href", "tel", "device");
    whiteList.removeProtocols("a", "href", "ftp");
    boolean safe = Jsoup.isValid(html, whiteList);
    System.out.println(safe);
}
```

isValid correctly returns false, but is there any way to find out which part of the HTML caused it, for example the wrong protocol or the disallowed attribute?

Answer

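You can create a custom whitelist that reports what it rejects. Subclass Whitelist, override its protected isSafeTag and isSafeAttribute methods, delegate to the parent implementation, and print a message whenever it returns false. Because Whitelist.basic() returns a plain Whitelist, the basic configuration has to be rebuilt on an instance of the subclass, as in the sample code.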

MyReportEnabledWhitelist.java

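```java
import java.util.HashSet;
import java.util.Set;

import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Whitelist;

public class MyReportEnabledWhitelist extends Whitelist {

    // The same attribute can reach isSafeAttribute more than once during one
    // check, so remember which (element, attribute) pairs were already seen
    // and report each one only once.
    private final Set<String> alreadyCheckedAttributeSignatures = new HashSet<>();

    @Override
    protected boolean isSafeTag(String tag) {
        boolean isSafe = super.isSafeTag(tag);
        if (!isSafe) {
            say("Disallowed tag: " + tag);
        }
        return isSafe;
    }

    @Override
    protected boolean isSafeAttribute(String tagName, Element el, Attribute attr) {
        boolean isSafe = super.isSafeAttribute(tagName, el, attr);
        String signature = el.hashCode() + "-" + attr.hashCode();
        if (!alreadyCheckedAttributeSignatures.contains(signature)) {
            alreadyCheckedAttributeSignatures.add(signature);
            if (!isSafe) {
                say("Wrong attribute: " + attr.getKey() + " (" + attr.html() + ") in " + el.outerHtml());
            }
        }
        return isSafe;
    }

    // Reporting hook: prints to standard output.
    private void say(String message) {
        System.out.println(message);
    }
}
```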
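Why is the signature set needed? Assuming the jsoup version in use behaves like the ones I have read, the base isSafeAttribute falls back to re-checking an attribute it does not recognise against the ":all" pseudo-tag by calling isSafeAttribute(":all", el, attr) on itself. That call is dispatched to the override, so without de-duplication a rejected attribute would be reported twice. The standard-library model below reproduces that pattern; BaseChecker, ReportingChecker and AllFallbackDemo are hypothetical stand-ins, not jsoup classes. It also shows an alternative to the signature set: report only from the outer call, i.e. when tagName is not ":all".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Whitelist: allowed attribute names per tag, with an
// unknown attribute re-checked against the ":all" pseudo-tag via a virtual call.
class BaseChecker {
    private final Map<String, Set<String>> allowed;

    BaseChecker(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    protected boolean isSafeAttribute(String tagName, String attrKey) {
        Set<String> ok = allowed.get(tagName);
        if (ok != null && ok.contains(attrKey)) {
            return true;
        }
        // Virtual call: a subclass override also sees this nested ":all" check.
        return !tagName.equals(":all") && isSafeAttribute(":all", attrKey);
    }
}

class ReportingChecker extends BaseChecker {
    final List<String> reports = new ArrayList<>();

    ReportingChecker(Map<String, Set<String>> allowed) {
        super(allowed);
    }

    @Override
    protected boolean isSafeAttribute(String tagName, String attrKey) {
        boolean isSafe = super.isSafeAttribute(tagName, attrKey);
        // Skip the nested ":all" call so each rejected attribute is reported once.
        if (!isSafe && !tagName.equals(":all")) {
            reports.add("Wrong attribute: " + attrKey + " in <" + tagName + ">");
        }
        return isSafe;
    }
}

public class AllFallbackDemo {
    public static void main(String[] args) {
        ReportingChecker checker = new ReportingChecker(Map.of("a", Set.of("href")));
        checker.isSafeAttribute("a", "href");    // allowed, nothing reported
        checker.isSafeAttribute("a", "onfocus"); // rejected, reported once
        System.out.println(checker.reports);     // [Wrong attribute: onfocus in <a>]
    }
}
```

Guarding on the tag name avoids relying on hashCode values, but it only holds if the parent class really uses the ":all" recursion, so check the source of the jsoup version you depend on.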

SAMPLE CODE

```java
// Requires: import org.jsoup.Jsoup; import org.jsoup.safety.Whitelist;
String html = "<p><a href='ftp://example.com/' onfocus='invalidLink()'>Link</a></p>"
        + "<a href='ftp://example2.com/'>Link 2</a>";

// * Custom whitelist
Whitelist myReportEnabledWhitelist = new MyReportEnabledWhitelist()
        // ** Basic whitelist (from Jsoup)
        .addTags("a", "b", "blockquote", "br", "cite", "code", "dd", "dl", "dt", "em",
                "i", "li", "ol", "p", "pre", "q", "small", "span", "strike", "strong",
                "sub", "sup", "u", "ul")
        .addAttributes("a", "href")
        .addAttributes("blockquote", "cite")
        .addAttributes("q", "cite")
        .addProtocols("a", "href", "ftp", "http", "https", "mailto")
        .addProtocols("blockquote", "cite", "http", "https")
        .addProtocols("cite", "cite", "http", "https")
        .addEnforcedAttribute("a", "rel", "nofollow")

        // ** Customizations
        .addTags("body")
        .addProtocols("a", "href", "tel", "device")
        .removeProtocols("a", "href", "ftp");

boolean safeCustom = Jsoup.isValid(html, myReportEnabledWhitelist);
System.out.println(safeCustom);
```

OUTPUT

```
Wrong attribute: href (href="ftp://example.com/") in <a href="ftp://example.com/" onfocus="invalidLink()">Link</a>
Wrong attribute: onfocus (onfocus="invalidLink()") in <a href="ftp://example.com/" onfocus="invalidLink()">Link</a>
Wrong attribute: href (href="ftp://example2.com/") in <a href="ftp://example2.com/">Link 2</a>
false
```