Removing attributes from HTML tags with Jsoup

I want to filter certain texts that may contain HTML. I use jsoup to whitelist and clean tags, which works pretty well.

My only problem is that some of the tags may carry attributes, mostly style or class, but possibly others as well (name, purpose, etc.). When cleaning, this is not a problem, because the attributes are simply stripped, but when validating against the whitelist, some tags that should be allowed are rejected because of their attributes. The basic whitelist does not seem to cover the style or class attributes, and I cannot anticipate every attribute I might encounter.

Since I want to allow a fairly wide range of tags, but remove most of them during cleaning, I don't want to list every attribute for every tag I allow. The simplest approach would be to remove all attributes from all tags, since I am not interested in them at all, and then check whether the stripped text contains only allowed tags.

Is there a function that removes all attributes, or some simple loop that does it? Another option would be to tell the whitelist to ignore all attributes and validate only the tags.

+7
java jsoup
1 answer

The solution that finally worked for me is pretty simple. I iterate over all the elements, then over each element's attributes, and remove every attribute from the element. That leaves me with a cleaned version in which I only need to check the HTML tags themselves. It is probably not the best way to solve the problem, but it does what I wanted.

** EDIT **

The old code was upvoted many times, although it contained an absolute beginner's mistake: you must never remove items from a collection while iterating over that same collection. The error only shows up when an element has more than one attribute to remove, though.

Updated code with the error fixed:

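    // Needs org.jsoup.Jsoup, org.jsoup.nodes.*, org.jsoup.select.Elements, java.util.*
    Document doc = Jsoup.parseBodyFragment(aText);
    Elements el = doc.getAllElements();
    for (Element e : el) {
        // Collect the keys first; removing while iterating over e.attributes() is the bug
        List<String> attToRemove = new ArrayList<>();
        for (Attribute a : e.attributes()) {
            attToRemove.add(a.getKey());
        }
        for (String att : attToRemove) {
            e.removeAttr(att);
        }
    }
    // Only the bare tags are left to validate against the whitelist
    return Jsoup.isValid(doc.body().html(), theLegalWhitelist);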
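The pitfall mentioned in the edit can be reproduced without jsoup. The sketch below is only a model: it assumes an element's attributes behave like a plain java.util.LinkedHashMap, and the class and method names (AttributeRemovalDemo, removeWhileIterating, and so on) are made up for illustration. It shows why removal during iteration goes unnoticed with a single attribute, fails with several, and how a key snapshot or the iterator's own remove() avoids the problem.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AttributeRemovalDemo {

    // Hypothetical stand-in for one element's attributes (key -> value).
    static Map<String, String> sampleAttributes() {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("style", "color:red");
        attrs.put("class", "note");
        attrs.put("name", "intro");
        return attrs;
    }

    // Broken approach: removes entries while the for-each loop is still running.
    // Returns true if the loop finished, false if the iterator detected the change.
    static boolean removeWhileIterating(Map<String, String> attrs) {
        try {
            for (String key : attrs.keySet()) {
                attrs.remove(key);
            }
            return true;
        } catch (ConcurrentModificationException ex) {
            return false;
        }
    }

    // Safe: copy the keys first, then remove (the same idea as the fix above).
    static void removeViaSnapshot(Map<String, String> attrs) {
        List<String> keys = new ArrayList<>(attrs.keySet());
        for (String key : keys) {
            attrs.remove(key);
        }
    }

    // Also safe: let the iterator perform the removal.
    static void removeViaIterator(Map<String, String> attrs) {
        Iterator<String> it = attrs.keySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    public static void main(String[] args) {
        Map<String, String> single = new LinkedHashMap<>();
        single.put("style", "color:red");
        System.out.println(removeWhileIterating(single));             // true: one entry hides the bug
        System.out.println(removeWhileIterating(sampleAttributes())); // false: several entries expose it

        Map<String, String> attrs = sampleAttributes();
        removeViaSnapshot(attrs);
        System.out.println(attrs.isEmpty());                          // true
    }
}
```

With a single entry the loop ends before the iterator is asked for another item, so the modification is never checked; that matches the observation that the old code only failed when several attributes had to go.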
+16