I want to filter certain texts that may contain HTML. I use jsoup to whitelist and clean tags, which works pretty well.
My only problem is that some of the tags may carry attributes, mostly style or class, but sometimes others as well (name, purpose, etc.). This is not a problem when cleaning, because the attributes are simply erased. When validating against the whitelist, however, tags that should be allowed are rejected because of their attributes. The base whitelist does not seem to cover style or class attributes, and I cannot anticipate every attribute I will encounter.
Since I want to allow a fairly wide range of tags but delete most of them during cleanup, I don't want to add every attribute for every tag I allow. The simplest approach would be to remove all attributes from all tags, since I am not interested in them at all, and then check whether the stripped text contains only simple, allowed tags.
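A minimal sketch of the "strip every attribute, then validate only the tags" idea, assuming jsoup 1.14 or newer (where Whitelist was renamed Safelist; on older versions substitute Whitelist). The class and method names (AttributeStripper, stripAllAttributes, hasOnlyAllowedTags) are hypothetical, chosen for illustration, and Safelist.basic() stands in for whatever tag list you actually allow.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;

// Hypothetical helper: removes all attributes, then validates tags only.
public class AttributeStripper {

    // Parses the fragment, removes every attribute from every element,
    // and returns the resulting body HTML.
    public static String stripAllAttributes(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        for (Element el : doc.getAllElements()) {
            // Collect the keys first so we never modify the attribute
            // collection while iterating over it.
            List<String> keys = new ArrayList<>();
            for (Attribute attr : el.attributes()) {
                keys.add(attr.getKey());
            }
            for (String key : keys) {
                el.removeAttr(key);
            }
        }
        return doc.body().html();
    }

    // Attributes are stripped before validation, so <b class="x"> is
    // judged exactly like a bare <b>.
    public static boolean hasOnlyAllowedTags(String html, Safelist safelist) {
        return Jsoup.isValid(stripAllAttributes(html), safelist);
    }

    public static void main(String[] args) {
        Safelist tags = Safelist.basic();
        System.out.println(stripAllAttributes("<p class=\"x\" style=\"color:red\">hi</p>"));
        System.out.println(hasOnlyAllowedTags("<b class=\"x\">bold</b>", tags));
        System.out.println(hasOnlyAllowedTags("<script>alert(1)</script>", tags));
    }
}
```

Copying the keys into a list before calling removeAttr keeps the loop safe regardless of how the attribute iterator handles concurrent removal.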
Is there a function that removes all attributes, or a simple loop that does it? Alternatively, is there a way to tell the whitelist to ignore all attributes and check only the tags?
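On the whitelist side, I'm not aware of a jsoup switch that ignores every attribute, but the ":all" pseudo-tag of Safelist.addAttributes permits the listed attributes on every tag at once. The sketch below assumes jsoup 1.14 or newer. The class name RelaxedSafelistCheck and the particular attribute list are illustrative assumptions. Unlisted attributes such as purpose still make validation fail.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical helper: a tag safelist that tolerates common attributes everywhere.
public class RelaxedSafelistCheck {

    // ":all" applies the listed attributes to every allowed tag.
    public static Safelist tagsWithCommonAttributes() {
        return Safelist.basic()
                .addAttributes(":all", "style", "class", "id", "name");
    }

    public static void main(String[] args) {
        Safelist relaxed = tagsWithCommonAttributes();
        // style and class are now tolerated on <b>.
        System.out.println(Jsoup.isValid("<b class=\"x\" style=\"color:red\">bold</b>", relaxed));
        // An attribute that was not listed is still rejected.
        System.out.println(Jsoup.isValid("<b purpose=\"x\">bold</b>", relaxed));
    }
}
```

This still requires naming attributes, so the stripping approach is the more general fit when the attribute set is open-ended.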
java jsoup
Xtroce