OK, I solved it, more or less.
I use the HtmlCleaner library to parse the input and turn it into well-formed markup.
Then I use a DOM parser to walk the whole tree and strip out all the forbidden tags and attributes (plus a few minor ugly hacks ;)).
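A minimal sketch of the second step is below. It is not the original code. It assumes the HtmlCleaner pass has already produced a well-formed string, and it parses that string with the JDK's own DOM parser (`javax.xml.parsers`) rather than any HtmlCleaner API. The class `TagSanitizer` and the whitelist `ALLOWED` are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TagSanitizer {
    // Hypothetical whitelist: allowed tag name -> attributes allowed on that tag.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
        "div", Set.of(),
        "p", Set.of(),
        "b", Set.of(),
        "i", Set.of(),
        "a", Set.of("href"));

    // Parses already well-formed markup, cleans it, and serializes it back.
    public static String sanitize(String wellFormedXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(wellFormedXml)));
        clean(doc.getDocumentElement());
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Recursively removes non-whitelisted attributes and child elements.
    // Note: the root element itself is always kept; only its attributes are filtered.
    private static void clean(Element el) {
        Set<String> allowedAttrs =
            ALLOWED.getOrDefault(el.getTagName().toLowerCase(), Set.of());
        NamedNodeMap attrs = el.getAttributes();
        // Iterate backwards: the map is live and shrinks as attributes are removed.
        for (int i = attrs.getLength() - 1; i >= 0; i--) {
            String name = attrs.item(i).getNodeName();
            if (!allowedAttrs.contains(name.toLowerCase())) {
                el.removeAttribute(name);
            }
        }
        // Snapshot the children first, because removing nodes changes the sibling chain.
        List<Node> children = new ArrayList<>();
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            children.add(c);
        }
        for (Node c : children) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                Element child = (Element) c;
                if (ALLOWED.containsKey(child.getTagName().toLowerCase())) {
                    clean(child);
                } else {
                    // Forbidden tag: drop it together with everything inside it.
                    el.removeChild(child);
                }
            }
        }
    }
}
```

Using a whitelist rather than a blacklist means any tag or attribute you did not anticipate is dropped by default. Forbidden elements are removed with their contents here; if you would rather keep the text inside, for example, a `<span>`, you could move its children up into the parent before removing it.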
It was a lot of work.
Vladimir