Use Apache's StringEscapeUtils.escapeHtml(String) or StringEscapeUtils.unescapeHtml(String). Both can be found in the Apache Commons Lang library.
If you need to keep the HTML markup and only get rid of the encoded character codes (entities such as &amp;), you will have to build a map of the values you want to replace. This is an exercise in String manipulation, so it could be considered an "ugly hack", but it will do the job quickly.
For example, in pseudo-code:
1. Create a Map<String, String> and fill it with each code you want to replace as the key and its replacement text as the value.
2. Find the HTML character codes in the document using a regular expression.
3. Look up each code you find in the Map.
4. Replace each occurrence of the code with its text equivalent.
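The map-based approach could look roughly like the sketch below. This is only an illustration, not a complete solution: the class name EntityReplacer, the method replaceEntities, and the handful of entries in the map are all made up for the example, and the regular expression is a simple guess at what an entity looks like. &lt; and &gt; are left out of the map on purpose so that decoding them cannot introduce new tags into the markup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces only the entities listed in the map
// and leaves all other text, including HTML tags, untouched.
final class EntityReplacer {

    // Key = code to look for, value = text to put in its place.
    // The entries here are examples; add whatever you need.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("&amp;", "&");
        ENTITIES.put("&quot;", "\"");
        ENTITIES.put("&#39;", "'");
    }

    // Assumed shape of an entity: '&', optional '#', letters/digits, ';'
    private static final Pattern ENTITY = Pattern.compile("&#?[A-Za-z0-9]+;");

    public static String replaceEntities(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Entities that are not in the map are kept exactly as found.
            String replacement = ENTITIES.getOrDefault(m.group(), m.group());
            // quoteReplacement stops '$' and '\' from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling EntityReplacer.replaceEntities("<p>Tom &amp; Jerry</p>") returns the string with the <p> tags intact and &amp; turned into a plain &. Anything the map does not know about passes through unchanged, so you can grow the map as you run into more codes.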
I will post the code over the weekend if I get a chance.