I use Tidy Online HTML ( http://infohound.net/tidy/ ) to clean up some very old and messy HTML files containing Hebrew characters. Whenever a page is processed by Tidy, the Hebrew characters in the output turn into gibberish, even after I change the encoding options in the settings. With other settings, the Hebrew characters come out as Unicode entities instead. I googled around for a possible solution, but did not find any. I had a few ideas, but I don't know exactly how to approach them, if at all (maybe someone has a better solution).
- I thought, maybe I can (after processing the page) scan the page for Unicode entities and replace them with the corresponding Hebrew characters (systematically, of course).
- Perhaps I can take the Tidy HTML source code and modify it to output Hebrew characters accordingly. The problem is that I doubt that I am knowledgeable enough to even start something like that.
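
To make the first idea concrete, here is a rough Python sketch of what I have in mind. It assumes the entities in Tidy's output are numeric character references such as `&#1513;` or `&#x5E9;`, and it only decodes code points in the Hebrew ranges, so references like `&#60;` that have to stay escaped are left alone. The function name `decode_hebrew_refs` and the chosen ranges are my own, not anything from Tidy.

```python
import re

# Numeric character references: hexadecimal (&#x5E9;) or decimal (&#1513;).
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Assumption: only the Hebrew block and the Hebrew presentation forms
# should be turned back into literal characters.
_HEBREW_RANGES = ((0x0590, 0x05FF), (0xFB1D, 0xFB4F))


def _is_hebrew(code_point):
    """Return True if the code point falls in one of the Hebrew ranges."""
    return any(low <= code_point <= high for low, high in _HEBREW_RANGES)


def decode_hebrew_refs(html_text):
    """Replace Hebrew numeric character references with the characters."""

    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if _is_hebrew(code_point):
            return chr(code_point)
        # Leave every other reference untouched (e.g. &#60; for "<").
        return match.group(0)

    return _NUMERIC_REF.sub(_replace, html_text)


if __name__ == "__main__":
    sample = "<p>&#1513;&#1500;&#1493;&#1501; &amp; &#60;</p>"
    print(decode_hebrew_refs(sample))
```

If something like this works, the result would presumably have to be saved as UTF-8, with a matching charset declared in the page, so that browsers display the Hebrew text correctly.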