I'm stuck on a problem. I use a very simple RTE (rich text editor) to get user input, and I strip the garbage from the string when it is posted, using the functions provided by the RTE. I am using CLEditor: http://premiumsoftware.net/cleditor
After the user submits the data, I parse it with PHP and remove the unwanted content. Most users are Linux / Mac users, and they usually copy content from emails / Word documents and paste it into the RTE, which brings in a lot of junk.
We also need to allow all UTF-8 characters from any language.
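A minimal sketch of the kind of server-side cleanup described above, assuming PHP 7.2+ with the mbstring extension. The function name `clean_rte_input` and the list of allowed tags are illustrative assumptions, not part of CLEditor or of the real code:

```php
<?php
// Hypothetical helper: the name and the tag whitelist are assumptions for illustration.
function clean_rte_input(string $html): string
{
    // Replace invalid byte sequences first so the /u regexes below do not fail.
    $html = mb_scrub($html, 'UTF-8');

    // Drop HTML comments, including Word conditional comments
    // such as <!--[if gte mso 9]> ... <![endif]-->.
    $html = preg_replace('/<!--.*?-->/su', '', $html);

    // Drop whole elements whose content is never wanted.
    $html = preg_replace('#<(style|script|xml)\b[^>]*>.*?</\1>#isu', '', $html);

    // Keep only a small set of tags; others (o:p, span, font, ...) lose
    // their tags but keep their text.
    $html = strip_tags($html, '<p><br><b><strong><i><em><u><ul><ol><li>');

    // Remove every attribute from the remaining opening tags
    // (class="MsoNormal", style="mso-...", and so on).
    $html = preg_replace('#<([a-z][a-z0-9]*)\b[^>]*?(/?)>#iu', '<$1$2>', $html);

    // Strip control characters (except tab, LF, CR) plus a few invisible
    // code points: soft hyphen, zero-width space, BOM. ZWJ/ZWNJ are kept
    // on purpose because some scripts need them.
    $html = preg_replace(
        '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F\x{80}-\x{9F}\x{AD}\x{200B}\x{FEFF}]/u',
        '',
        $html
    );

    return trim($html);
}
```

Only tags, attributes and a short list of invisible code points are removed, so letters from any language pass through unchanged. The regex-based attribute stripping is deliberately simple and would not cope with a `>` inside an attribute value; a real HTML parser or a dedicated sanitizing library would be more robust.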
Having said all that, take a look at this image:

, char , MYSQL , . HEX, char. .
. , PDF script