This is more of a comment for Tim Pitskers, but code is awkward to present in comments. Here is a simple example of using the XRegExp package:
<p id=orig>Bundespräsident / ß+ð/ə¿α!</p>
<p id=new></p>
<script src="http://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-min.js"></script>
<script src="http://xregexp.com/addons/unicode/unicode-base.js"></script>
<script>
  var regex = new XRegExp("\\P{L}+", "g");
  var string = document.getElementById('orig').innerHTML;
  string = XRegExp.replace(string, regex, "");
  document.getElementById('new').innerHTML = string;
</script>
For production use, you probably want to download a specific version of the base package and the Unicode plug-in and serve them from your own server.
Note. The code removes all characters that are not classified as letters (alphabetic) in Unicode. I believe this is consistent with what you mean by the word "symbol", although words in a natural language may also contain hyphens, apostrophes, and other non-letters.
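If hyphens and apostrophes should survive, the character class can be widened. The following is only a rough sketch and not part of the XRegExp example: it assumes a runtime with native ES2018 Unicode property escapes (the `u` flag) instead of the XRegExp plug-in, and the function names `stripNonLetters` and `stripKeepingWordPunctuation` are made up for illustration.

```javascript
// Assumption: the JavaScript engine supports \p{L} / \P{L} with the "u" flag (ES2018).
// Both function names are hypothetical helpers, not part of XRegExp.

// Remove every run of characters that are not Unicode letters,
// mirroring the "\\P{L}+" pattern used with XRegExp.
function stripNonLetters(text) {
  return text.replace(/\P{L}+/gu, "");
}

// Variant: keep letters plus ASCII apostrophes and hyphens,
// since natural-language words may contain them.
function stripKeepingWordPunctuation(text) {
  return text.replace(/[^\p{L}'-]+/gu, "");
}

console.log(stripNonLetters("Bundespräsident / ß+ð/ə¿α!"));
console.log(stripKeepingWordPunctuation("rock-'n'-roll, don't!")); // rock-'n'-rolldon't
```

Note that spaces are removed in both variants, just as in the original pattern; add `\s` to the negated class if word boundaries should be kept.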
Beware that characters keep being added to Unicode, and the category of a character may (rarely) change. However, the package has been well maintained; it corresponds to Unicode 6.1 (version 6.2 is not covered, but it does not add any new letters).