How do I replace all non-word characters, such as * + #?

I need help replacing all non-word characters in a string.

For example, (stadtbezirkspräsident' should become stadtbezirkspräsident.

The regular expression has to work for all languages, which makes it rather complicated, because I have no idea how to match characters like ñ or œ. I tried to solve this with

    // "-" is placed last so it is a literal hyphen, not a range from ">" to "_"
    string = string.replace(/[&\/\\#,+()$~%.'":*?<>_{}-]/g, ' ');

but many special characters, such as Ø, are still left over.

Is there a general character class for this, or has someone solved this problem before?
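A side note on the character class in the attempt: inside [...], an unescaped "-" between ">" and "_" forms the range U+003E to U+005F, which also covers "?", "@", "[", "\", "]", "^" and every uppercase ASCII letter. The sketch below reproduces that with the class copied from the question; the names original and fixed are just for illustration.

```javascript
// Character class copied verbatim from the question.
const original = /[&\/\\#,+()$~%.'":*?<>-_{}]/g;

// ">-_" is parsed as a range, so the capital "S" is replaced too.
console.log("Stadtbezirk".replace(original, " ")); // " tadtbezirk"

// Moving "-" to the end of the class makes it a literal hyphen.
const fixed = /[&\/\\#,+()$~%.'":*?<>_{}-]/g;
console.log("Stadtbezirk".replace(fixed, " ")); // "Stadtbezirk"
console.log("a-b".replace(fixed, " "));         // "a b"
```

Even with the hyphen fixed, a list of characters to remove can never be complete, which is why the answers below switch to matching by Unicode category or code-point range instead.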

3 answers

If you define all Unicode ranges yourself, this will be a lot of work.

It might make sense to use Steven Levithan's XRegExp with its Unicode add-ons and use its Unicode property shortcuts:

    // \P{L}+ matches one or more characters that are not Unicode letters
    var regex = new XRegExp("\\P{L}+", "g");
    string = XRegExp.replace(string, regex, "");
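The \P{L} shortcut is also available natively in JavaScript since ES2018, as long as the regex carries the "u" flag. A minimal sketch, assuming an ES2018+ engine (a current browser or Node.js 10+); keepLettersOnly is a hypothetical helper name:

```javascript
// Assumes an ES2018+ engine. Without the "u" flag, \P{L} is not
// treated as a Unicode property escape.
const NON_LETTERS = /\P{L}+/gu;

// Hypothetical helper: removes every run of code points whose
// Unicode General_Category is not Letter (L).
function keepLettersOnly(text) {
  return text.replace(NON_LETTERS, "");
}

console.log(keepLettersOnly("(stadtbezirkspräsident'"));     // stadtbezirkspräsident
console.log(keepLettersOnly("Bundespräsident / ß+ð/ə¿α!")); // Bundespräsidentßðəα
```

This avoids loading a library, at the cost of depending on the engine's own Unicode tables.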

Try this trick:

    // Upper bound \xBF rather than \xC0, because U+00C0 (À) is a letter
    str = str.replace(/(?!\w)[\x00-\xBF]/g, '');
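The trick works because (?!\w) refuses to remove an ASCII letter, digit or underscore, while [\x00-\xBF] limits removal to ASCII and the Latin-1 punctuation and symbols; everything from U+00C0 upward is kept, whether or not it is a letter. A sketch with a hypothetical helper name, using \xBF as the upper bound so that À (U+00C0) survives; the last two calls show what it keeps that a strict letters-only filter would drop:

```javascript
// Hypothetical helper wrapping the lookahead trick.
// (?!\w)       : the next character is not [A-Za-z0-9_]
// [\x00-\xBF]  : and its code point is at most U+00BF
function stripLowNonWord(text) {
  return text.replace(/(?!\w)[\x00-\xBF]/g, "");
}

console.log(stripLowNonWord("(stadtbezirkspräsident'")); // stadtbezirkspräsident
console.log(stripLowNonWord("¡Hola! ¿Qué?"));            // HolaQué
console.log(stripLowNonWord("Àbc_123"));                 // Àbc_123 (digits and "_" kept)
console.log(stripLowNonWord("2×3"));                     // 2×3 (U+00D7 is not a letter)
```

It needs no library and no "u" flag, but it is a heuristic: punctuation outside Latin-1 (for example typographic quotes) is not removed.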

This is really a comment on Tim Pietzcker's answer, but code is awkward to format in comments. Here is a simple example of using the XRegExp package:

    <p id=orig>Bundespräsident / ß+ð/ə¿α!</p>
    <p id=new></p>
    <script src="http://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-min.js"></script>
    <script src="http://xregexp.com/addons/unicode/unicode-base.js"></script>
    <script>
    // \P{L}+ matches one or more characters that are not Unicode letters
    var regex = new XRegExp("\\P{L}+", "g");
    var string = document.getElementById('orig').innerHTML;
    string = XRegExp.replace(string, regex, "");
    document.getElementById('new').innerHTML = string;
    </script>

For production use, you probably want to download copies of the base package and the Unicode add-on and serve them from your own server.

Note: the code removes all characters that are not classified as Unicode letters (alphabetic). I believe this matches what you mean by a "word character", although words in natural language may contain hyphens, apostrophes, and other non-letters.

Beware that new characters are added to Unicode over time, and a character's category may (rarely) change. However, the package is well maintained: it corresponds to Unicode 6.1 (the 6.2 update is missing, but 6.2 added no new letters).
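If words should keep their internal hyphens and apostrophes, as the note about natural-language words suggests, one possible middle ground is sketched below. It assumes an ES2018+ engine (Unicode property escapes and lookbehind); the set of allowed joiners (', U+2019 and -) and the name stripNonWord are assumptions to adjust. It also keeps combining marks (\p{M}), which a plain \P{L} filter would strip from decomposed text such as "n" followed by U+0303:

```javascript
// Assumes an ES2018+ engine. Letters (\p{L}) and combining marks (\p{M})
// are always kept; a joiner (', U+2019 or -) is kept only when a letter
// or mark sits on both sides of it, as in "well-known" or "isn't".
const NON_WORD = new RegExp(
  "[^\\p{L}\\p{M}'\\u2019-]+" +          // runs of anything else
    "|['\\u2019-](?![\\p{L}\\p{M}])" +   // joiner not followed by a letter
    "|(?<![\\p{L}\\p{M}])['\\u2019-]",   // joiner not preceded by a letter
  "gu"
);

// Hypothetical helper name.
function stripNonWord(text) {
  return text.replace(NON_WORD, "");
}

console.log(stripNonWord("(stadtbezirkspräsident'")); // stadtbezirkspräsident
console.log(stripNonWord("well-known, isn't it?"));   // well-knownisn'tit
console.log(stripNonWord("nin\u0303o!"));             // decomposed "niño", mark kept
```

Because the lookarounds inspect the original string, a joiner next to removed punctuation (as in "a -b") is also removed.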

