Replace all characters other than words, how? * + #

Question

I need help replacing all non-word characters in a string.

For example, (stadtbezirkspräsident' should become stadtbezirkspräsident.

The regular expression should work for all languages, which makes it rather complicated, because I have no idea how to match characters like ñ or œ. I tried to solve this with

 string.replace(/[&\/\\#,+()$~%.'":*?<>-_{}]/g,' ');

but many special characters, such as Ø, are still left over.

Perhaps there is a general character class for this, or has someone already solved this problem?

+6

javascript regex match character

BeMoreDifferent.com Nov 03 '12 at 13:53

3 answers

Try this trick:

 str.replace(/(?!\w)[\x00-\xC0]/g, '')
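A minimal sketch of what this pattern does, for illustration only. The character class `[\x00-\xC0]` restricts matches to code points U+0000 through U+00C0, and the negative lookahead `(?!\w)` spares ASCII letters, digits and the underscore. Everything above U+00C0 is left untouched, whether it is a letter or not. The helper name `stripLowNonWord` and the sample strings are assumptions, not part of the answer.

```javascript
// Hypothetical helper wrapping the pattern shown above.
function stripLowNonWord(str) {
  // Remove characters in U+0000..U+00C0 that are not ASCII word characters.
  return str.replace(/(?!\w)[\x00-\xC0]/g, '');
}

// The apostrophe (U+0027) is removed; "ä" (U+00E4) lies above the range and stays.
console.log(stripLowNonWord("stadtbezirkspräsident'")); // stadtbezirkspräsident

// Limits of the trick: "À" (U+00C0) is inside the range and is removed,
// while "×" (U+00D7) and "€" (U+20AC) lie above it and survive.
console.log(stripLowNonWord("À la carte × 5€")); // lacarte×5€
```

So the trick works for the example in the question, but it is a range-based shortcut rather than a test for "is this a letter".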

+6

Ωmega Nov 03 '12 at 14:03

This is more of a comment on Tim Pietzcker's answer, but code is awkward to present in comments. Here is a simple example of using the XRegExp package:

 <p id=orig>Bundespräsident / ß+ð/ə¿α!</p>
 <p id=new></p>
 <script src="http://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-min.js"></script>
 <script src="http://xregexp.com/addons/unicode/unicode-base.js"></script>
 <script>
   var regex = new XRegExp("\\P{L}+", "g");
   var string = document.getElementById('orig').innerHTML;
   string = XRegExp.replace(string, regex, "");
   document.getElementById('new').innerHTML = string;
 </script>

For production use, you probably want to download a specific version of the base package and the Unicode add-on and serve them from your own server.

Note: the code removes characters that are not classified as letters (alphabetic) in Unicode. I believe this matches what you mean by "word character", although words in a natural language may contain hyphens, apostrophes, and other non-letters.

Beware that characters are added to Unicode and that a character's category may (rarely) change. However, the package is well maintained; it corresponds to Unicode 6.1 (version 6.2 is not covered yet, but it adds no new letters).
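The caveat about hyphens and apostrophes can be made concrete with a small sketch. It assumes an engine with native Unicode property escapes (ES2018, `u` flag) rather than the XRegExp package used above. The function names and the choice of which extra characters to keep are illustrative assumptions.

```javascript
// Strict variant: drop everything that is not a letter (L) or a
// combining mark (M). Marks are kept so that decomposed accents
// such as "a" + U+0308 are not torn apart.
function stripStrict(str) {
  return str.replace(/[^\p{L}\p{M}]+/gu, '');
}

// Lenient variant (an assumption about what belongs to a word):
// additionally keep the ASCII apostrophe and hyphen.
function stripLenient(str) {
  return str.replace(/[^\p{L}\p{M}'-]+/gu, '');
}

console.log(stripStrict("rock'n'roll, vis-à-vis!"));  // rocknrollvisàvis
console.log(stripLenient("rock'n'roll, vis-à-vis!")); // rock'n'rollvis-à-vis
```

Which variant fits depends on whether the apostrophe in the question's example should count as part of the word; for that example, the strict variant gives the requested result.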

+1

Jukka K. Korpela Nov 03 '12 at 2:43

Tim Pietzcker · Accepted Answer · 2012-11-03T14:04:06+0000

Defining all the Unicode ranges yourself would be a lot of work.

It might make more sense to use Steven Levithan's XRegExp package with its Unicode add-ons and use its Unicode property shortcuts:

 var regex = new XRegExp("\\P{L}+", "g");
 string = XRegExp.replace(string, regex, "");
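In engines that support Unicode property escapes natively (ES2018 and later, with the `u` flag), the same idea can be sketched without an external library. This is an alternative under that assumption, not part of the XRegExp API; the function name `removeNonLetters` is hypothetical, and the NFC normalization step is an added precaution rather than something the answer prescribes.

```javascript
// Hypothetical helper: remove every run of non-letter characters.
function removeNonLetters(str) {
  return str
    // Compose accents first, so "a" + U+0308 becomes the single letter "ä"
    // instead of a letter followed by a (non-letter) combining mark.
    .normalize('NFC')
    // \P{L} matches any code point that is NOT a Unicode letter;
    // the "u" flag is required for property escapes to be recognized.
    .replace(/\P{L}+/gu, '');
}

console.log(removeNonLetters("(stadtbezirkspräsident'"));     // stadtbezirkspräsident
console.log(removeNonLetters("Bundespräsident / ß+ð/ə¿α!"));  // Bundespräsidentßðəα
```

Note that digits and the underscore are removed as well, since they are not letters; that differs from `\w`, which covers only ASCII letters, digits and the underscore.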
