Regex to replace invalid characters

I don't have much experience with regular expressions, so I use a lot of chained String.Replace() calls to remove unwanted characters. Is there a regex I can write to streamline this?

    string messyText = GetText();
    string cleanText = messyText.Trim()
        .ToUpper()
        .Replace(",", "")
        .Replace(":", "")
        .Replace(".", "")
        .Replace(";", "")
        .Replace("/", "")
        .Replace("\\", "")
        .Replace("\n", "")
        .Replace("\t", "")
        .Replace("\r", "")
        .Replace(Environment.NewLine, "")
        .Replace(" ", "");

Thanks.

+6
c# regex
3 answers

Try this regex:

    Regex regex = new Regex(@"[\s,:.;/\\]+");
    string cleanText = regex.Replace(messyText, "").ToUpper();

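\s is the whitespace character class. It covers [ \t\r\n] and, in .NET, also form feed, vertical tab, and Unicode separator characters such as the no-break space, so it can remove more than the original Replace() chain did.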
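As a rough check of that pattern, here is a minimal sketch that runs it over a made-up sample string. The class name WhitespaceDemo, the Clean helper, and the sample input are all assumptions for illustration; only the pattern and the ToUpper() call come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Same pattern as in the answer: whitespace plus , : . ; / and \
    private static readonly Regex Unwanted = new Regex(@"[\s,:.;/\\]+");

    // Hypothetical helper wrapping the two lines from the answer.
    public static string Clean(string messyText)
    {
        return Unwanted.Replace(messyText, "").ToUpper();
    }

    public static void Main()
    {
        // Made-up input; \u00A0 is a no-break space, which \s also matches in .NET.
        string sample = " a,b:\tc.\r\nd;e/f\\g\u00A0h ";
        Console.WriteLine(Clean(sample)); // prints "ABCDEFGH"
    }
}
```

Because \s already strips leading and trailing whitespace, the separate Trim() call from the question is not needed here.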


If you just want to keep alphanumeric characters, then instead of adding each non-alphanumeric character to a character class, you can do this:

    Regex regex = new Regex(@"[\W_]+");
    string cleanText = regex.Replace(messyText, "").ToUpper();

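Here \W matches any non-word character, roughly [^a-zA-Z0-9_]. The underscore counts as a word character, which is why it is listed separately. Note that in .NET \w is Unicode-aware by default, so letters and digits outside ASCII are kept too.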
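To illustrate the Unicode point, the sketch below compares the default behaviour with RegexOptions.ECMAScript, which restricts \w to [a-zA-Z_0-9]. The class name, method names, and sample string are made up for this example and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Default .NET behaviour: \w is Unicode-aware, so non-ASCII letters survive.
    public static string KeepWordChars(string text)
    {
        return Regex.Replace(text, @"[\W_]+", "");
    }

    // With RegexOptions.ECMAScript, \w is limited to [a-zA-Z_0-9],
    // so only ASCII letters and digits survive.
    public static string KeepAsciiAlnum(string text)
    {
        return Regex.Replace(text, @"[\W_]+", "", RegexOptions.ECMAScript);
    }

    public static void Main()
    {
        // Made-up input; \u00E9 is a precomposed "e with acute accent".
        string sample = "caf\u00E9_au-lait 42!";
        Console.WriteLine(KeepWordChars(sample));  // keeps the accented letter
        Console.WriteLine(KeepAsciiAlnum(sample)); // drops the accented letter
    }
}
```

Which variant is appropriate depends on whether accented or non-Latin letters count as valid in your data.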

+13

Character classes to the rescue!

    string messyText = GetText();
    string cleanText = Regex.Replace(messyText.Trim().ToUpper(), @"[,:.;/\\\n\t\r ]+", "");
+2

You might want to use a "whitelist" instead: there is an ocean of odd characters whose effects, depending on the combination, may not be easy to predict.

A simple regex that removes everything but allowed characters may look like this:

 messyText = Regex.Replace(messyText, @"[^a-zA-Z0-9\x7C\x2C\x2E_]", ""); 

The ^ inverts the selection, so everything not listed is removed. Besides alphanumeric characters, this regular expression allows | (\x7C), the comma (\x2C), the period (\x2E), and _. You can add and remove characters and character sets as needed.
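For reference, the same whitelist can be written with literal characters instead of hex escapes, since | and . have no special meaning inside a character class. The following is only a sketch: the class name, the KeepAllowed helper, and the sample input are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitelistDemo
{
    // Literal spelling of the answer's class: \x7C is |, \x2C is a comma, \x2E is a period.
    private static readonly Regex NotAllowed = new Regex(@"[^a-zA-Z0-9|,._]");

    // Hypothetical helper: removes every character that is not on the whitelist.
    public static string KeepAllowed(string messyText)
    {
        return NotAllowed.Replace(messyText, "");
    }

    public static void Main()
    {
        // Made-up input containing allowed and disallowed characters.
        string sample = "id_7|name: O'Brien, J.\t<ok>";
        Console.WriteLine(KeepAllowed(sample)); // prints "id_7|nameOBrien,J.ok"
    }
}
```

Unlike the \W version, this class is ASCII-only, so accented letters are removed unless you add them explicitly.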

0
