Best way to remove characters that are not in the ASCII range 32 to 175 in C#

I need to remove every character from a string that is not in the ASCII range 32 to 175; anything outside that range should go.

I'm not sure whether a regular expression is the best solution, or whether I should use something like .Replace() or .Remove() on each invalid character, or something else entirely.

Any help would be appreciated.

+7
5 answers

You can use:

Regex.Replace(myString, @"[^\x20-\xaf]+", ""); 

Here the regular expression consists of a negated character class ( [^...] , where the ^ at the start of the class negates it) that matches every character outside the range U+0020 to U+00AF (32-175, written in hexadecimal). The trailing + matches a whole run of such characters at once. As regular expressions go, this one is fairly simple, but it may puzzle someone who is not familiar with them.
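
To make the behaviour of that character class concrete, here is a minimal sketch that wraps the same pattern in a helper. The class and method names (RegexFilterSketch, KeepRange32To175) are made up for illustration and are not from the question or the answers. Note that a C# char is a UTF-16 code unit, so codes 128-175 are really the code points U+0080 to U+00AF rather than ASCII.

```csharp
using System.Text.RegularExpressions;

public static class RegexFilterSketch
{
    // Hypothetical helper name. Removes every char whose code is
    // outside 0x20..0xAF (32..175), using the pattern from the answer.
    public static string KeepRange32To175(string input)
    {
        // [^...] negates the class; + removes a whole run in one match.
        return Regex.Replace(input, @"[^\x20-\xAF]+", "");
    }
}
```

For example, calling KeepRange32To175("caf\u00E9\t!") should drop both the é (code 233) and the tab (code 9), leaving "caf!".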

But you can go one more route:

 new string(myString.Where(c => (c >= 32) && (c <= 175)).ToArray()); 

Which one to choose probably depends mainly on what you prefer to read. If you don't have much experience with regular expressions, I would say the second is clearer.

Some performance measurements, 10,000 rounds each, in seconds:

 2000 characters, the first 143 of which are between 32 and 175:

 Regex without +             4.1171
 Regex with +                0.4091
 LINQ, where, new string     0.2176
 LINQ, where, string.Join    0.2448
 StringBuilder (xanatos)     0.0355
 LINQ, horrible (HatSoft)    0.4917

 2000 characters, all of which are between 32 and 175:

 Regex without +             0.4076
 Regex with +                0.4099
 LINQ, where, new string     0.3419
 LINQ, where, string.Join    0.7412
 StringBuilder (xanatos)     0.0740
 LINQ, horrible (HatSoft)    0.4801

So my approaches are the slowest :-). You should probably go with xanatos's answer and wrap it in a method with a nice, clear name. For inline use, quick and dirty things, or cases where performance doesn't matter, I would probably use the regex.
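
As a rough sketch of what "wrap it in a method with a nice, clear name" could look like, the StringBuilder loop can be put behind a small helper. The names StringFilters and KeepCharsInRange, and the default bounds, are assumptions chosen for illustration, not something from the answers.

```csharp
using System;
using System.Text;

public static class StringFilters
{
    // Hypothetical helper: keeps only chars with min <= c <= max.
    // Defaults correspond to 32 (U+0020) and 175 (U+00AF).
    public static string KeepCharsInRange(
        string input, char min = '\u0020', char max = '\u00AF')
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        // Pre-size the builder: the result is never longer than the input.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c >= min && c <= max)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

A call such as StringFilters.KeepCharsInRange(myString) then reads clearly at the call site, and the bounds can be overridden if a different range is ever needed.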

+16

I think that if you don't know how to write a regex, you shouldn't use one, especially for something this simple:

 var sb = new StringBuilder();

 foreach (var c in str)
 {
     if (c >= 32 && c <= 175)
     {
         sb.Append(c);
     }
 }

 // Read the result from the builder, not from the original string.
 var str2 = sb.ToString();
+7

Use the regex [^\x20-\xAF]+ and replace every match with the empty string "":

 Regex.Replace(str, @"[^\x20-\xAF]+", ""); 
+3

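How about using LINQ this way:

 // Keep characters whose code is within 32..175 (inclusive).
 // Note: building the string by repeated concatenation is slow for long inputs.
 string text = (from c in "AAA hello aaaa #### Y world"
                let i = (int) c
                where i >= 32 && i <= 175
                select c)
               .Aggregate("", (current, c) => current + c);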
+1
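 // Requires compiling with unsafe code enabled.
 static unsafe string TrimRange(string str, char from, char to)
 {
     // First pass: count the characters that fall inside [from, to].
     int count = 0;
     for (int i = 0; i < str.Length; i++)
     {
         char ch = str[i];
         if ((ch >= from) && (ch <= to))
         {
             count++;
         }
     }

     if (count == 0) return String.Empty;   // nothing to keep
     if (count == str.Length) return str;   // nothing to remove

     // The buffer lives on the stack, so a very long input can overflow it.
     char * result = stackalloc char[count];

     // Second pass: copy the characters that are kept.
     count = 0;
     for (int i = 0; i < str.Length; i++)
     {
         char ch = str[i];
         if ((ch >= from) && (ch <= to))
         {
             result[count ++] = ch;
         }
     }

     return new String(result, 0, count);
 }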
+1
