.NET Regex for whitelisted characters

Question

Consider an algorithm that needs to determine whether a string contains any characters outside the whitelisted characters.

The whitelist is as follows:

-.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ÇüéâäàåçêëèïîíìÄÅÉæÆôöòûùÖÜáíóúñÑÀÁÂÃÈÊËÌÍÎÏÐÒÓÔÕØÙÚÛÝßãðõøýþÿ
Note: spaces and apostrophes must also be included in this whitelist.

This will be a static method, but it will probably be converted to an extension method.

 private bool ContainsAllWhitelistedCharacters(string input)
 {
     string regExPattern = ""; // the whitelist
     return Regex.IsMatch(input, regExPattern);
 }

Update:

Thanks to all responders for the performance comments. Performance is not a concern here; quality, readability and maintainability are. Less code means less chance of defects, IMO.

Question:

What should the whitelist regular expression pattern be?

+4

c# regex .net

p.campbell Aug 20 '09 at 16:17

4 answers

Why should it be a regular expression?

 private bool ContainsAllWhitelistedCharacters(string input)
 {
     string whitelist = "abcdefg...";
     foreach (char c in input)
     {
         if (whitelist.IndexOf(c) == -1)
             return false;
     }
     return true;
 }

There is no need to jump straight to regular expressions if you don't know how to write the one you need and you haven't profiled this section of code and found that you need the extra speed.
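If you like the loop but want even less code, here is a minimal sketch of the same idea using LINQ's Enumerable.All. It assumes the whitelist from the question plus the required space and apostrophe; swapping IndexOf for a HashSet<char> lookup is my own choice, not part of the answer above, and only matters for long whitelists or hot paths. It uses C# 9 top-level statements so it runs as-is.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Whitelist from the question, plus the space and apostrophe it says are required.
// Using a HashSet<char> here is an assumption for fast lookups, not the original answer's code.
var whitelist = new HashSet<char>(
    " '-.abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" +
    "ÇüéâäàåçêëèïîíìÄÅÉæÆôöòûùÖÜáíóúñÑÀÁÂÃÈÊËÌÍÎÏÐÒÓÔÕØÙÚÛÝßãðõøýþÿ");

// True when every character of input is in the whitelist (an empty string is true).
bool ContainsAllWhitelistedCharacters(string input) => input.All(whitelist.Contains);

Console.WriteLine(ContainsAllWhitelistedCharacters("Zoë O'Brien-Smith")); // True
Console.WriteLine(ContainsAllWhitelistedCharacters("x + y"));             // False ('+' is not whitelisted)
```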

+5

Mark Rushakoff Aug 20 '09 at 16:22

I don't know how the regex backend is implemented, but it might be most efficient to match any character that is not in your list, using a negated character class:

 private bool ContainsAllWhitelistedCharacters(string input)
 {
     Regex r = new Regex("[^ your list of chars ]");
     return !r.IsMatch(input);
 }
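A runnable version of the negated-class idea, assuming the question's full whitelist (with the space and apostrophe) in place of the placeholder. Because it only searches for a single forbidden character, it needs no anchors at all and returns true for an empty string, the same as the loop-based answers.

```csharp
using System;
using System.Text.RegularExpressions;

// Negated character class: matches any single character that is NOT whitelisted.
// Verbatim string so "\-" reaches the regex engine as an escaped (literal) hyphen.
var notWhitelisted = new Regex(
    @"[^ '\-.a-zA-ZÇüéâäàåçêëèïîíìÄÅÉæÆôöòûùÖÜáíóúñÑÀÁÂÃÈÊËÌÍÎÏÐÒÓÔÕØÙÚÛÝßãðõøýþÿ]");

// Finding no forbidden character means every character is whitelisted,
// so an empty string counts as valid too.
bool ContainsAllWhitelistedCharacters(string input) => !notWhitelisted.IsMatch(input);

Console.WriteLine(ContainsAllWhitelistedCharacters("Ça va Zoë"));  // True
Console.WriteLine(ContainsAllWhitelistedCharacters("Ça va, Zoë")); // False (comma)
Console.WriteLine(ContainsAllWhitelistedCharacters("abc\n"));      // False (newline)
```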

0

Mark Synowiec Aug 20 '09 at 16:30

Note that I don't recommend this unless performance really is an issue, but I thought I'd point out that, even with the regex precompiled, you can go quite a bit faster.

Compare:

 static readonly Regex r = new Regex(
     @"^(['\-\.a-zA-Z ÇüéâäàåçêëèïîíìÄÅÉæÆôöòûùÖÜáíóúñÑ" +
     "ÀÁÂÃÈÊËÌÍÎÏÐÒÓÔÕØÙÚÛÝßãðõøýþÿ]+)$");

 public bool IsValidCustom(string value)
 {
     return r.IsMatch(value);
 }

with:

 private bool ContainsAllWhitelistedCharacters(string input)
 {
     foreach (var c in input)
     {
         switch (c)
         {
             // space, apostrophe, hyphen-minus, full stop
             case '\u0020': case '\u0027': case '\u002D': case '\u002E':
             // A-Z
             case '\u0041': case '\u0042': case '\u0043': case '\u0044': case '\u0045': case '\u0046': case '\u0047': case '\u0048':
             case '\u0049': case '\u004A': case '\u004B': case '\u004C': case '\u004D': case '\u004E': case '\u004F': case '\u0050':
             case '\u0051': case '\u0052': case '\u0053': case '\u0054': case '\u0055': case '\u0056': case '\u0057': case '\u0058':
             case '\u0059': case '\u005A':
             // a-z
             case '\u0061': case '\u0062': case '\u0063': case '\u0064': case '\u0065': case '\u0066': case '\u0067': case '\u0068':
             case '\u0069': case '\u006A': case '\u006B': case '\u006C': case '\u006D': case '\u006E': case '\u006F': case '\u0070':
             case '\u0071': case '\u0072': case '\u0073': case '\u0074': case '\u0075': case '\u0076': case '\u0077': case '\u0078':
             case '\u0079': case '\u007A':
             // U+00C0 to U+00D6
             case '\u00C0': case '\u00C1': case '\u00C2': case '\u00C3': case '\u00C4': case '\u00C5': case '\u00C6': case '\u00C7':
             case '\u00C8': case '\u00C9': case '\u00CA': case '\u00CB': case '\u00CC': case '\u00CD': case '\u00CE': case '\u00CF':
             case '\u00D0': case '\u00D1': case '\u00D2': case '\u00D3': case '\u00D4': case '\u00D5': case '\u00D6':
             // U+00D8 to U+00DD, then U+00DF (U+00DE is not whitelisted)
             case '\u00D8': case '\u00D9': case '\u00DA': case '\u00DB': case '\u00DC': case '\u00DD': case '\u00DF':
             // U+00E0 to U+00F6
             case '\u00E0': case '\u00E1': case '\u00E2': case '\u00E3': case '\u00E4': case '\u00E5': case '\u00E6': case '\u00E7':
             case '\u00E8': case '\u00E9': case '\u00EA': case '\u00EB': case '\u00EC': case '\u00ED': case '\u00EE': case '\u00EF':
             case '\u00F0': case '\u00F1': case '\u00F2': case '\u00F3': case '\u00F4': case '\u00F5': case '\u00F6':
             // U+00F8 to U+00FF
             case '\u00F8': case '\u00F9': case '\u00FA': case '\u00FB': case '\u00FC': case '\u00FD': case '\u00FE': case '\u00FF':
                 continue; // whitelisted: move on to the next character
         }
         return false; // any other character is outside the whitelist
     }
     return true; // empty string is true
 }

In very quick testing on a corpus of words with a pass rate of about 60%, I get a clear speed-up factor with this approach.

This is actually no less readable than a regular expression without escape characters!
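One subtlety applies to the anchored patterns in this thread (ShuggyCoUk's above and the accepted answer below): in .NET, $ without RegexOptions.Multiline matches at the end of the string or just before a final newline, so a value such as "abc\n" passes a ^[...]+$ check even though \n is not whitelisted. The sketch below reuses ShuggyCoUk's character class (capture group dropped) and compares that with \A...\z, which only matches at the true start and end. Note also that the + quantifier rejects the empty string, whereas the switch above returns true for it; use * instead if empty input should pass.

```csharp
using System;
using System.Text.RegularExpressions;

// Same character class, anchored two ways. Without RegexOptions.Multiline,
// '$' matches at the very end OR just before a final '\n'; '\z' matches only at the very end.
const string Allowed = @"['\-\.a-zA-Z ÇüéâäàåçêëèïîíìÄÅÉæÆôöòûùÖÜáíóúñÑÀÁÂÃÈÊËÌÍÎÏÐÒÓÔÕØÙÚÛÝßãðõøýþÿ]";
var dollarAnchored = new Regex("^" + Allowed + "+$");
var zAnchored = new Regex(@"\A" + Allowed + @"+\z");

Console.WriteLine(dollarAnchored.IsMatch("abc\n")); // True: the trailing newline slips through
Console.WriteLine(zAnchored.IsMatch("abc\n"));      // False
```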

0

ShuggyCoUk Aug 20 '09 at 17:57

Kelsey · Accepted Answer · 2009-08-20T16:18:54+0000

You can match the whitelist using the following pattern:

 ^([\-\.a-zA-Z ÇüéâäàåçêëèïîíìÄÅÉæÆôöòûùÖÜáíóúñÑÀÁÂÃÈÊËÌÍÎÏÐÒÓÔÕØÙÚÛÝßãðõøýþÿ]+)$

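Make it an extension method with:

 using System.Text.RegularExpressions;

 public static class StringExtensions
 {
     public static bool IsValidCustom(this string value)
     {
         // Verbatim string (@) so \- and \. reach the regex engine instead of being invalid C# escapes
         string regExPattern = @"^([\-\.a-zA-Z ÇüéâäàåçêëèïîíìÄÅÉæÆôöòûùÖÜáíóúñÑÀÁÂÃÈÊËÌÍÎÏÐÒÓÔÕØÙÚÛÝßãðõøýþÿ]+)$";
         return Regex.IsMatch(value, regExPattern);
     }
 }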

I can't think of a simple way to build a maintainable range for the extended characters, since their order isn't obvious.
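For what it's worth, the extended letters do form a few contiguous blocks of Latin-1 code points (the same U+00C0 to U+00FF values ShuggyCoUk's switch enumerates), so they can be written as \u ranges. This is a sketch that, as far as I can tell, matches exactly the same characters as the literal list above; it additionally includes the apostrophe, which the question requires but this pattern leaves out, and anchors with \A and \z rather than ^ and $.

```csharp
using System;
using System.Text.RegularExpressions;

// The extended letters written as Latin-1 code-point ranges:
//   \u00C0-\u00D6  (skips \u00D7, the multiplication sign)
//   \u00D8-\u00DD  (skips \u00DE, which the thread's list does not include)
//   \u00DF-\u00F6  (skips \u00F7, the division sign)
//   \u00F8-\u00FF
// The apostrophe is added because the question requires it.
var whitelisted = new Regex(
    @"\A[ '\-.a-zA-Z\u00C0-\u00D6\u00D8-\u00DD\u00DF-\u00F6\u00F8-\u00FF]+\z");

Console.WriteLine(whitelisted.IsMatch("Ærøskøbing")); // True
Console.WriteLine(whitelisted.IsMatch("2 × 3"));      // False ('2' and '×' are not whitelisted)
```

Ranges keep the pattern short, at the cost of needing the comments to say which characters they cover.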
