Understanding this RegEx expression

I am trying to understand this regular expression in detail. It is supposed to check the file name from the ASP.NET FileUpload control so that only JPEG and GIF files are allowed. It was written by someone else, and I do not fully understand it. It works fine in Internet Explorer 7.0, but not in Firefox 3.6.

 <asp:RegularExpressionValidator id="FileUpLoadValidator" runat="server"
     ErrorMessage="Upload Jpegs and Gifs only."
     ValidationExpression="^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$"
     ControlToValidate="LogoFileUpload">
 </asp:RegularExpressionValidator>
+6
c# regex validation file-upload
5 answers

This is a bad regex.

 ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$ 

Let's take it piece by piece.

 ([a-zA-Z]:) 

This requires the file path to start with a drive letter, such as C: or d:.

 (\\{2}\w+)\$?

\\{2} means a backslash repeated twice (note that \ must be escaped), followed by one or more word characters ( \w+ ) and then an optional dollar sign ( \$? ). This is the leading part of a UNC path.

 (([a-zA-Z]:)|(\\{2}\w+)\$?)

| means "or". Thus, the path must start either with a drive letter or with a UNC prefix. Bad luck for non-Windows users.
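As a rough illustration of the prefix group on its own, here is a minimal sketch in Python. It assumes that Python's `re` module treats these particular constructs the same way as the .NET and JavaScript engines the validator actually uses; `PREFIX` and `prefix_of` are names made up for this example.

```python
import re

# Prefix group copied from the validator's expression, anchored at the start only.
PREFIX = re.compile(r"^(([a-zA-Z]:)|(\\{2}\w+)\$?)")

def prefix_of(path):
    """Return the text matched by the prefix group, or None if there is no such prefix."""
    m = PREFIX.match(path)
    return m.group(1) if m else None

print(prefix_of(r"C:\pics\logo.jpg"))          # drive letter
print(prefix_of(r"\\server$\share\logo.gif"))  # UNC prefix
print(prefix_of("/home/user/logo.jpg"))        # no Windows-style prefix
print(prefix_of("logo.jpg"))                   # bare file name, no prefix either
```

If a browser hands the validator only a bare file name or a Unix-style path, the expression already fails at this first group.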

 (\\(\w[\w].*)) 

This is supposed to be the directory part of the path, but it actually matches a backslash, two word characters, and then anything except newlines ( .* ), for example \ab!@#*(#$*) .

The correct regular expression for this part should be (?:\\\w+)+
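To compare the two path fragments side by side, here is a small sketch in Python (used as a stand-in, assuming its `re` engine handles these constructs like the validator's engine does). `ORIGINAL_PATH`, `STRICT_PATH` and `accepts` are hypothetical names for this example.

```python
import re

ORIGINAL_PATH = re.compile(r"\\(\w[\w].*)")  # path part as written in the validator
STRICT_PATH = re.compile(r"(?:\\\w+)+")      # replacement suggested in this answer

def accepts(pattern, text):
    """True if the whole text is matched by the pattern."""
    return pattern.fullmatch(text) is not None

junk = r"\ab!@#*(#$*)"
print(accepts(ORIGINAL_PATH, junk))   # two word characters, then anything goes
print(accepts(STRICT_PATH, junk))     # only backslash-separated word characters allowed

short = r"\a"
print(accepts(ORIGINAL_PATH, short))  # needs at least two word characters
print(accepts(STRICT_PATH, short))    # one word character per segment is enough
```

The stricter fragment covers directory segments only; the file name and extension still have to be matched by whatever follows it.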

 (.jpg|.JPG|.gif|.GIF)$ 

This means that the path must end with jpg , JPG , gif or GIF , preceded by any single character. Note that the unescaped . is not a literal dot; it matches any character other than \n , so a file name like haha.abcgif or malicious.exe\0gif will pass.

The correct regular expression for this part should be \.(?:jpg|JPG|gif|GIF)$
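The difference between the unescaped and the escaped dot can be checked with a short Python sketch (Python is a stand-in here, on the assumption that `.` and `$` behave the same for these inputs as in the validator's engine). `LOOSE_EXT`, `STRICT_EXT` and `ends_ok` are names invented for the example.

```python
import re

LOOSE_EXT = re.compile(r"(.jpg|.JPG|.gif|.GIF)$")    # extension part as written
STRICT_EXT = re.compile(r"\.(?:jpg|JPG|gif|GIF)$")   # escaped dot, as suggested above

def ends_ok(pattern, name):
    """True if the pattern matches at the end of the name."""
    return pattern.search(name) is not None

for name in ["photo.gif", "haha.abcgif", "malicious.exe\x00gif", "photo.Gif"]:
    print(repr(name), ends_ok(LOOSE_EXT, name), ends_ok(STRICT_EXT, name))
```

Neither variant accepts mixed case such as `.Gif`, since each lists only the all-lowercase and all-uppercase spellings.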

Together

 ^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$ 

will match

 D:\foo.jpg
 \\remote$\dummy\..\C:\Windows\System32\Logo.gif
 C:\Windows\System32\cmd.exe;--gif

and fail

 /home/user/pictures/myself.jpg
 C:\a.jpg
 C:\d\e.jpg
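These two lists can be reproduced with a short Python sketch. Python's `re` is only a stand-in for the validator's real engine, on the assumption that it agrees for these inputs; `VALIDATOR`, `SAMPLES` and `is_accepted` are hypothetical names.

```python
import re

# The full expression from the question, unchanged.
VALIDATOR = re.compile(
    r"^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$"
)

SAMPLES = [
    r"D:\foo.jpg",
    r"\\remote$\dummy\..\C:\Windows\System32\Logo.gif",
    r"C:\Windows\System32\cmd.exe;--gif",
    "/home/user/pictures/myself.jpg",
    r"C:\a.jpg",
    r"C:\d\e.jpg",
]

def is_accepted(path):
    """True if the validator's expression matches the whole path."""
    return VALIDATOR.match(path) is not None

for path in SAMPLES:
    print("accepted" if is_accepted(path) else "rejected", path)
```

The last two samples are rejected because the first segment after the drive must begin with two word characters, and `a.` and `d\` do not.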

The correct regular expression is /\.(?:jpg|gif)$/i , combined with a server-side check that the uploaded file really is an image.
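A minimal sketch of both checks, written in Python rather than the ASP.NET code-behind the question would actually use. The function names are hypothetical, and comparing the leading bytes against the well-known JPEG and GIF signatures is just one assumed way to do a first server-side sanity check; it does not prove the file is a valid image.

```python
import re

# Case-insensitive extension check, equivalent to /\.(?:jpg|gif)$/i.
IMAGE_EXT = re.compile(r"\.(?:jpg|gif)$", re.IGNORECASE)

# Leading bytes ("magic numbers") of the two formats.
JPEG_MAGIC = b"\xff\xd8\xff"
GIF_MAGICS = (b"GIF87a", b"GIF89a")

def has_image_extension(name):
    """True if the file name ends in .jpg or .gif, in any letter case."""
    return IMAGE_EXT.search(name) is not None

def looks_like_jpeg_or_gif(data):
    """First-pass check on the uploaded bytes; a full decode would be stricter."""
    return data.startswith(JPEG_MAGIC) or data.startswith(GIF_MAGICS)

print(has_image_extension("bar.Gif"))
print(has_image_extension("cmd.exe;--gif"))
print(looks_like_jpeg_or_gif(b"GIF89a\x01\x00"))
print(looks_like_jpeg_or_gif(b"MZ\x90\x00"))
```

The extension test works on the file name alone, so it behaves the same whether the browser sends a full path or only the name.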

+4

Here is a short explanation:

 ^                 # match the beginning of the input
 (                 # start capture group 1
   (               #   start capture group 2
     [a-zA-Z]      #     match any character from the set {'A'..'Z', 'a'..'z'}
     :             #     match the character ':'
   )               #   end capture group 2
   |               #   OR
   (               #   start capture group 3
     \\{2}         #     match the character '\' and repeat it exactly 2 times
     \w+           #     match a word character: [a-zA-Z_0-9] and repeat it one or more times
   )               #   end capture group 3
   \$?             #   match the character '$' and match it once or none at all
 )                 # end capture group 1
 (                 # start capture group 4
   \\              #   match the character '\'
   (               #   start capture group 5
     \w            #     match a word character: [a-zA-Z_0-9]
     [\w]          #     match any character from the set {'0'..'9', 'A'..'Z', '_', 'a'..'z'}
     .*            #     match any character except line breaks and repeat it zero or more times
   )               #   end capture group 5
 )                 # end capture group 4
 (                 # start capture group 6
   .               #   match any character except line breaks
   jpg             #   match the characters 'jpg'
   |               #   OR
   .               #   match any character except line breaks
   JPG             #   match the characters 'JPG'
   |               #   OR
   .               #   match any character except line breaks
   gif             #   match the characters 'gif'
   |               #   OR
   .               #   match any character except line breaks
   GIF             #   match the characters 'GIF'
 )                 # end capture group 6
 $                 # match the end of the input

EDIT

As some of the comments requested: the above was generated by a small tool that I wrote. You can download it here: http://www.big-o.nl/apps/pcreparser/pcre/PCREParser.html (WARNING: still very much in development!)

EDIT 2

It will match strings like these:

 x:\abc\def\ghi.JPG
 c:\foo\bar.gif
 \\foo$\baz.jpg

Here is what groups 1, 4 and 6 match in each case:

 group 1 | group 4      | group 6
 --------+--------------+--------
 x:      | \abc\def\ghi | .JPG
 c:      | \foo\bar     | .gif
 \\foo$  | \baz         | .jpg

Note that it also matches a string like c:\foo\bar@gif , since DOT matches any character (except line breaks). And it will reject a string like c:\foo\bar.Gif (capital G in gif ).
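The group contents can be checked with a few lines of Python (a stand-in engine, assumed to agree with the validator's engine on these inputs; `split_groups` is a name made up for this sketch).

```python
import re

VALIDATOR = re.compile(
    r"^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$"
)

def split_groups(path):
    """Return the text of groups 1, 4 and 6 as a tuple, or None if there is no match."""
    m = VALIDATOR.match(path)
    return m.group(1, 4, 6) if m else None

for path in [
    r"x:\abc\def\ghi.JPG",
    r"c:\foo\bar.gif",
    r"\\foo$\baz.jpg",
    r"c:\foo\bar@gif",   # accepted: the unescaped dot matches '@'
    r"c:\foo\bar.Gif",   # rejected: mixed-case extension
]:
    print(path, "->", split_groups(path))
```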

+9

It splits the file name into its parts: drive, path, file name and extension.

Most likely, IE uses backslashes and Firefox uses forward slashes. Try replacing the \\ parts with [\\/] so that the expression accepts both slashes and backslashes.
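One possible reading of that suggestion, sketched in Python (a stand-in for the validator's engine; `BOTH_SLASHES` and `is_accepted` are hypothetical names, and replacing every `\\` in the expression is an assumption about what is meant):

```python
import re

# The original expression with each \\ replaced by the class [\\/].
BOTH_SLASHES = re.compile(
    r"^(([a-zA-Z]:)|([\\/]{2}\w+)\$?)([\\/](\w[\w].*))(.jpg|.JPG|.gif|.GIF)$"
)

def is_accepted(path):
    """True if the modified expression matches the whole path."""
    return BOTH_SLASHES.match(path) is not None

print(is_accepted(r"C:\foo\bar.jpg"))      # backslashes still work
print(is_accepted("C:/foo/bar.jpg"))       # forward slashes now work too
print(is_accepted("/home/user/pic.jpg"))   # no drive or share prefix
print(is_accepted("bar.jpg"))              # bare file name
```

In this sketch the last two inputs are still rejected, because the expression keeps requiring a drive letter or a share prefix. If the browser sends such a value, this change alone may not be enough.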

+1

This is what Expresso says about it:

 /// A description of the regular expression:
 ///
 /// Beginning of line or string
 /// [1]: A numbered capture group. [([a-zA-Z]:)|(\\{2}\w+)\$?]
 ///     Select from 2 alternatives
 ///         [2]: A numbered capture group. [[a-zA-Z]:]
 ///             [a-zA-Z]:
 ///                 Any character in this class: [a-zA-Z]
 ///                 :
 ///         (\\{2}\w+)\$?
 ///             [3]: A numbered capture group. [\\{2}\w+]
 ///                 \\{2}\w+
 ///                     Literal \, exactly 2 repetitions
 ///                     Alphanumeric, one or more repetitions
 ///             Literal $, zero or one repetitions
 /// [4]: A numbered capture group. [\\(\w[\w].*)]
 ///     \\(\w[\w].*)
 ///         Literal \
 ///         [5]: A numbered capture group. [\w[\w].*]
 ///             \w[\w].*
 ///                 Alphanumeric
 ///                 Any character in this class: [\w]
 ///                 Any character, any number of repetitions
 /// [6]: A numbered capture group. [.jpg|.JPG|.gif|.GIF]
 ///     Select from 4 alternatives
 ///         .jpg
 ///             Any character
 ///             jpg
 ///         .JPG
 ///             Any character
 ///             JPG
 ///         .gif
 ///             Any character
 ///             gif
 ///         .GIF
 ///             Any character
 ///             GIF
 /// End of line or string
 ///

Hope this helps, Regards, Tom.

0

You may need to perform a server-side check. Check out this article:

Troubleshoot ASP.NET Validation Issues

In addition, there are some good online tools for creating or interpreting regular expressions, but I suspect that the problem is not in the expression.

0