How do I specify a wildcard (matching ANY character) in a C# regex?

I'm trying to use a wildcard in C# to grab information from a web page's source, but I can't figure out what to use as the wildcard character. Nothing I've tried works!

The wildcard only needs to match digits, but since the page is generated the same way every time, matching any characters would be fine too.

Regex statement:

Regex guestbookWidgetIDregex = new Regex("GuestbookWidget(' INSERT WILDCARD HERE ', '(.*?)', 500);", RegexOptions.IgnoreCase); 

If someone can figure out what I'm doing wrong, it would be very helpful!

2 answers

The wildcard character is . (a period).
To match any number of arbitrary characters, use .* (which means zero or more of . ) or .+ (which means one or more of . ).

Note that you need to escape your parentheses as \\( and \\) (or as \( and \) in an @"" string).
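A minimal sketch of the two spellings of the escaped parentheses, using the question's pattern with .* dropped in as the wildcard. The sample input and the names WildcardDemo, Sample and Capture are made up for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardDemo
{
    // Hypothetical input, modelled on the pattern in the question.
    public const string Sample = "GuestbookWidget('12345', 'abc', 500);";

    // Regular string literal: every regex backslash has to be doubled.
    public static readonly Regex Doubled =
        new Regex("GuestbookWidget\\('.*', '(.*?)', 500\\);", RegexOptions.IgnoreCase);

    // Verbatim (@"") literal: backslashes are written once.
    public static readonly Regex Verbatim =
        new Regex(@"GuestbookWidget\('.*', '(.*?)', 500\);", RegexOptions.IgnoreCase);

    // Returns the first capture group, or null when there is no match.
    public static string Capture(Regex regex, string input)
    {
        Match m = regex.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Capture(Doubled, Sample));   // both literals build the same regex
        Console.WriteLine(Capture(Verbatim, Sample));
        Console.WriteLine(Regex.IsMatch("", "^.*$"));  // .* accepts zero characters
        Console.WriteLine(Regex.IsMatch("", "^.+$"));  // .+ needs at least one character
    }
}
```

Both Regex objects are built from the same pattern text, so they should behave identically; only the C# spelling differs.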

On the dot

In regular expressions, the period . matches almost any character. The only characters it usually does not match are newlines. To make the dot match every character, you must enable what is called single-line mode (also known as "dot all").

In C#, this is specified with RegexOptions.Singleline . You can also embed it in the pattern as (?s) .
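As a small illustration of the difference, the sketch below tries the same one-dot pattern against a string containing a newline, first in the default mode and then with single-line mode switched on both ways. The class name DotAllDemo and the test string are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotAllDemo
{
    // Hypothetical input: two words separated by a newline.
    public const string TwoLines = "first\nsecond";

    // Default mode: the dot does not match \n.
    public static bool DefaultDot() =>
        Regex.IsMatch(TwoLines, "first.second");

    // Single-line mode via the option flag: the dot matches \n too.
    public static bool WithOption() =>
        Regex.IsMatch(TwoLines, "first.second", RegexOptions.Singleline);

    // Single-line mode via the inline (?s) flag in the pattern itself.
    public static bool WithInlineFlag() =>
        Regex.IsMatch(TwoLines, "(?s)first.second");

    public static void Main()
    {
        Console.WriteLine(DefaultDot());      // False
        Console.WriteLine(WithOption());      // True
        Console.WriteLine(WithInlineFlag());  // True
    }
}
```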

On metacharacters and escaping

The . is not the only regular expression metacharacter. They are:

 ( ) { } [ ] ? * + - ^ $ . | \ 

Depending on where they appear, if you want these characters to be taken literally (for example, . as a plain period), you may need to do what is called "escaping". This is done by preceding them with \ .

Of course, \ is also an escape character in C# string literals. To get a literal \ , you need to double it in the string literal (that is, "\\" is a string of length one). Alternatively, C# also has what are called @-quoted string literals, in which escape sequences are not processed. So the following two strings are equal:

 "c:\\Docs\\Source\\a.txt"
 @"c:\Docs\Source\a.txt"

Since \ is used so often in regular expressions, @-quoting is commonly used to avoid excessive doubling.
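A quick sketch to check the string-literal claims. It also shows Regex.Escape , a standard .NET helper not mentioned in the answer, which adds the backslashes for you when you want a piece of text matched literally. The class name VerbatimDemo is made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimDemo
{
    // The two spellings from the text above.
    public const string Regular = "c:\\Docs\\Source\\a.txt";
    public const string Quoted = @"c:\Docs\Source\a.txt";

    public static bool PathsAreEqual() => Regular == Quoted;

    // "\\" is one backslash character, not two.
    public static int SingleBackslashLength() => "\\".Length;

    // Regex.Escape puts \ in front of regex metacharacters such as ) .
    public static string EscapedTail() => Regex.Escape("500);");

    public static void Main()
    {
        Console.WriteLine(PathsAreEqual());          // True
        Console.WriteLine(SingleBackslashLength());  // 1
        Console.WriteLine(EscapedTail());            // 500\);
    }
}
```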

On character classes

Regular expression engines let you define character classes, e.g. [aeiou] is a character class containing the 5 vowels. You can also use the - metacharacter to define a range, e.g. [0-9] is a character class containing all ten digit characters.

Since digit characters are used so often, regex also provides a shorthand for them, which is \d . In C#, this also matches decimal digits from other Unicode character sets, unless you use RegexOptions.ECMAScript , in which case it is strictly just [0-9] .
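To see the \d difference concretely, here is a sketch that tests an ASCII digit and an Arabic-Indic digit (U+0663, picked arbitrarily as a non-ASCII decimal digit) against \d with and without RegexOptions.ECMAScript . The class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitDemo
{
    // ARABIC-INDIC DIGIT THREE, a Unicode decimal digit outside [0-9].
    public const string ArabicIndicThree = "\u0663";

    // Default behaviour: \d matches any Unicode decimal digit.
    public static bool UnicodeDigit(string s) =>
        Regex.IsMatch(s, @"^\d$");

    // ECMAScript behaviour: \d is restricted to [0-9].
    public static bool EcmaDigit(string s) =>
        Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);

    // Explicit ASCII range, independent of any option.
    public static bool AsciiDigit(string s) =>
        Regex.IsMatch(s, "^[0-9]$");

    public static void Main()
    {
        Console.WriteLine(UnicodeDigit("7"));              // True
        Console.WriteLine(UnicodeDigit(ArabicIndicThree)); // True
        Console.WriteLine(EcmaDigit(ArabicIndicThree));    // False
        Console.WriteLine(AsciiDigit(ArabicIndicThree));   // False
    }
}
```

If only ASCII digits should be accepted, spelling out [0-9] avoids depending on the option at all.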

Putting it all together

It looks like the following will work for you:

      @-quoting             digits  anything but ', captured
          |                   / \    /     \
new Regex(@"GuestbookWidget\('\d*', '([^']*)', 500\);", RegexOptions.IgnoreCase);
                           \/                     \/
                        escape (               escape )

Note that I've modified the pattern slightly so that it uses a negated character class instead of reluctant (lazy) matching. This causes a slight difference in behavior if you allow ' to be escaped in your input string, but neither pattern handles that case perfectly. If you're not allowing ' to be escaped, however, this pattern is definitely better.
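Here is a sketch of the pattern in use, next to the lazy variant for comparison. The page fragments are invented for illustration (the real page source isn't shown in the question), and GuestbookDemo and Extract are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GuestbookDemo
{
    // The pattern from this answer: negated character class.
    public static readonly Regex Negated =
        new Regex(@"GuestbookWidget\('\d*', '([^']*)', 500\);", RegexOptions.IgnoreCase);

    // Same pattern, but with the lazy (.*?) group from the question.
    public static readonly Regex Lazy =
        new Regex(@"GuestbookWidget\('\d*', '(.*?)', 500\);", RegexOptions.IgnoreCase);

    // Returns the captured value, or null when the pattern does not match.
    public static string Extract(Regex regex, string pageSource)
    {
        Match m = regex.Match(pageSource);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Hypothetical page fragments, not taken from the real page.
        string plain = "<script>GuestbookWidget('12345', 'abc-789', 500);</script>";
        string escaped = @"GuestbookWidget('12345', 'it\'s', 500);";

        Console.WriteLine(Extract(Negated, plain));                   // abc-789
        Console.WriteLine(Extract(Lazy, plain));                      // abc-789
        Console.WriteLine(Extract(Negated, escaped) ?? "(no match)"); // (no match)
        Console.WriteLine(Extract(Lazy, escaped));                    // it\'s
    }
}
```

On input without escaped quotes the two patterns capture the same thing. With an escaped ' the negated class refuses to match, while the lazy group captures the text with the backslash still in it, so neither gives you the unescaped value directly.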

Source: https://habr.com/ru/post/1312713/