Odd regex behavior - only matches the first and last capture group

I am trying to write a regular expression that will match a comma-separated list of words and capture all the words. This string should match: " apple , banana ,orange,peanut", and the captures should be "apple", "banana", "orange" and "peanut". For this, I use the following regexp:

 ^\s*([a-z_]\w*)(?:\s*,\s*([a-z_]\w*))*\s*$ 

It successfully matches the string, but only apple and peanut are captured. I see this behavior in both C# and Perl, so I guess I am missing something about how regular expressions work. Any ideas? :)

4 answers

The value given by match.Groups[2].Value is only the last value captured by the second group.

To find all the values, look at match.Groups[2].Captures[i].Value, where in this case i runs from 0 to 2. (And also match.Groups[1].Value for the first group.)
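A minimal C# sketch of the difference between Group.Value and Group.Captures, assuming .NET's System.Text.RegularExpressions and C# 9 top-level statements; the variable names (lastOfGroup2, allOfGroup2, words) are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The asker's pattern: group 2 sits inside a repeated (*) non-capturing group.
var pattern = @"^\s*([a-z_]\w*)(?:\s*,\s*([a-z_]\w*))*\s*$";
var match = Regex.Match(" apple , banana ,orange,peanut", pattern);

// Group.Value reports only the most recent capture of that group.
string lastOfGroup2 = match.Groups[2].Value;   // "peanut"

// Group.Captures keeps every capture the group made, in order.
var allOfGroup2 = match.Groups[2].Captures
    .Cast<Capture>()
    .Select(c => c.Value)
    .ToList();                                  // banana, orange, peanut

// Combine group 1 with all captures of group 2 to get the full word list.
var words = new[] { match.Groups[1].Value }.Concat(allOfGroup2).ToList();
Console.WriteLine(string.Join(", ", words));    // prints "apple, banana, orange, peanut"
```

Perl has no equivalent of Captures on a single match, which is why the same pattern only yields the last word there.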

(+1 for the question, I learned something today!)


Try the following:

    using System.Linq;
    using System.Text.RegularExpressions;

    string text = " apple , banana ,orange,peanut";
    var matches = Regex.Matches(text, @"\s*(?<word>\w+)\s*,?")
        .Cast<Match>()
        .Select(x => x.Groups["word"].Value)   // keep only the named "word" group
        .ToList();

You are repeating your capture group, and with each repeated match the previous content is overwritten. Thus only the last match of your second capture group is available at the end.

You can change the second capture group to

 ^\s*([a-z_]\w*)((?:\s*,\s*(?:[a-z_]\w*))*)\s*$ 

Then your second group will contain the whole rest of the list, " , banana ,orange,peanut", separators included. I'm not sure that is what you want.
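A short C# check of what the modified pattern puts in each group, assuming .NET's Regex and C# 9 top-level statements; first and rest are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

// Modified pattern: group 2 now wraps the whole repeated part instead of sitting inside it.
var pattern = @"^\s*([a-z_]\w*)((?:\s*,\s*(?:[a-z_]\w*))*)\s*$";
var match = Regex.Match(" apple , banana ,orange,peanut", pattern);

string first = match.Groups[1].Value;  // "apple"
string rest = match.Groups[2].Value;   // " , banana ,orange,peanut" (separators included)
Console.WriteLine($"[{first}] [{rest}]");
```

Because group 2 is captured only once, it holds the raw text of all repetitions rather than the individual words.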

If you want to check that the string has this pattern and then extract every word, I would do it in two steps:

  • Validate the string with your regular expression.

  • If the pattern matches, remove the leading and trailing spaces and split the result on \s*,\s*
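The two steps can be sketched in C# as follows, assuming .NET's Regex.IsMatch and Regex.Split and C# 9 top-level statements; ExtractWords is a hypothetical helper name, and returning an empty array for invalid input is an assumption, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: validate first, then split.
string[] ExtractWords(string text)
{
    // Step 1: validate the whole list with the original pattern.
    const string pattern = @"^\s*([a-z_]\w*)(?:\s*,\s*([a-z_]\w*))*\s*$";
    if (!Regex.IsMatch(text, pattern))
        return Array.Empty<string>();   // not a valid list: report no words

    // Step 2: trim the ends, then split on commas with optional surrounding whitespace.
    return Regex.Split(text.Trim(), @"\s*,\s*");
}

var words = ExtractWords(" apple , banana ,orange,peanut");
Console.WriteLine(string.Join("|", words));   // prints "apple|banana|orange|peanut"
```

Splitting after validation avoids relying on repeated capture groups at all, so the same approach works in Perl with split.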


Simple regex:

(?:^| *)(.+?)(?:,|$)

Explanation:

 (?:^| *)   # Non-capturing group: match the start of the line or zero or more spaces
 (.+?)      # Capture the word in the list, lazily
 (?:,|$)    # Non-capturing group: match a comma or the end of the line
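A C# sketch applying this pattern with Regex.Matches, assuming .NET's Regex and C# 9 top-level statements. On the example string the lazy group can keep surrounding spaces (at position 0 the ^ alternative wins, so the first capture is " apple "), which is why this sketch trims each capture. Note also that .+? accepts any characters, so this pattern extracts items but does not validate the list.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The "simple" pattern from this answer.
var pattern = @"(?:^| *)(.+?)(?:,|$)";
var text = " apple , banana ,orange,peanut";

// Raw captures of group 1, before any cleanup.
var raw = Regex.Matches(text, pattern)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

// Trim each capture, since the lazy group may include leading or trailing spaces.
var words = raw.Select(w => w.Trim()).ToList();
Console.WriteLine(string.Join("|", words));   // prints "apple|banana|orange|peanut"
```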

Note: Rubular is a good site for testing this kind of thing.

