Regex for splitting text containing tokens

Question

If I have a line such as "abcdef {123} ghi {456} kl", I want to create a regular expression that will give me all the parts, separated as follows:

    abcdef
    {123}
    ghi
    {456}
    kl

I am using this code but cannot figure out what the expression should be:

    System.Text.RegularExpressions.Regex rex = new System.Text.RegularExpressions.Regex("expression");
    foreach (System.Text.RegularExpressions.Match match in rex.Matches(sText).OfType<System.Text.RegularExpressions.Match>())
    {
        ...
    }

c# regex .net

Jeremy Jun 30 '10 at 20:31

2 answers

Mark Byers · Answer 1 · 2010-06-30T20:35:28+0000

You should probably add using directives instead of writing out the namespace each time. At first glance your code looks rather complicated, but once the namespaces are removed it turns out to be very simple. The OfType call is also unnecessary.

The regular expression must match either the longest possible run of characters that are not an opening brace, [^{]+, or an opening brace, some text, and then a closing brace, {[^}]*}. Using + rather than * for the first alternative keeps the regex from producing an empty match at the end of the string. The regular expression for this is:

    {[^}]*}|[^{]+

Try this code:

    string text = "abcdef{123}ghi{456}kl";
    Regex regex = new Regex("{[^}]*}|[^{]+");
    foreach (Match match in regex.Matches(text))
    {
        Console.WriteLine(match.Value);
    }

Output:

    abcdef
    {123}
    ghi
    {456}
    kl

Note: this regular expression does not validate that the string is in the correct format; it assumes the input is well formed.

A slightly simpler way is to use Split instead of Matches and wrap the delimiter pattern in a capturing group so that the delimiters are also included in the output:

    string text = "abcdef{123}ghi{456}kl";
    Regex regex = new Regex("({[^}]*})");
    foreach (string part in regex.Split(text))
    {
        Console.WriteLine(part);
    }

The output for this is the same as above.
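
One thing to watch with Split: when the input starts or ends with a token, or has two tokens in a row, Regex.Split returns empty strings at those positions. The following is a minimal sketch that filters them out. TokenSplitter and SplitParts are hypothetical names chosen for illustration, and the braces are escaped only for readability:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TokenSplitter
{
    // The capturing group makes Regex.Split keep the {...} delimiters in the result.
    private static readonly Regex TokenPattern = new Regex(@"(\{[^}]*\})");

    // Hypothetical helper: splits text into plain-text and {token} parts and
    // drops the empty strings that Split produces around leading, trailing,
    // or adjacent tokens.
    public static string[] SplitParts(string text)
    {
        return TokenPattern.Split(text)
                           .Where(part => part.Length > 0)
                           .ToArray();
    }

    static void Main()
    {
        // Input chosen to exercise a leading token and two adjacent tokens.
        foreach (string part in SplitParts("{123}abc{456}{789}"))
        {
            Console.WriteLine(part);
        }
    }
}
```

If the empty entries are meaningful for your data (for example, to tell whether a line started with a token), skip the filter and use the Split result directly.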

Richard Fearn · Answer 2 · 2010-06-30T20:35:41+0000

    ([a-z]+)({\d+})([a-z]+)({\d+})([a-z]+)

will work, but only if there are always exactly five parts in the line. Can there be fewer or more than five?
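
If the number of parts can vary, one possible generalization is to let the "{digits} text" pair repeat and read every piece from the group's Captures collection, which .NET fills with each capture of a repeated or reused group name. This is only a sketch under the assumption that a line always alternates lowercase text and {digits}, starting and ending with text. VariablePartsDemo, GetParts, and the group name "part" are hypothetical names used for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class VariablePartsDemo
{
    // Same building blocks as the five-part pattern, but the "{digits} text"
    // pair may repeat any number of times. Reusing the group name "part"
    // collects every piece, in match order, in a single Captures collection.
    private static readonly Regex LinePattern =
        new Regex(@"^(?<part>[a-z]+)(?:(?<part>\{\d+\})(?<part>[a-z]+))*$");

    // Hypothetical helper: returns the parts in order, or null when the line
    // does not fit the assumed text/{digits}/text shape.
    public static string[] GetParts(string line)
    {
        Match match = LinePattern.Match(line);
        if (!match.Success)
        {
            return null;
        }
        return match.Groups["part"].Captures
                    .Cast<Capture>()
                    .Select(capture => capture.Value)
                    .ToArray();
    }

    static void Main()
    {
        string[] parts = GetParts("abcdef{123}ghi{456}kl{789}mn");
        Console.WriteLine(parts == null ? "no match" : string.Join(" | ", parts));
    }
}
```

Because the pattern is anchored with ^ and $, it also rejects lines that do not fit the assumed shape, which the unanchored five-group pattern does not do.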
