Regular expression for extracting strings outside single or double quotes

I am building a web page with ASP.NET and C#, and I am having trouble parsing a string provided by the user. For example, given the line below, I need to extract the words that are outside single or double quotes. Can someone help me with this? Thanks in advance.

"we run" live "experiments" inside and outside 'a lab' 

Expected result using regex:

 live inside and outside 
2 answers

This will do it. Every match in which the "unquote" group succeeds is a word you want:

 (?<unquote>[^"'\s]+)|(?:["][^"]+?["])|(?:['][^']+?[']) 

C# test code:

    var matches = Regex.Matches(
        @"""we run"" live ""experiments"" inside and outside 'a lab'",
        @"(?<unquote>[^""'\s]+)|(?:[""][^""]+?[""])|(?:['][^']+?['])" );

    foreach( Match match in matches )
    {
        // Quoted sections also produce matches, but they leave "unquote" unset.
        if( match.Groups["unquote"].Success )
        {
            Console.WriteLine( match.Groups["unquote"].Value.Trim() );
        }
    }

Output:

    live
    inside
    and
    outside

Where:

  • (?<unquote>...) captures its match in a group named unquote.
  • [^"'\s]+ matches one or more characters that are not a double quote, a single quote, or whitespace.
  • (?:["][^"]+?["]) matches a double-quoted section, including both quote characters. The +? makes the quantifier lazy, and ?: makes the group non-capturing. The single-quote alternative works the same way.
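As a minimal sketch of the breakdown above, the same three alternatives can be written as a commented pattern and wrapped in a helper. The names QuoteFilter and WordsOutsideQuotes are made up for this illustration, and the use of RegexOptions.IgnorePatternWhitespace is an addition that the original answer does not use.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class QuoteFilter
{
    // Same three alternatives as the pattern above, one per line.
    // IgnorePatternWhitespace allows layout whitespace and # comments in the pattern.
    private static readonly Regex Pattern = new Regex(@"
          (?<unquote> [^""'\s]+ )    # run of characters outside any quoted section
        | (?: [""] [^""]+? [""] )    # double-quoted section: consumed, not captured
        | (?: ['] [^']+? ['] )       # single-quoted section: consumed, not captured
        ", RegexOptions.IgnorePatternWhitespace);

    public static List<string> WordsOutsideQuotes(string input)
    {
        var words = new List<string>();
        foreach (Match match in Pattern.Matches(input))
        {
            // Only matches produced by the first alternative set the named group.
            Group unquote = match.Groups["unquote"];
            if (unquote.Success)
            {
                words.Add(unquote.Value);
            }
        }
        return words;
    }
}

public static class Program
{
    public static void Main()
    {
        var words = QuoteFilter.WordsOutsideQuotes(
            @"""we run"" live ""experiments"" inside and outside 'a lab'");
        Console.WriteLine(string.Join(" ", words)); // live inside and outside
    }
}
```

The quoted alternatives exist only to consume quoted text so that the first alternative never sees it, which is why the caller filters on the named group rather than on the match itself.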

This also works with empty strings and with strings where single quotes are nested inside double quotes. Do you want to ignore apostrophes? If so, you need to extend the regex a bit to allow a ' that is not preceded by whitespace:

 (?<unquote>(?>[^"\s](?<!\s[']))+)|(?:["][^"]+?["])|(?:['][^']+?[']) 
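To see what the extension changes, the sketch below runs both patterns over an input containing an apostrophe. The sample sentence and the names ApostropheDemo and Extract are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical comparison harness; names and sample input are illustrative only.
public static class ApostropheDemo
{
    // First pattern from the answer: every ' is treated as a quote delimiter.
    public const string Basic =
        @"(?<unquote>[^""'\s]+)|(?:[""][^""]+?[""])|(?:['][^']+?['])";

    // Extended pattern from the answer: a ' may appear inside a word
    // as long as it is not directly preceded by whitespace.
    public const string Extended =
        @"(?<unquote>(?>[^""\s](?<!\s[']))+)|(?:[""][^""]+?[""])|(?:['][^']+?['])";

    public static List<string> Extract(string pattern, string input)
    {
        var words = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            if (match.Groups["unquote"].Success)
            {
                words.Add(match.Groups["unquote"].Value);
            }
        }
        return words;
    }
}

public static class Program
{
    public static void Main()
    {
        const string input = "it's \"a test\" and 'quoted words' here";

        // The apostrophe in "it's" opens a bogus single-quoted section.
        Console.WriteLine(string.Join(" | ", ApostropheDemo.Extract(ApostropheDemo.Basic, input)));
        // it | quoted | words | here

        // The apostrophe stays inside the word; real quoted sections are skipped.
        Console.WriteLine(string.Join(" | ", ApostropheDemo.Extract(ApostropheDemo.Extended, input)));
        // it's | and | here
    }
}
```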

Good luck with your live experiments.

    var parts = Regex.Split(input, @"[""'].+?[""']")
        .SelectMany(x => x.Split())
        .Where(s => !String.IsNullOrWhiteSpace(s))
        .ToList();

or

    var parts = Regex.Split(input, @"[""'].+?[""']")
        .SelectMany(x => x.Split(new char[]{' '}, StringSplitOptions.RemoveEmptyEntries))
        .ToList();
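One thing to watch with the split approach: the character class [""'] lets a section opened by one kind of quote be closed by the other kind. The sketch below compares the pattern above with a stricter variant that pairs each quote with its own kind. The stricter pattern and the names QuoteSplitter, AnyQuote, PairedQuotes and WordsOutside are assumptions for this illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class QuoteSplitter
{
    // Pattern from the answer above: either quote character may open or close.
    public const string AnyQuote = @"[""'].+?[""']";

    // Assumed stricter variant: a section must close with the quote that opened it.
    // It has no capturing groups, so Regex.Split adds nothing extra to its result.
    public const string PairedQuotes = @"""[^""]*""|'[^']*'";

    public static List<string> WordsOutside(string input, string quotedPattern)
    {
        return Regex.Split(input, quotedPattern)
            .SelectMany(x => x.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        const string sample = "\"we run\" live \"experiments\" inside and outside 'a lab'";
        Console.WriteLine(string.Join(" ", QuoteSplitter.WordsOutside(sample, QuoteSplitter.AnyQuote)));
        // live inside and outside

        const string mixed = "say \"it's fine\" now";
        Console.WriteLine(string.Join(" ", QuoteSplitter.WordsOutside(mixed, QuoteSplitter.AnyQuote)));
        // say s fine" now
        Console.WriteLine(string.Join(" ", QuoteSplitter.WordsOutside(mixed, QuoteSplitter.PairedQuotes)));
        // say now
    }
}
```

For the input in the question both patterns give the same result. The difference only shows up when an apostrophe or a stray quote appears inside a quoted section.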