How to parse a string to find key-value pairs in it

When searching for mail on Google, we use a query syntax like this, for example:

from:devcoder hasattachments:true mySearchString on:11-aug 

or

 mySearchString from:devcoder on:11-aug anotherSearchKeyword 

After parsing, I should get a set of key-value pairs, for example (from, devcoder), (on, 11-aug). What is the best way to implement this parsing in C#?

+4
4 answers

To LINQ-ify Jason's answer:

    string s = "from:devcoder hasattachments:true mySearchString on:11-aug";
    var keyValuePairs = s.Split(' ')
        .Select(x => x.Split(':'))
        .Where(x => x.Length == 2)
        .ToDictionary(x => x.First(), x => x.Last());
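One caveat with the dictionary version: `ToDictionary` throws an `ArgumentException` if a key appears twice (for example two `from:` terms), and the `Length == 2` filter silently drops any term whose value itself contains a colon (such as `at:10:30`). Below is a minimal sketch that addresses both, assuming repeated keys should be kept and that only the first colon separates key from value. The `QueryParser` class and `ParsePairs` method are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Linq;

public static class QueryParser
{
    // Hypothetical helper: groups values by key, so repeated keys are kept
    // instead of causing an exception.
    public static ILookup<string, string> ParsePairs(string query)
    {
        return query
            // Ignore runs of spaces rather than producing empty terms.
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            // Split on the first colon only, so "at:10:30" yields ("at", "10:30").
            .Select(term => term.Split(new[] { ':' }, 2))
            // Keep only terms that have a non-empty key and a value part.
            .Where(parts => parts.Length == 2 && parts[0].Length > 0)
            .ToLookup(parts => parts[0], parts => parts[1]);
    }
}
```

With this sketch, `QueryParser.ParsePairs("from:devcoder on:11-aug from:jason")["from"]` enumerates both `devcoder` and `jason`, and indexing the lookup with a key that never occurred gives an empty sequence rather than throwing.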
+18

Split on spaces, then split each resulting component on `:` and act accordingly. Roughly speaking:

    string s = "from:devcoder hasattachments:true mySearchString on:11-aug";
    var components = s.Split(' ');
    var blocks = components.Select(component => component.Split(':'));
    foreach(var block in blocks) {
        if(block.Length == 1) {
            Console.WriteLine("Found {0}", block[0]);
        }
        else {
            Console.WriteLine(
                "Found key-value pair key = {0}, value = {1}",
                block[0],
                block[1]
            );
        }
    }

Output:

    Found key-value pair key = from, value = devcoder
    Found key-value pair key = hasattachments, value = true
    Found mySearchString
    Found key-value pair key = on, value = 11-aug

Output for the second example string:

    Found mySearchString
    Found key-value pair key = from, value = devcoder
    Found key-value pair key = on, value = 11-aug
    Found anotherSearchKeyword
+5

Here is one regular expression-based approach I've used in the past; it supports prefixes in combination with quoted strings.

A more correct, reliable, and efficient approach would be to write a simple parser. In most use cases, however, the time and effort involved in implementing and testing a parser would be significantly disproportionate to the gain.

    private static readonly Regex searchTermRegex = new Regex(
        @"^(
            \s*
            (?<term>
                ((?<prefix>[a-zA-Z][a-zA-Z0-9-_]*):)?
                (?<termString>
                    (?<quotedTerm>
                        (?<quote>['""])
                        ((\\\k<quote>)|((?!\k<quote>).))*
                        \k<quote>?
                    )
                    |(?<simpleTerm>[^\s]+)
                )
            )
            \s*
        )*$",
        RegexOptions.Compiled |
        RegexOptions.Singleline |
        RegexOptions.IgnorePatternWhitespace |
        RegexOptions.ExplicitCapture
    );

    private static void FindTerms(string s) {
        Console.WriteLine("[" + s + "]");
        Match match = searchTermRegex.Match(s);
        foreach(Capture term in match.Groups["term"].Captures) {
            Console.WriteLine("term: " + term.Value);

            // Find the prefix capture (if any) that lies inside this term.
            Capture prefix = null;
            foreach(Capture prefixMatch in match.Groups["prefix"].Captures)
                if(prefixMatch.Index >= term.Index &&
                   prefixMatch.Index <= term.Index + term.Length) {
                    prefix = prefixMatch;
                    break;
                }

            if(null != prefix)
                Console.WriteLine("prefix: " + prefix.Value);

            // Find the termString capture that lies inside this term.
            Capture termString = null;
            foreach(Capture termStringMatch in match.Groups["termString"].Captures)
                if(termStringMatch.Index >= term.Index &&
                   termStringMatch.Index <= term.Index + term.Length) {
                    termString = termStringMatch;
                    break;
                }

            Console.WriteLine("termString: " + termString.Value);
        }
        Console.WriteLine();
    }

    public static void Main (string[] args)
    {
        FindTerms(@"two terms");
        FindTerms(@"prefix:value");
        FindTerms(@"some:""quoted term""");
        FindTerms(@"firstname:Jack ""the Ripper""");
        FindTerms(@"'quoted term\ escaped quotes'");
        FindTerms(@"""unterminated quoted string");
    }

Output:

    [two terms]
    term: two
    termString: two
    term: terms
    termString: terms

    [prefix:value]
    term: prefix:value
    prefix: prefix
    termString: value

    [some:"quoted term"]
    term: some:"quoted term"
    prefix: some
    termString: "quoted term"

    [firstname:Jack "the Ripper"]
    term: firstname:Jack
    prefix: firstname
    termString: Jack
    term: "the Ripper"
    termString: "the Ripper"

    ['quoted term\ escaped quotes']
    term: 'quoted term\ escaped quotes'
    termString: 'quoted term\ escaped quotes'

    ["unterminated quoted string]
    term: "unterminated quoted string
    termString: "unterminated quoted string
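For comparison, the "simple parser" alternative mentioned above might look roughly like the following hand-written scanner. This is only a sketch under stated assumptions, not an equivalent of the regex: it strips the surrounding quotes and unescapes `\"` or `\'` inside a quoted term, and it treats everything before the first colon as the prefix without restricting which characters a prefix may contain. The `TermScanner`, `Term`, and `Scan` names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TermScanner
{
    // Hypothetical result type: Prefix is null when the term has no "prefix:" part.
    public sealed class Term
    {
        public string Prefix;
        public string Text;
    }

    public static List<Term> Scan(string input)
    {
        var terms = new List<Term>();
        int i = 0;
        while (i < input.Length)
        {
            // Skip whitespace between terms.
            if (char.IsWhiteSpace(input[i])) { i++; continue; }

            string prefix = null;
            var text = new StringBuilder();
            while (i < input.Length && !char.IsWhiteSpace(input[i]))
            {
                char c = input[i];
                if ((c == '"' || c == '\'') && text.Length == 0)
                {
                    // Quoted term: read up to the matching quote or end of input.
                    i++;
                    while (i < input.Length && input[i] != c)
                    {
                        // A backslash followed by the quote character is an escaped quote.
                        if (input[i] == '\\' && i + 1 < input.Length && input[i + 1] == c) i++;
                        text.Append(input[i]);
                        i++;
                    }
                    if (i < input.Length) i++; // skip the closing quote, if present
                    break;                     // a quoted string ends the term
                }
                if (c == ':' && prefix == null && text.Length > 0)
                {
                    // The first colon after some text ends the prefix.
                    prefix = text.ToString();
                    text.Clear();
                }
                else
                {
                    text.Append(c);
                }
                i++;
            }
            terms.Add(new Term { Prefix = prefix, Text = text.ToString() });
        }
        return terms;
    }
}
```

Calling `TermScanner.Scan("firstname:Jack \"the Ripper\"")` yields two terms: prefix `firstname` with text `Jack`, and an unprefixed term with text `the Ripper`. An unterminated quoted string is accepted and runs to the end of the input, mirroring the optional closing quote in the regex.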
+5

First, Split() on spaces, which gives you an array containing all the search terms. Then loop over them to find those that Contains() a colon (:) and Split() those again on the colon.
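The steps described here can be sketched as follows. This is a minimal illustration, assuming that terms without a colon should be collected separately as free-text search words. The `SearchQuery` class, the `Parse` method, and its two output lists are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SearchQuery
{
    // Hypothetical helper: adds key:value terms to 'pairs' and all other
    // terms to 'freeText', preserving their order in the query.
    public static void Parse(string query,
                             List<KeyValuePair<string, string>> pairs,
                             List<string> freeText)
    {
        // Step 1: split the query on spaces.
        string[] terms = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string term in terms)
        {
            if (term.Contains(":"))
            {
                // Step 2: split on the colon (first colon only, limit of 2 parts).
                string[] parts = term.Split(new[] { ':' }, 2);
                pairs.Add(new KeyValuePair<string, string>(parts[0], parts[1]));
            }
            else
            {
                freeText.Add(term);
            }
        }
    }
}
```

For the second example query in the question, `pairs` would end up holding (from, devcoder) and (on, 11-aug), while `freeText` would hold `mySearchString` and `anotherSearchKeyword`.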

+1
