Parsing text between quotes with .NET regular expressions

I have the following input text:

@"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38" 

I would like to parse the values written with the syntax @name=value into name/value pairs. Parsing the line above should produce the following named captures:

 name:"foo" value:"bar"
 name:"name" value:"John \""The Anonymous One\"" Doe"
 name:"age" value:"38"

I tried the following regex, which almost works:

 @"(?:(?<=\s)|^)@(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>[A-Za-z0-9_-]+|(?="").+?(?=(?<!\\)""))" 

The main problem is that it captures the opening quote in "John \""The Anonymous One\"" Doe". I feel it should use a lookbehind instead of a lookahead, but that does not seem to work at all.

Here are the rules the expression should enforce:

  • The name must begin with a letter and contain any letter, number, underscore or hyphen.

  • An unquoted value must have at least one character and may contain any letter, number, underscore or hyphen.

  • The quoted value can contain any character, including any space and escaped quotation marks.

Edit:

Here is the explanation from regex101.com:

 (?:(?<=\s)|^)@(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)"))

 (?:(?<=\s)|^) Non-capturing group
 @ matches the character @ literally
 (?<name>\w+[A-Za-z0-9_-]+?) Named capturing group name
 \s* match any white space character [\r\n\t\f ]
 = matches the character = literally
 \s* match any white space character [\r\n\t\f ]
   Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
 (?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)")) Named capturing group value
   1st Alternative: [A-Za-z0-9_-]+
     [A-Za-z0-9_-]+ match a single character present in the list below
       Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
       A-Z a single character in the range between A and Z (case sensitive)
       a-z a single character in the range between a and z (case sensitive)
       0-9 a single character in the range between 0 and 9
       _- a single character in the list _- literally
   2nd Alternative: (?=").+?(?=(?<!\\)")
     (?=") Positive Lookahead - Assert that the regex below can be matched
       " matches the characters " literally
     .+? matches any character (except newline)
       Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
     (?=(?<!\\)") Positive Lookahead - Assert that the regex below can be matched
       (?<!\\) Negative Lookbehind - Assert that it is impossible to match the regex below
         \\ matches the character \ literally
       " matches the characters " literally
2 answers

You can use a very useful .NET regex feature: several capture groups may share the same name. In addition, there is a problem with your (?<name>) capture group: it allows a digit in the first position, which violates your first requirement.

So, I suggest:

 (?si)(?:(?<=\s)|^)@(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)"")) 

See the demo.

Please note that you cannot debug .NET regular expressions on regex101.com; you need to test them in a .NET-compatible environment.
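As a rough illustration of how the suggested pattern could be used from C#, here is a minimal sketch. The class name NameValueParser and the method Parse are hypothetical names chosen for this example; the pattern string is copied from the suggestion above. Because both alternatives use a group called value, only one of them participates in any given match, so reading Groups["value"] works for quoted and unquoted values alike.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class NameValueParser
{
    // Pattern copied from the suggestion above. Reusing the group name
    // "value" in both alternatives relies on .NET's duplicate-name support.
    private static readonly Regex Pattern = new Regex(
        @"(?si)(?:(?<=\s)|^)@(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))");

    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pattern.Matches(input))
        {
            // Only one of the two "value" groups captures in a given match.
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["name"].Value, m.Groups["value"].Value));
        }
        return pairs;
    }

    public static void Main()
    {
        string input = @"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38";
        foreach (var pair in Parse(input))
        {
            Console.WriteLine("name:" + pair.Key + " value:" + pair.Value);
        }
    }
}
```

For the sample input this should yield foo/bar, name/John \"The Anonymous One\" Doe (without the outer quotes) and age/38. If names must start with a letter, as the first rule requires, one possible variant is to replace \w+[a-z0-9_-]+? with [a-z][a-z0-9_-]*; that substitution is an assumption on top of the pattern above, so test it against your own data.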


Use string methods.

Split

 string myLongString = @"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38";
 string[] nameValues = myLongString.Split('@');

From there, either call Split with "=" on each piece, or use IndexOf("=") to find where the name ends and the value begins.
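The IndexOf route might look like the sketch below. SplitParser and Parse are hypothetical names used only for this example. It assumes that '@' never appears inside a value and that any text before the first '@' contains no '='; if either assumption fails, this approach will misparse the input. It also does not validate names or unquoted values against the rules in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of any library.
public static class SplitParser
{
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (string segment in input.Split('@'))
        {
            int eq = segment.IndexOf('=');
            if (eq < 0)
            {
                continue; // plain text with no name=value pair
            }

            string name = segment.Substring(0, eq).Trim();
            string value = segment.Substring(eq + 1).Trim();

            // Drop one pair of surrounding quotes; escaped quotes inside stay as-is.
            if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
            {
                value = value.Substring(1, value.Length - 2);
            }

            pairs.Add(new KeyValuePair<string, string>(name, value));
        }
        return pairs;
    }

    public static void Main()
    {
        string myLongString = @"This is some text @foo=bar @name=""John \""The Anonymous One\"" Doe"" @age=38";
        foreach (var pair in Parse(myLongString))
        {
            Console.WriteLine("name:" + pair.Key + " value:" + pair.Value);
        }
    }
}
```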
