Regex for capturing colon-separated key-value pairs with multi-line values

I am currently working on a project in Ruby on Rails (in Eclipse), and my task is to break the data block into the appropriate parts using regular expressions.

I decided to split the data based on 3 parameters:

  • The string must begin with a capital letter (equivalent to the regex /^[A-Z]/)
  • It should end with a colon (equivalent to the regex /:$/)

I would appreciate any help. The code I use in my controller is:

 @f = File.open("report.rtf")
 @fread = @f.read
 @chunk = @fread.split(/\n/)

where @chunk is the array produced by the split, and @fread is the data being split (on newlines).

Any help would be appreciated, thanks a lot!

I can't release the actual data, but it mostly looks like this (it is medical data):

Exam 1: CBW 8080

RESULT:

This report is dictated by specific measurements. Please see the Initial Report.

COMPARE: 1/30/2012, 3/8/12, 4/9/12

RECIST 1.1: BLAH BLAH BLAH

The ideal output would be an array like this:

 ["Exam 1:", "CBW 8080", "RESULT:", "This report is dictated by specific measurements. Please see the Initial Report.", "COMPARE:", "1/30/2012, 3/8/12, 4/9/12", "RECIST 1.1:", "BLAH BLAH BLAH"]

P.S. I am just using \n as a placeholder until I get it working.

5 answers

Given the clarified question, here is a new solution.

UPDATED

First, "slurp" the entire data block (newlines and all) into a single string.

 str = IO.read("report.rtf") 

Then use this regex:

 captures = str.scan(/(?<=^|[\r\n])([A-Z][^:]*):([^\r\n]*(?:[\r\n]+(?![A-Z].*:).*)*)/)

See a live example here: http://rubular.com/r/8w3X6WGq4l .

The answer explained:

 (?<=                 Lookbehind assertion.
   ^                  The start of the string or of a line (in Ruby, ^ matches at every line start),
   |                  or,
   [\r\n]             a newline character.
 )
 (                    Capture group 1, the "key".
   [A-Z][^:]*         A capital letter followed by as many non-colon characters as possible.
 )
 :                    The colon character.
 (                    Capture group 2, the "value".
   [^\r\n]*           All remaining (non-newline) characters on the same line belong to the value, so take them all.
   (?:                Non-capturing group.
     [\r\n]+          Having taken everything up to a newline, now take the newline character(s).
     (?!              Negative lookahead assertion.
       [A-Z].*:       If the next line starts with a capital letter followed by anything and then a colon, it is a new key/value pair, so we do not want to match it.
     )
     .*               Provided that isn't the case, take the line!
   )*                 And keep taking lines as long as we don't find a new key/value pair.
 )
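As a quick check, the pattern can be run against text shaped like the question's example. This is only a sketch: the helper name `key_value_pairs` and the `strip` cleanup are assumptions added here, not part of the original answer, and the sample string is shortened placeholder data.

```ruby
# Hypothetical helper (not from the original answer): scan the text with the
# pattern above and flatten the [key, value] captures into one array.
def key_value_pairs(text)
  pattern = /(?<=^|[\r\n])([A-Z][^:]*):([^\r\n]*(?:[\r\n]+(?![A-Z].*:).*)*)/
  text.scan(pattern).flat_map do |key, value|
    # Re-attach the colon to the key; strip is an added cleanup step that
    # drops the newlines and spaces the pattern leaves around the value.
    ["#{key}:", value.strip]
  end
end

# Placeholder data modelled on the question, not real report content.
sample = "Exam 1: CBW 8080\n\nRESULT:\n\nSee the Initial Report.\n\nCOMPARE: 1/30/2012, 3/8/12\n"
p key_value_pairs(sample)
```

Because `scan` returns one `[key, value]` pair per match, `flat_map` is what turns the result into the flat array the question asks for. A value line that itself starts with a capital letter and contains a colon would be treated as a new key.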

I'm not quite sure what you are looking for. If you want every entry that starts with a capital letter and runs through to a colon, you can do:

 str.scan(/[A-Z].*?:/)
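For illustration, here is what that pattern picks out of text shaped like the question's example. The sample string is placeholder data, and `keys_only` is a hypothetical name used only for this sketch.

```ruby
# Hypothetical helper: return every "capital letter ... colon" run in the text.
def keys_only(text)
  # Without the /m flag, . does not match a newline, so each match stays on
  # one line; the lazy .*? stops at the first colon after the capital letter.
  text.scan(/[A-Z].*?:/)
end

# Placeholder data modelled on the question, not real report content.
sample = "Exam 1: CBW 8080\n\nRESULT:\n\nSee the Initial Report.\n\nCOMPARE: 1/30/2012, 3/8/12\n"
p keys_only(sample)
```

Only the keys come back from this call; the values would still have to be collected separately.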

That should do it.

 /^[A-Z].*:$/

The regular expression could be /(^[A-Z].*\:)/m, and you extract the matches with:

 @chunk = @fread.scan(/(^[A-Z].*\:)/m)

provided that @fread is a string. You can use http://rubular.com/ to test regular expressions in Ruby.


Another solution:

 input_str.split("\r\n").each do |s|
   var_name = s.split(": ")[0]
   var_value = s.split(": ")[1]
   # do whatever you like
 end
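The line-by-line idea can be stretched to cover values that continue on later lines, as in the question's RESULT: block. The following is a rough sketch under assumptions of my own: `parse_report` is a hypothetical name, a line counts as a new key only if it starts with a capital letter and contains a colon, and continuation lines are joined with a single space.

```ruby
# Hypothetical helper: walk the text line by line and build [key, value] pairs.
def parse_report(text)
  pairs = []
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?                      # skip blank lines
    match = line.match(/\A([A-Z][^:]*):\s*(.*)\z/)
    if match
      # A "Key: rest" line starts a new pair; the value may still be empty.
      pairs << [match[1], match[2]]
    elsif pairs.any?
      # Any other line continues the value of the most recent key.
      current = pairs.last
      current[1] = [current[1], line].reject(&:empty?).join(" ")
    end
  end
  pairs
end

# Placeholder data modelled on the question, not real report content.
sample = "Exam 1: CBW 8080\n\nRESULT:\n\nSee the Initial Report.\n\nCOMPARE: 1/30/2012, 3/8/12\n"
p parse_report(sample)
```

`each_line` copes with both "\n" and "\r\n" endings once each line is stripped, so no explicit split on "\r\n" is needed. Lines that appear before the first key are ignored in this sketch.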