How to split a string containing both a delimiter and an escaped delimiter?

My line delimiter is ; . Within a line, an escaped delimiter is written as \; . For example:

    irb(main):018:0> s = "a;b;;d\\;e"
    => "a;b;;d\\;e"
    irb(main):019:0> s.split(';')
    => ["a", "b", "", "d\\", "e"]

Can someone suggest a regex so that the split output is ["a", "b", "", "d\\;e"]? I am using Ruby 1.8.7.

2 answers

Ruby 1.8.7 does not support negative lookbehind unless it is built with Oniguruma (which can be compiled in).

1.9.3; yay:

    > s = "a;b;c\\;d"
    => "a;b;c\\;d"
    > s.split /(?<!\\);/
    => ["a", "b", "c\\;d"]
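
Applied to the input from the question, the same lookbehind keeps both the empty field and the escaped delimiter. This is only a minimal sketch, assuming Ruby 1.9 or later; the constant name `UNESCAPED_SEMICOLON` is made up for illustration:

```ruby
# Assumes Ruby 1.9+ (stock 1.8.7 has no negative lookbehind).
# UNESCAPED_SEMICOLON is a hypothetical name used only in this sketch.
UNESCAPED_SEMICOLON = /(?<!\\);/

s = "a;b;;d\\;e"                        # the characters are: a;b;;d\;e
fields = s.split(UNESCAPED_SEMICOLON)   # split only on a ; not preceded by a backslash
p fields                                # => ["a", "b", "", "d\\;e"]
```

Note that the lookbehind treats every ; that follows a backslash as escaped, including one that follows an escaped backslash, so it is only suitable if backslashes themselves are never escaped in the data.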

1.8.7 with Oniguruma does not offer a trivial split, but you can get the match offsets and pull the substrings apart that way. There is probably a better way to do this that I am not remembering:

    > require 'oniguruma'
    > re = Oniguruma::ORegexp.new "(?<!\\\\);"
    > s = "hello;there\\;nope;yestho"
    > re.match_all s
    => [#<MatchData ";">, #<MatchData ";">]
    > mds = re.match_all s
    => [#<MatchData ";">, #<MatchData ";">]
    > mds.collect {|md| md.offset}
    => [[5, 6], [17, 18]]
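
Once the delimiter offsets are known, cutting the string apart is plain Ruby and needs no regex support at all. A rough sketch, assuming the offsets arrive as sorted, non-overlapping `[start, end]` pairs like the ones above; `split_at_offsets` is a hypothetical helper, not part of the Oniguruma gem:

```ruby
# Hypothetical helper: cut str at the given [start, end] delimiter offsets.
# Assumes the offsets are sorted and non-overlapping, as in the output above.
def split_at_offsets(str, offsets)
  pieces = []
  pos = 0
  offsets.each do |start, stop|
    pieces << str[pos...start]   # text before this delimiter
    pos = stop                   # resume just after the delimiter
  end
  pieces << str[pos..-1]         # remainder after the last delimiter
  pieces
end

s = "hello;there\\;nope;yestho"
parts = split_at_offsets(s, [[5, 6], [17, 18]])
p parts                          # => ["hello", "there\\;nope", "yestho"]
```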

Other options:

  • Split on ; and then post-process the results, re-joining any piece that ends with a trailing \\, or
  • Run a char-by-char loop, maintain some simple state, and split manually.
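
Both of these options can be sketched without any lookbehind, so they should run on 1.8.7 as well as later versions. The method names `split_then_merge` and `split_char_by_char` are made up for illustration, and the choice to keep the backslash in the output is an assumption based on the desired result in the question:

```ruby
# Option 1 (hypothetical helper): split on the delimiter, then glue back
# together any piece that ended with a backslash.
def split_then_merge(str, delim = ";")
  merged = []
  carry = nil                               # piece waiting for its continuation
  str.split(delim, -1).each do |piece|      # -1 keeps trailing empty fields
    piece = carry + delim + piece if carry
    if piece[-1, 1] == "\\"                 # ends with a backslash: delimiter was escaped
      carry = piece
    else
      merged << piece
      carry = nil
    end
  end
  merged << carry if carry                  # input ended with a dangling backslash
  merged
end

# Option 2 (hypothetical helper): walk the characters and track whether the
# previous character was an unescaped backslash.
def split_char_by_char(str, delim = ";")
  fields = []
  current = ""
  escaped = false
  str.each_char do |ch|
    if !escaped && ch == delim
      fields << current                     # unescaped delimiter closes the field
      current = ""
    else
      current += ch                         # backslashes are kept in the output
      escaped = (!escaped && ch == "\\")
    end
  end
  fields << current
  fields
end

s = "a;b;;d\\;e"
p split_then_merge(s)     # => ["a", "b", "", "d\\;e"]
p split_char_by_char(s)   # => ["a", "b", "", "d\\;e"]
```

The two differ on escaped backslashes: the loop treats a doubled backslash before ; as an escaped backslash followed by a real delimiter, while the merge version treats any trailing backslash as escaping the delimiter. Unlike a plain `split`, both also keep trailing empty fields.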

As @dave-newton answered, you can use a negative lookbehind, but that is not supported in 1.8. An alternative that works in both 1.8 and 1.9 is to use String#scan instead of split, with a pattern that matches runs of either a character that is not a semicolon or backslash, or any character prefixed by a backslash:

    $ irb
    >> RUBY_VERSION
    => "1.8.7"
    >> s = "a;b;c\\;d"
    => "a;b;c\\;d"
    >> s.scan /(?:[^;\\]|\\.)+/
    => ["a", "b", "c\\;d"]
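
One thing worth checking before relying on this: because the pattern requires at least one character per match, empty fields are skipped. A quick check against the input from the question (same pattern as above, nothing else assumed):

```ruby
s = "a;b;;d\\;e"                      # the input from the question
tokens = s.scan(/(?:[^;\\]|\\.)+/)    # same pattern as above
p tokens                              # => ["a", "b", "d\\;e"]
```

The empty third field is lost, so this fits only if empty fields do not matter.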