Ruby one-liner for matching regular expressions

In Perl, I use the following idioms to pull matches out of a string with regular expressions and assign them. This finds one match and assigns it to a scalar:

    my $string = "the quick brown fox jumps over the lazy dog.";
    my $extractString = ($string =~ m{fox (.*?) dog})[0];

Result: $extractString == 'jumps over the lazy'

And this creates an array of several matches:

    my $string = "the quick brown fox jumps over the lazy dog.";
    my @extractArray = $string =~ m{the (.*?) fox .*?the (.*?) dog};

Result: @extractArray == ['quick brown', 'lazy']

Is there an equivalent way to write these as one-liners in Ruby?

+6
3 answers
    string = "the quick brown fox jumps over the lazy dog."

    extract_string = string[/fox (.*?) dog/, 1]
    # => "jumps over the lazy"

    extract_array = string.scan(/the (.*?) fox .*?the (.*?) dog/).first
    # => ["quick brown", "lazy"]

This approach will also return nil (instead of throwing an error) if no match is found.

    extract_string = string[/MISSING_CAT (.*?) dog/, 1]
    # => nil

    extract_array = string.scan(/the (.*?) MISSING_CAT .*?the (.*?) dog/).first
    # => nil
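
As a small extension of the above, String#[] also accepts a capture name, and scan without .first returns every match rather than only the first one. The following is only a sketch; the capture name middle and the variable names are made up for illustration.

```ruby
string = "the quick brown fox jumps over the lazy dog."

# String#[] accepts a capture name as well as a capture index.
by_name = string[/fox (?<middle>.*?) dog/, "middle"]

# scan returns one entry per match; with a single group each entry is a
# one-element array, so flatten yields a flat list of captures.
words_after_the = string.scan(/the (\w+)/).flatten
```

Both forms keep the nil-on-no-match behavior for String#[], while scan returns an empty array when nothing matches.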
+5

Use String#match together with MatchData#[] or MatchData#captures to get the captured groups (backreferences) in a consistent way.

    s = "the quick brown fox jumps over the lazy dog."

    s.match(/fox (.*?) dog/)[1]
    # => "jumps over the lazy"
    s.match(/fox (.*?) dog/).captures
    # => ["jumps over the lazy"]

    s.match(/the (.*?) fox .*?the (.*?) dog/)[1..2]
    # => ["quick brown", "lazy"]
    s.match(/the (.*?) fox .*?the (.*?) dog/).captures
    # => ["quick brown", "lazy"]

UPDATE

To avoid the undefined method [] error (a NoMethodError on nil) when nothing matches:

    (s.match(/fox (.*?) cat/) || [])[1]
    # => nil
    (s.match(/the (.*?) fox .*?the (.*?) cat/) || [])[1..2]
    # => nil
    (s.match(/the (.*?) fox .*?the (.*?) cat/) || [])[1..-1] # instead of .captures
    # => nil
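
On Ruby 2.3 or later, the safe navigation operator (&.) is another way to get the same nil-instead-of-error behavior. This is a sketch of an alternative, not part of the answer above; the variable names are illustrative.

```ruby
s = "the quick brown fox jumps over the lazy dog."

# &. skips the method call when String#match returned nil.
hit  = s.match(/fox (.*?) dog/)&.captures
miss = s.match(/fox (.*?) cat/)&.captures

# MatchData#[] can go through &. when written as an explicit method call.
first_hit  = s.match(/fox (.*?) dog/)&.[](1)
first_miss = s.match(/fox (.*?) cat/)&.[](1)
```

The trade-off is that .captures stays usable here, at the cost of the less familiar &.[](1) spelling for indexed access.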
+7

First, be careful about thinking in Perl terms when writing Ruby. We tend to do things a little more verbosely to make the code more readable.

I would write my @extractArray = $string =~ m{the (.*?) fox .*?the (.*?) dog}; as:

    string = "the quick brown fox jumps over the lazy dog."
    string[/the (.*?) fox .*?the (.*?) dog/]
    extract_array = $1, $2
    # => ["quick brown", "lazy"]

Ruby, like Perl, is aware of capture groups and assigns them to $1, $2, etc. Those are very clean and clear when grabbing values and assigning them later. The regex engine also lets you create and assign named captures, but they tend to obscure what is happening, so for clarity I usually go this way.
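
One thing worth keeping in mind with $1 and $2 is that they reflect the most recent match attempt, so a failed match resets them to nil. A minimal sketch of guarding on the match before reading them (after_miss is just an illustrative name):

```ruby
string = "the quick brown fox jumps over the lazy dog."

# Guard on the match so nil globals are never used by accident.
extract_array =
  if string =~ /the (.*?) fox .*?the (.*?) dog/
    [$1, $2]
  else
    []
  end

# After a failed match, the numbered globals are reset to nil.
string =~ /the (.*?) cat/
after_miss = $1
```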

We can use match to get there:

    /the (.*?) fox .*?the (.*?) dog/.match(string)
    # => #<MatchData "the quick brown fox jumps over the lazy dog" 1:"quick brown" 2:"lazy">

but is the end result more readable?

 extract_array = /the (.*?) fox .*?the (.*?) dog/.match(string)[1..-1] # => ["quick brown", "lazy"] 

Named captures are also interesting:

    /the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/ =~ string
    quick_brown # => "quick brown"
    lazy # => "lazy"

But they leave the reader wondering where those variables were initialized and assigned. I certainly don't look inside regular expressions for variable assignments, so this is potentially confusing to others and, again, becomes a maintenance problem.


Carey says:

To elaborate a bit on named captures: if match_data = string.match /the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/, then match_data[:quick_brown] # => "quick brown" and match_data[:lazy] # => "lazy" (as well as quick_brown # => "quick brown" and lazy # => "lazy"). With named captures available, I see no reason to use global variables or Regexp.last_match, etc.
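
To make the named-capture access concrete, here is a sketch of reading the captures from the MatchData object. It assumes Ruby 2.4 or later for named_captures, and the variable names are illustrative. Note that the local variables quick_brown and lazy are only created when a regex literal is the left-hand operand of =~; matching through a variable or through String#match does not create them.

```ruby
string = "the quick brown fox jumps over the lazy dog."
pattern = /the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/

match_data = string.match(pattern)
by_symbol = match_data[:quick_brown]
by_string = match_data["lazy"]
as_hash   = match_data.named_captures # keys are strings

# The pattern is held in a variable here, so no locals are assigned.
pattern =~ string
created = binding.local_variable_defined?(:quick_brown)
```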

Yes, but there's a smell there too.

We can use values_at with the MatchData result from match to get at the captured values, but there are some unintuitive behaviors in the class that put me off:

 /the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/.match(string)['lazy'] 

works and implies that MatchData knows how to behave like a hash:

 {'lazy' => 'dog'}['lazy'] # => "dog" 

and it has a values_at method like Hash, but (at least on Rubies older than 2.4, where values_at accepts only integer indexes) it doesn't work intuitively:

    /the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/.match(string).values_at('lazy')
    # ~> -:6:in `values_at': no implicit conversion of String into Integer (TypeError)

While:

 /the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/.match(string).values_at(2) # => ["lazy"] 

which now acts like an array:

 ['all captures', 'quick brown', 'lazy'].values_at(2) # => ["lazy"] 

I want consistency, and this makes my head hurt.
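
One way to sidestep the mixed behavior is to convert the MatchData into a plain Array or Hash first and then use values_at on that. This is only a sketch of a workaround; md, as_array and as_hash are illustrative names.

```ruby
string = "the quick brown fox jumps over the lazy dog."
md = /the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/.match(string)

# Array view: index 0 is the whole match, captures start at 1.
as_array = md.to_a

# Hash view built by hand from MatchData#names and MatchData#captures.
as_hash = md.names.zip(md.captures).to_h

by_index = as_array.values_at(2)
by_name  = as_hash.values_at("lazy")
```

After the conversion, each object behaves purely like an Array or purely like a Hash, so there is no guessing about which kind of key values_at expects.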

+2
