First, be careful about thinking in Perl terms when writing Ruby. We tend to do things a bit more verbosely to make the code more readable.
I would write the Perl statement my @extractArray = $string =~ m{the (.*?) fox .*?the (.*?) dog}; as:
string = "the quick brown fox jumps over the lazy dog."
string[/the (.*?) fox .*?the (.*?) dog/]
extract_array = $1, $2 # => ["quick brown", "lazy"]
Ruby, like Perl, is aware of capture groups and assigns them to the variables $1, $2, etc. Those make it clean and clear to grab the values and assign them later. The regex engine also lets you create and assign named captures, but they tend to obscure what is happening, so, for clarity, I usually go this way.
We can use match to get there:
/the (.*?) fox .*?the (.*?) dog/.match(string)
but is the end result more readable?
extract_array = /the (.*?) fox .*?the (.*?) dog/.match(string)[1..-1] # => ["quick brown", "lazy"]
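For comparison, here is a minimal sketch of the same extraction using MatchData#captures, which returns only the captured groups, so the [1..-1] slice is not needed. The helper name extract_captures is hypothetical, and returning an empty array on a failed match is an assumption about how that case should be handled:

```ruby
# Hypothetical helper: returns the captured groups, or [] when nothing matches.
def extract_captures(string)
  match = /the (.*?) fox .*?the (.*?) dog/.match(string)
  # match is nil when the pattern does not match, so guard before calling captures.
  match ? match.captures : []
end

extract_captures("the quick brown fox jumps over the lazy dog.")
# => ["quick brown", "lazy"]
extract_captures("no animals here")
# => []
```

Whether this reads better than $1, $2 is a judgment call; it mainly avoids the slice and the global variables.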
Named captures are interesting too:
/the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/ =~ string
quick_brown # => "quick brown"
lazy # => "lazy"
But they leave the reader wondering where those variables were initialized and assigned. I certainly wouldn't look inside a regular expression for that, so it is potentially confusing to others and becomes a maintenance problem again.
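Part of what makes those variables easy to miss is that Ruby only creates them when a regex literal sits on the left-hand side of =~. A small sketch of that behavior follows; the method name named_capture_locals and the shortened patterns are made up for illustration:

```ruby
# Hypothetical demo method: returns what each form of =~ makes visible.
def named_capture_locals(string)
  # Regex literal on the left of =~: Ruby assigns the local variable quick_brown.
  /the (?<quick_brown>.*?) fox/ =~ string

  # String on the left: the match succeeds, but no local named lazy is created.
  string =~ /the (?<lazy>\w+) dog/

  # The second capture is still reachable through the last-match data in $~.
  [quick_brown, defined?(lazy), $~[:lazy]]
end

named_capture_locals("the quick brown fox jumps over the lazy dog.")
# => ["quick brown", nil, "lazy"]
```

So simply swapping the operands changes which local variables exist, which is the kind of hidden assignment that is hard to spot during maintenance.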
Carey says:
To elaborate a bit on named captures, if match_data = string.match /the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/, then match_data[:quick_brown] # => "quick brown" and match_data[:lazy] # => "lazy" (as well as quick_brown # => "quick brown" and lazy # => "lazy"). With named captures available, I see no reason to use global variables or Regexp.last_match, etc.
Yes, but there is some code smell there too.
We can use values_at with the MatchData result of match to retrieve the captured values, but there are some unintuitive behaviors in the class that bug me:
/the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/.match(string)['lazy']
works and implies that MatchData knows how to behave like a hash:
{'lazy' => 'dog'}['lazy']
and it has a values_at method like Hash, but it doesn't work intuitively:
/the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/.match(string).values_at('lazy')
Whereas:
/the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/.match(string).values_at(2)
works, which means MatchData is now acting like an array:
['all captures', 'quick brown', 'lazy'].values_at(2)
I want consistency, and this makes my head hurt.
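One way to sidestep the mixed hash/array behavior is to convert the MatchData into a plain Hash or Array before indexing it, so each lookup follows the ordinary rules of that class. This is only a sketch of that idea, not a recommendation from the MatchData documentation; the helper name match_views is hypothetical, and named_captures assumes Ruby 2.4 or later:

```ruby
# Hypothetical helper: exposes one match through three explicit views.
def match_views(string)
  match = /the (?<quick_brown>.*?) fox .*?the (?<lazy>.*?) dog/.match(string)
  return nil unless match

  {
    by_name:  match.named_captures, # Hash keyed by group name (Ruby 2.4+)
    captures: match.captures,       # Array of the captured groups only
    all:      match.to_a            # Array: whole match first, then the groups
  }
end

views = match_views("the quick brown fox jumps over the lazy dog.")
views[:by_name]["lazy"]   # => "lazy"
views[:captures]          # => ["quick brown", "lazy"]
views[:all].values_at(2)  # => ["lazy"]
```

After the conversion, Hash#[] and Array#values_at behave exactly as they do everywhere else, at the cost of one extra step.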