Ruby on Rails: regexp equals-tilde (=~) vs. Array#include? to check a list of options

I am using Rails 3.2.3 with Ruby 1.9.3p0.

I often need to check whether a parameter value is one of a list of allowed strings. It seems I can use Ruby's Array#include? method:

 <% if ['todo','pending','history'].include?(params[:category]) %> 

or the equals-tilde regex shorthand, with vertical bars separating the options:

 <% if params[:category] =~ /todo|pending|history/ %> 

Is one better than the other in terms of performance?

Is there an even better approach?

2 answers

Summary: Array#include? with String elements wins for both accepted and rejected inputs in your example with three valid values. For a larger set of valid values, it looks like Set#include? with String elements might win.
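As a minimal sketch of what those two options could look like in application code (the constant and method names are hypothetical, only the three values come from the question):

```ruby
require 'set'

# Hypothetical constants; the values are the ones from the question.
VALID_CATEGORIES = %w[todo pending history].freeze
VALID_CATEGORY_SET = Set.new(VALID_CATEGORIES).freeze

# Linear scan over the array; fine for a handful of values.
def valid_category?(value)
  VALID_CATEGORIES.include?(value)
end

# Hash-based lookup; may pay off as the list of values grows.
def valid_category_in_set?(value)
  VALID_CATEGORY_SET.include?(value)
end

puts valid_category?('todo')
puts valid_category_in_set?('archive')
```

Defining the list once as a constant also avoids rebuilding the array literal on every check, though I have not measured how much that matters.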


How to test

We must verify this empirically.

Here are a few alternatives that you could also consider: a precompiled regular expression, an array of Symbols, and a Set with String elements.

I would guess that performance also depends on whether most of your inputs fall into the expected set and are accepted, or most fall outside the set and are rejected.

Here's an empirical test script:

    require 'benchmark'
    require 'set'

    # The four lookup structures under test
    strings = ['todo','pending','history']
    string_set = Set.new(strings)
    symbols = strings.map(&:to_sym)
    regex_compiled = Regexp.new(strings.join("|"))

    strings_avg_size = (strings.map(&:size).inject {|sum, n| sum + n}.to_f / strings.size).to_i
    num_inputs = 1_000_000

    # Accepted inputs are drawn from the valid values; rejected inputs are random lowercase strings
    accepted_inputs = (0...num_inputs).map { strings[rand(strings.size)] }
    # 'a'..'z' (inclusive) has 26 letters, so rand(26) is always a valid index
    rejected_inputs = (0...num_inputs).map { (0..strings_avg_size).map { ('a'..'z').to_a[rand(26)] }.join }

    Benchmark.bmbm(40) do |x|
      x.report("Array#include?, Strings, accepted:") { accepted_inputs.map {|s| strings.include?(s) } }
      x.report("Array#include?, Strings, rejected:") { rejected_inputs.map {|s| strings.include?(s) } }
      x.report("Array#include?, Symbols, accepted:") { accepted_inputs.map {|s| symbols.include?(s.to_sym) } }
      x.report("Array#include?, Symbols, rejected:") { rejected_inputs.map {|s| symbols.include?(s.to_sym) } }
      x.report("Set#include?, Strings, accepted:") { accepted_inputs.map {|s| string_set.include?(s) } }
      x.report("Set#include?, Strings, rejected:") { rejected_inputs.map {|s| string_set.include?(s) } }
      x.report("Regexp#match, interpreted, accepted:") { accepted_inputs.map {|s| s =~ /todo|pending|history/ } }
      x.report("Regexp#match, interpreted, rejected:") { rejected_inputs.map {|s| s =~ /todo|pending|history/ } }
      x.report("Regexp#match, compiled, accepted:") { accepted_inputs.map {|s| regex_compiled.match(s) } }
      x.report("Regexp#match, compiled, rejected:") { rejected_inputs.map {|s| regex_compiled.match(s) } }
    end

Results

    Rehearsal ---------------------------------------------------------------------------
    Array#include?, Strings, accepted:        0.210000   0.000000   0.210000 (  0.215099)
    Array#include?, Strings, rejected:        0.530000   0.010000   0.540000 (  0.543898)
    Array#include?, Symbols, accepted:        0.330000   0.000000   0.330000 (  0.337767)
    Array#include?, Symbols, rejected:        1.870000   0.050000   1.920000 (  1.923155)
    Set#include?, Strings, accepted:          0.270000   0.000000   0.270000 (  0.274774)
    Set#include?, Strings, rejected:          0.460000   0.000000   0.460000 (  0.463925)
    Regexp#match, interpreted, accepted:      0.380000   0.000000   0.380000 (  0.382060)
    Regexp#match, interpreted, rejected:      0.650000   0.000000   0.650000 (  0.660775)
    Regexp#match, compiled, accepted:         1.130000   0.080000   1.210000 (  1.220970)
    Regexp#match, compiled, rejected:         0.630000   0.000000   0.630000 (  0.640721)
    ------------------------------------------------------------------ total: 6.600000sec

                                                  user     system      total        real
    Array#include?, Strings, accepted:        0.210000   0.000000   0.210000 (  0.219060)
    Array#include?, Strings, rejected:        0.430000   0.000000   0.430000 (  0.444911)
    Array#include?, Symbols, accepted:        0.340000   0.000000   0.340000 (  0.341970)
    Array#include?, Symbols, rejected:        1.080000   0.000000   1.080000 (  1.089961)
    Set#include?, Strings, accepted:          0.270000   0.000000   0.270000 (  0.281270)
    Set#include?, Strings, rejected:          0.400000   0.000000   0.400000 (  0.406181)
    Regexp#match, interpreted, accepted:      0.370000   0.000000   0.370000 (  0.366931)
    Regexp#match, interpreted, rejected:      0.560000   0.000000   0.560000 (  0.558652)
    Regexp#match, compiled, accepted:         0.920000   0.000000   0.920000 (  0.915914)
    Regexp#match, compiled, rejected:         0.620000   0.000000   0.620000 (  0.627620)

Conclusions

(see Summary above)

On reflection, it makes sense to me that an array of Symbols would be very slow for rejected inputs, because every one of those random strings must be interned in the symbol table before the check.
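A small sketch of what interning means here, in case it helps (the variable names are just for illustration): converting the same text to a Symbol always yields the same object, so each to_sym call has to look the text up in the symbol table, and for a random rejected string it will usually have to add a new entry. How expensive that is will depend on your Ruby version.

```ruby
# The same text always maps to the same Symbol object...
a = 'pending'.to_sym
b = 'pending'.to_sym
puts a.equal?(b)    # true: one shared Symbol

# ...whereas two equal Strings are separate objects compared by content.
s1 = 'pending'.dup
s2 = 'pending'.dup
puts s1 == s2       # true: same content
puts s1.equal?(s2)  # false: different objects
```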

Even on reflection, I did not expect the compiled Regexp to perform so badly, especially compared to the Regexp written as a literal in the code. Can anyone explain why it is so slow?
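One variable the script above does not isolate: the literal case is called with =~, while the compiled case is called with Regexp#match, which returns a MatchData object rather than an Integer. Here is a sketch that tries both call styles on the same precompiled Regexp. The input size is an arbitrary assumption, and I make no claim about which one is faster on your Ruby.

```ruby
require 'benchmark'

strings = ['todo', 'pending', 'history']
regex_compiled = Regexp.new(strings.join('|'))
inputs = Array.new(100_000) { strings.sample }

# Same Regexp object, two call styles with different return types.
position = (regex_compiled =~ 'pending')
match_data = regex_compiled.match('pending')
puts position.class
puts match_data.class

Benchmark.bmbm(30) do |x|
  x.report('compiled, =~:')     { inputs.each { |s| regex_compiled =~ s } }
  x.report('compiled, #match:') { inputs.each { |s| regex_compiled.match(s) } }
end
```

If the two rows come out close, the call style is not the explanation and the difference lies elsewhere.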


@ms-tg's answer has good benchmarks and answers your question well, as far as I can tell. I just wanted to add a small note of caution, because the two options will not always give the same result:

    params = Hash.new
    keyword_array = ['todo','pending','history']
    included = nil
    params[:category] = "history plus other text"

    start_time = Time.now
    1000.times do
      included = keyword_array.include?(params[:category])
    end
    puts "Array.include? returned #{included} in #{(Time.now - start_time)*1000}ms"

    start_time = Time.now
    1000.times do
      included = (params[:category] =~ /todo|pending|history/).is_a?(Integer)
    end
    puts "Regexp returned #{included} in #{(Time.now - start_time)*1000}ms"

Output:

    Array.include? returned false in 0.477ms
    Regexp returned true in 0.953ms

Note that in this case the regex returns true, but Array#include? returns false. Take this into account when building your logic.

Basically, if the string is not in the array exactly, Array#include? will be false, but if one of the keywords appears anywhere in the string, the regular expression will match (regardless of whether there is other text around it).
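If exact matching is what you want from the regex, one way to get it is to anchor the pattern with \A and \z so that it must cover the whole string. A minimal sketch (the constant name EXACT_CATEGORY is hypothetical):

```ruby
# \A and \z anchor the pattern to the start and end of the whole string;
# (?:...) groups the alternatives without capturing.
EXACT_CATEGORY = /\A(?:todo|pending|history)\z/

['history', 'history plus other text', 'prehistory'].each do |input|
  loose = !!(input =~ /todo|pending|history/)
  exact = !!(input =~ EXACT_CATEGORY)
  puts "#{input.inspect}: loose=#{loose} exact=#{exact}"
end
```

Only 'history' passes the anchored pattern here, while all three inputs pass the unanchored one.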
