Summary: for your example with only three valid values, Array#include? with String elements is the fastest for accepted inputs and close to the fastest for rejected inputs, where Set#include? with String elements is slightly ahead in the runs below. For a larger set of valid values, it looks like Set#include? with String elements may win.
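As a minimal sketch of how that advice might look in application code, assuming hypothetical names `VALID_STATUSES`, `LARGE_VALID_STATUSES`, `valid_status?` and `valid_large_status?` (none of them come from the question), the short list stays a frozen Array and a longer list becomes a Set. The 100-element list is purely illustrative, not a measured crossover point.

```ruby
require 'set'

# Hypothetical constant holding the three values from the question.
VALID_STATUSES = %w[todo pending history].freeze

# For a handful of values, a linear scan over a frozen Array is cheap.
def valid_status?(status)
  VALID_STATUSES.include?(status)
end

# Hypothetical larger list (100 generated values) to illustrate the Set variant:
# Set#include? is a hash lookup instead of a linear scan.
LARGE_VALID_STATUSES = Set.new((1..100).map { |i| "status_#{i}" }).freeze

def valid_large_status?(status)
  LARGE_VALID_STATUSES.include?(status)
end

puts valid_status?('pending')          # prints true
puts valid_status?('done')             # prints false
puts valid_large_status?('status_42')  # prints true
```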
How to test
We must verify this empirically.
Here are a few alternatives you could also consider: a precompiled regular expression, an Array of Symbols, and a Set with String elements.
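For reference, the alternatives can be written side by side as predicates. This is a sketch with a hypothetical `checks` Hash of lambdas; it uses `Regexp#match?` (Ruby 2.4+) and anchors the pattern with `\A` and `\z` so only whole-string matches count, because the unanchored `todo|pending|history` used in the benchmark script would also accept input such as "todolist".

```ruby
require 'set'

strings    = %w[todo pending history]
string_set = Set.new(strings)
symbols    = strings.map(&:to_sym)
# Regexp.union escapes the values; \A and \z anchor the whole alternation.
regex      = /\A#{Regexp.union(strings)}\z/

# Hypothetical table of equivalent membership checks.
checks = {
  'Array#include? (Strings)' => ->(s) { strings.include?(s) },
  'Array#include? (Symbols)' => ->(s) { symbols.include?(s.to_sym) },
  'Set#include? (Strings)'   => ->(s) { string_set.include?(s) },
  'Regexp#match? (anchored)' => ->(s) { regex.match?(s) }
}

# Every check should give the same answer for each input.
%w[todo history done todolist].each do |input|
  results = checks.transform_values { |check| check.call(input) }
  puts "#{input}: #{results.values.uniq.inspect}"
end
```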
Performance may also depend on whether most of your inputs fall inside the expected set and are accepted, or most of them fall outside it and are rejected.
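To measure anything between the two extremes, you could generate a mixed workload. This is a sketch built around a hypothetical `make_inputs` helper: `accept_ratio` is the approximate fraction of valid inputs, rejected inputs are random lowercase words redrawn until they fall outside the valid set, and the seeded `Random` only makes runs repeatable.

```ruby
require 'set'

# Hypothetical helper: builds n inputs, roughly accept_ratio of which are valid.
def make_inputs(valid, n, accept_ratio, rng: Random.new(42))
  valid_set = Set.new(valid)
  letters = ('a'..'z').to_a
  Array.new(n) do
    if rng.rand < accept_ratio
      valid.sample(random: rng)
    else
      # Draw random lowercase words until one is outside the valid set.
      loop do
        word = Array.new(rng.rand(3..8)) { letters.sample(random: rng) }.join
        break word unless valid_set.include?(word)
      end
    end
  end
end

valid = %w[todo pending history]
inputs = make_inputs(valid, 10_000, 0.9)
accepted = inputs.count { |s| valid.include?(s) }
puts "#{accepted} of #{inputs.size} inputs are valid"
```

Feeding such a list to each `x.report` block lets you see where the accepted and rejected timings cross over for your actual traffic.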
Here's an empirical test script:
    require 'benchmark'
    require 'set'

    strings = ['todo', 'pending', 'history']
    string_set = Set.new(strings)
    symbols = strings.map(&:to_sym)
    regex_compiled = Regexp.new(strings.join("|"))
    strings_avg_size = (strings.map(&:size).inject { |sum, n| sum + n }.to_f / strings.size).to_i

    num_inputs = 1_000_000
    # Valid values picked at random from the list
    accepted_inputs = (0...num_inputs).map { strings[rand(strings.size)] }
    # Random lowercase strings of about the average valid length ('a'..'z' has 26 letters)
    rejected_inputs = (0...num_inputs).map { (0..strings_avg_size).map { ('a'..'z').to_a[rand(26)] }.join }

    Benchmark.bmbm(40) do |x|
      x.report("Array#include?, Strings, accepted:")   { accepted_inputs.map { |s| strings.include?(s) } }
      x.report("Array#include?, Strings, rejected:")   { rejected_inputs.map { |s| strings.include?(s) } }
      x.report("Array#include?, Symbols, accepted:")   { accepted_inputs.map { |s| symbols.include?(s.to_sym) } }
      x.report("Array#include?, Symbols, rejected:")   { rejected_inputs.map { |s| symbols.include?(s.to_sym) } }
      x.report("Set#include?, Strings, accepted:")     { accepted_inputs.map { |s| string_set.include?(s) } }
      x.report("Set#include?, Strings, rejected:")     { rejected_inputs.map { |s| string_set.include?(s) } }
      x.report("Regexp#match, interpreted, accepted:") { accepted_inputs.map { |s| s =~ /todo|pending|history/ } }
      x.report("Regexp#match, interpreted, rejected:") { rejected_inputs.map { |s| s =~ /todo|pending|history/ } }
      x.report("Regexp#match, compiled, accepted:")    { accepted_inputs.map { |s| regex_compiled.match(s) } }
      x.report("Regexp#match, compiled, rejected:")    { rejected_inputs.map { |s| regex_compiled.match(s) } }
    end
Results
    Rehearsal ---------------------------------------------------------------------------
    Array#include?, Strings, accepted:        0.210000   0.000000   0.210000 (  0.215099)
    Array#include?, Strings, rejected:        0.530000   0.010000   0.540000 (  0.543898)
    Array#include?, Symbols, accepted:        0.330000   0.000000   0.330000 (  0.337767)
    Array#include?, Symbols, rejected:        1.870000   0.050000   1.920000 (  1.923155)
    Set#include?, Strings, accepted:          0.270000   0.000000   0.270000 (  0.274774)
    Set#include?, Strings, rejected:          0.460000   0.000000   0.460000 (  0.463925)
    Regexp#match, interpreted, accepted:      0.380000   0.000000   0.380000 (  0.382060)
    Regexp#match, interpreted, rejected:      0.650000   0.000000   0.650000 (  0.660775)
    Regexp#match, compiled, accepted:         1.130000   0.080000   1.210000 (  1.220970)
    Regexp#match, compiled, rejected:         0.630000   0.000000   0.630000 (  0.640721)
    ------------------------------------------------------------------ total: 6.600000sec

                                                  user     system      total        real
    Array#include?, Strings, accepted:        0.210000   0.000000   0.210000 (  0.219060)
    Array#include?, Strings, rejected:        0.430000   0.000000   0.430000 (  0.444911)
    Array#include?, Symbols, accepted:        0.340000   0.000000   0.340000 (  0.341970)
    Array#include?, Symbols, rejected:        1.080000   0.000000   1.080000 (  1.089961)
    Set#include?, Strings, accepted:          0.270000   0.000000   0.270000 (  0.281270)
    Set#include?, Strings, rejected:          0.400000   0.000000   0.400000 (  0.406181)
    Regexp#match, interpreted, accepted:      0.370000   0.000000   0.370000 (  0.366931)
    Regexp#match, interpreted, rejected:      0.560000   0.000000   0.560000 (  0.558652)
    Regexp#match, compiled, accepted:         0.920000   0.000000   0.920000 (  0.915914)
    Regexp#match, compiled, rejected:         0.620000   0.000000   0.620000 (  0.627620)
Conclusions
(see Summary above)
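The remark about larger sets was not measured above. A sketch to check it, assuming a hypothetical list of 200 generated values (status_1 through status_200) and a small input count so it runs quickly; scale both up for real measurements. It verifies that both lookups agree before timing them, and uses `each` rather than `map` so no result Array is allocated.

```ruby
require 'benchmark'
require 'set'

# Hypothetical larger list of valid values; only its size matters here.
strings    = (1..200).map { |i| "status_#{i}" }
string_set = Set.new(strings)

num_inputs = 20_000
accepted_inputs = Array.new(num_inputs) { strings.sample }
# Same shape as the valid values but never in the list (IDs above 200).
rejected_inputs = Array.new(num_inputs) { "status_#{rand(201..10_000)}" }

# Sanity check: timings only mean something if both lookups agree.
(accepted_inputs.first(1_000) + rejected_inputs.first(1_000)).each do |s|
  raise "mismatch for #{s}" unless strings.include?(s) == string_set.include?(s)
end

Benchmark.bmbm(30) do |x|
  x.report('Array#include?, accepted:') { accepted_inputs.each { |s| strings.include?(s) } }
  x.report('Array#include?, rejected:') { rejected_inputs.each { |s| strings.include?(s) } }
  x.report('Set#include?, accepted:')   { accepted_inputs.each { |s| string_set.include?(s) } }
  x.report('Set#include?, rejected:')   { rejected_inputs.each { |s| string_set.include?(s) } }
end
```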
It makes sense to me that an Array of Symbols is very slow for rejected inputs, because each of these random strings must first be interned in the Symbol table before it can be checked.
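If you want Symbols downstream, one way to avoid interning arbitrary rejected input is to look the String up in a Hash that maps each valid String to its Symbol, so `to_sym` is only ever called on the known values. A sketch, with hypothetical names `STATUS_LOOKUP` and `status_symbol`:

```ruby
# Built once: each valid String maps to its Symbol, so to_sym is only
# called on these known values and never on user input.
STATUS_LOOKUP = %w[todo pending history].map { |s| [s, s.to_sym] }.to_h.freeze

# Returns the Symbol for a valid status, or nil for anything else.
def status_symbol(input)
  STATUS_LOOKUP[input]
end

p status_symbol('pending')  # prints :pending
p status_symbol('qwxyz')    # prints nil
```

This keeps the cost of a rejected input at a single String hash lookup, comparable to `Set#include?`.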
I didn't expect the compiled Regexp to perform so badly on accepted inputs, especially compared to the Regexp written as a literal in the code. Can anyone explain why it is so much slower?
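One detail worth isolating: a regular expression literal without interpolation is also compiled only once, when the code is parsed, so "compiled" versus "interpreted" may not be the real difference. The two benchmark entries also call different methods (`=~` versus `Regexp#match`). A plausible but unverified explanation is that `Regexp#match` hands a new `MatchData` object back to the caller on every successful match, which would fit the slowdown appearing mainly for accepted inputs. `Regexp#match?` (Ruby 2.4+) returns only true or false and does not create a `MatchData` or set `$~`. A sketch that compares the three calls on one pattern, anchored so that only whole-string matches count:

```ruby
require 'benchmark'

# Anchored so that only whole-string matches count.
regex = /\A(?:todo|pending|history)\z/

inputs = Array.new(200_000) { %w[todo pending history nope].sample }

# All three calls must agree on which inputs are valid.
inputs.uniq.each do |s|
  a = !(s =~ regex).nil?
  b = !regex.match(s).nil?
  c = regex.match?(s)
  raise "mismatch for #{s}" unless a == b && b == c
end

Benchmark.bmbm(20) do |x|
  x.report('String#=~:')     { inputs.each { |s| s =~ regex } }
  x.report('Regexp#match:')  { inputs.each { |s| regex.match(s) } }
  x.report('Regexp#match?:') { inputs.each { |s| regex.match?(s) } }
end
```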