Ruby - Compare two enumerators elegantly

I have two long streams of numbers coming from two different sources (binary data) in Ruby (1.9.2).

The two sources are wrapped in two Enumerators.

I want to check that the two streams are exactly equal.

I came up with a couple of solutions, but both seem rather inelegant.

The first just converts both to an array:

    def equal_streams?(s1, s2)
      s1.to_a == s2.to_a
    end

This works, but it is not very efficient in terms of memory, especially if the streams contain a lot of information.

Another option is... ugh.

    def equal_streams?(s1, s2)
      s1.each do |e1|
        begin
          e2 = s2.next
          return false unless e1 == e2 # Different element found
        rescue StopIteration
          return false # s2 has run out of items before s1
        end
      end

      begin
        s2.next
      rescue StopIteration
        # s1 and s2 have run out of elements at the same time; they are equal
        return true
      end

      return false
    end

So, is there an easier and more elegant way to do this?

+8
ruby enumerator
5 answers

A slight refactoring of your code, assuming your streams never contain the element :eof.

    def equal_streams?(s1, s2)
      loop do
        e1 = s1.next rescue :eof
        e2 = s2.next rescue :eof
        return false unless e1 == e2
        return true if e1 == :eof
      end
    end

Using loop should be faster than using a method like each, although note that loop is itself a method (Kernel#loop), not a keyword.
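The speed claim is easy to measure rather than assume. The following is a minimal sketch using the standard Benchmark module; the names equal_streams_loop? and equal_streams_each? and the stream length N are made up for this illustration, and the numbers will vary with the Ruby version and the machine.

```ruby
require 'benchmark'

# loop-based version, as in this answer
def equal_streams_loop?(s1, s2)
  loop do
    e1 = s1.next rescue :eof
    e2 = s2.next rescue :eof
    return false unless e1 == e2
    return true if e1 == :eof
  end
end

# each-based version, as in the question
def equal_streams_each?(s1, s2)
  s1.each do |e1|
    begin
      e2 = s2.next
      return false unless e1 == e2
    rescue StopIteration
      return false # s2 ran out before s1
    end
  end
  begin
    s2.next
  rescue StopIteration
    return true # both ran out together
  end
  false
end

N = 20_000 # arbitrary stream length, chosen only for the measurement

Benchmark.bm(6) do |bm|
  bm.report('loop') { equal_streams_loop?((1..N).each, (1..N).each) }
  bm.report('each') { equal_streams_each?((1..N).each, (1..N).each) }
end
```

Both variants spend most of their time in Enumerator#next, so the difference may well be small; treat the output as a hint, not a verdict.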

+8

Comparing them one element at a time is probably the best you can do, but you can do it more cleanly than your "ugh" solution:

    def grab_next(h, k, s)
      begin
        h[k] = s.next
      rescue StopIteration
      end
    end

    def equal_streams?(s1, s2)
      loop do
        vals = { }
        grab_next(vals, :s1, s1)
        grab_next(vals, :s2, s2)
        return true  if(vals.keys.length == 0)  # Both of them ran out.
        return false if(vals.keys.length == 1)  # One of them ran out early.
        return false if(vals[:s1] != vals[:s2]) # Found a mismatch.
      end
    end

The tricky part is telling apart the case where only one stream has run out from the case where both have. Moving the StopIteration handling into a separate function and using the absence of a key in the hash is a fairly convenient way to do this. Just checking vals[:s1] would cause problems if your stream contains false or nil, but checking for the presence of the key avoids that.
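A small sketch of why the key check matters, reusing grab_next from this answer (written here with a method-level rescue). The two sample streams are invented for the illustration: one yields a single nil, the other is already empty.

```ruby
# Same idea as the helper in the answer: store the next element under
# key k, or leave the key unset when the stream is exhausted.
def grab_next(h, k, s)
  h[k] = s.next
rescue StopIteration
  # stream ran out: deliberately store nothing
end

vals = {}
grab_next(vals, :s1, [nil].each) # stream whose only element is nil
grab_next(vals, :s2, [].each)    # stream that is already exhausted

p vals[:s1] == vals[:s2] #=> true  (values alone cannot tell the cases apart)
p vals.key?(:s1)         #=> true  (s1 really produced an element)
p vals.key?(:s2)         #=> false (s2 ran out)
```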

+6

Here is a way of doing it by creating an alternative to Enumerable#zip that works lazily and does not build an entire array. It combines my implementation of Clojure's interleave with the other two answers here (using a sentinel value to indicate that the end of an Enumerable has been reached; the root of the problem is that next rewinds the Enumerable once it reaches the end).

This solution accepts multiple arguments, so you can compare n structures at once.

    module Enumerable
      # this should be just a unique sentinel value (any ideas for more elegant solution?)
      END_REACHED = Object.new

      def lazy_zip *others
        sources = ([self] + others).map(&:to_enum)
        Enumerator.new do |yielder|
          loop do
            sources, values = sources.map{|s|
              [s, s.next] rescue [nil, END_REACHED]
            }.transpose
            raise StopIteration if values.all?{|v| v == END_REACHED}
            yielder.yield values.map{|v| v == END_REACHED ? nil : v}
          end
        end
      end
    end

So, once you have a variant of zip that works lazily and does not stop iterating when the first enumerable ends, you can use all? or any? to check the corresponding elements for equality.

    # zip would fail here, as it would return just [[1,1],[2,2],[3,3]]:
    p [1,2,3].lazy_zip([1,2,3,4]).all?{|l,r| l == r}
    #=> false

    # this is ok
    p [1,2,3,4].lazy_zip([1,2,3,4]).all?{|l,r| l == r}
    #=> true

    # comparing more than two input streams:
    p [1,2,3,4].lazy_zip([1,2,3,4],[1,2,3]).all?{|vals|
      # check for equality by checking length of the uniqued array
      vals.uniq.length == 1
    }
    #=> false
+2

Following the discussion in the comments, here is a zip-based solution: it first wraps the block form of zip inside an Enumerator, then uses that to compare the corresponding elements.

This works, but it has the edge case already mentioned: if the first stream is shorter than the other, the remaining elements of the other are discarded (see the example below).

I marked this answer as a community wiki, as other members could improve it.

    def zip_lazy *enums
      Enumerator.new do |yielder|
        head, *tail = enums
        head.zip(*tail) do |values|
          yielder.yield values
        end
      end
    end

    p zip_lazy(1..3, 1..4).all?{|l,r| l == r}
    #=> true
    p zip_lazy(1..3, 1..3).all?{|l,r| l == r}
    #=> true
    p zip_lazy(1..4, 1..3).all?{|l,r| l == r}
    #=> false
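One possible way to close the edge case of a shorter first stream, sketched under the assumption that it is acceptable to append a private end marker to each stream before zipping. STREAM_END, with_end_marker and equal_by_zip? are hypothetical names introduced for this sketch and are not part of the answer above.

```ruby
# Unique marker object; it compares equal only to itself.
STREAM_END = Object.new

# Lazily re-yield every element of enum, followed by the end marker.
def with_end_marker(enum)
  Enumerator.new do |yielder|
    enum.each { |item| yielder << item }
    yielder << STREAM_END
  end
end

# Compare two streams pairwise using the block form of zip.
# Because both streams end in STREAM_END, a length mismatch
# shows up as a marker paired with an ordinary element (or nil).
def equal_by_zip?(s1, s2)
  with_end_marker(s1).zip(with_end_marker(s2)) do |left, right|
    return false unless left == right
  end
  true
end

p equal_by_zip?(1..3, 1..4) #=> false
p equal_by_zip?(1..3, 1..3) #=> true
p equal_by_zip?(1..4, 1..3) #=> false
```

The block form of zip does not build the result array, and the second stream is consumed through an enumerator, so neither stream is materialized in full.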
+1

Here is an example for 2 sources using a fiber (coroutine). It is a little long, but it is very explicit about its behavior, which is nice.

    def zip_verbose(enum1, enum2)
      e2_fiber = Fiber.new do
        enum2.each{|e2| Fiber.yield true, e2 }
        Fiber.yield false, nil
      end
      e2_has_value, e2_val = true, nil
      enum1.each do |e1_val|
        e2_has_value, e2_val = e2_fiber.resume if e2_has_value
        yield [true, e1_val], [e2_has_value, e2_val]
      end
      return unless e2_has_value
      loop do
        e2_has_value, e2_val = e2_fiber.resume
        break unless e2_has_value
        yield [false, nil], [e2_has_value, e2_val]
      end
    end

    def zip(enum1, enum2)
      zip_verbose(enum1, enum2) {|e1, e2| yield e1[1], e2[1] }
    end

    def self.equal?(enum1, enum2)
      zip_verbose(enum1, enum2) do |e1,e2|
        return false unless e1 == e2
      end
      return true
    end
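To see what the fiber in zip_verbose hands back on each resume, here is a minimal isolated sketch. tagged_fiber is a hypothetical helper written for this illustration; it mirrors e2_fiber by yielding a [has_value, value] pair per element and a final [false, nil] once the source is exhausted.

```ruby
# Hypothetical helper: wrap an enumerable in a fiber that returns
# [has_value, value] pairs, mirroring e2_fiber in zip_verbose.
def tagged_fiber(enum)
  Fiber.new do
    enum.each { |v| Fiber.yield true, v }
    Fiber.yield false, nil # signals that the source has run out
  end
end

producer = tagged_fiber([10, 20])
p producer.resume #=> [true, 10]
p producer.resume #=> [true, 20]
p producer.resume #=> [false, nil]
```

Carrying the has_value flag alongside the value is what lets the comparison distinguish a real nil element from an exhausted stream.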
0