Check if two hashes have the same key set

What is the most efficient way to check whether two hashes h1 and h2 have the same set of keys, regardless of order? Could it be made faster, or shorter with similar efficiency, than the answer I am posting?

+6
6 answers

Try:

    # Check that both hashes have the same number of entries before anything else
    if h1.size == h2.size
      # Breaks out of the iteration and returns false as soon as a key is missing;
      # otherwise returns true
      h1.keys.all? { |key| !!h2[key] }
    end

Documentation: Enumerable#all?

Worst case, you iterate over the keys only once.
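One caveat worth noting (this example is mine, not part of the answer): `!!h2[key]` looks at the stored value, so a key whose value is `nil` or `false` is reported as missing. `Hash#key?` checks key presence regardless of the value:

```ruby
h1 = { a: 1, b: 2 }
h2 = { a: 1, b: nil }  # identical key sets, but :b maps to nil

# The value-based test wrongly reports the key sets as different:
value_based = h1.size == h2.size && h1.keys.all? { |key| !!h2[key] }
# => false

# Hash#key? tests key presence, not truthiness, so it gets the right answer:
key_based = h1.size == h2.size && h1.keys.all? { |key| h2.key?(key) }
# => true
```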

+5

Well, let's break all the rules of sanity and portability: enter the MRI C API.

    /* Name this file superhash.c. An appropriate Makefile is attached below. */
    #include <ruby/ruby.h>

    static int key_is_in_other(VALUE key, VALUE val, VALUE data)
    {
        struct st_table *other = ((struct st_table**) data)[0];
        if (st_lookup(other, key, 0)) {
            return ST_CONTINUE;
        } else {
            int *failed = ((int**) data)[1];
            *failed = 1;
            return ST_STOP;
        }
    }

    static VALUE hash_size(VALUE hash)
    {
        if (!RHASH(hash)->ntbl)
            return INT2FIX(0);
        return INT2FIX(RHASH(hash)->ntbl->num_entries);
    }

    static VALUE same_keys(VALUE self, VALUE other)
    {
        if (CLASS_OF(other) != rb_cHash)
            rb_raise(rb_eArgError, "argument needs to be a hash");
        if (hash_size(self) != hash_size(other))
            return Qfalse;
        /* Both hashes are empty. (The original tested `other` twice here,
         * which was a typo; it must test both `self` and `other`.) */
        if (!RHASH(self)->ntbl && !RHASH(other)->ntbl)
            return Qtrue;
        int failed = 0;
        void *data[2] = { RHASH(other)->ntbl, &failed };
        rb_hash_foreach(self, key_is_in_other, (VALUE) data);
        return failed ? Qfalse : Qtrue;
    }

    void Init_superhash(void)
    {
        rb_define_method(rb_cHash, "same_keys?", same_keys, 1);
    }

Here is the Makefile.

    CFLAGS = -std=c99 -O2 -Wall -fPIC $(shell pkg-config ruby-1.9 --cflags)
    LDFLAGS = -Wl,-O1,--as-needed $(shell pkg-config ruby-1.9 --libs)

    # Note: the recipe line below must be indented with a tab, not spaces.
    superhash.so: superhash.o
    	$(LINK.c) -shared $^ -o $@

An artificial, synthetic, simplistic benchmark follows.

    require 'superhash'
    require 'benchmark'

    n = 100_000
    h1 = h2 = { a: 5, b: 8, c: 1, d: 9 }

    Benchmark.bm do |b|
      # freemasonjson's state of the art.
      b.report { n.times { h1.size == h2.size and h1.keys.all? { |key| !!h2[key] } } }
      # This solution.
      b.report { n.times { h1.same_keys? h2 } }
    end

    #       user     system      total        real
    #   0.310000   0.000000   0.310000 (  0.312249)
    #   0.050000   0.000000   0.050000 (  0.051807)
+7

Combining freemasonjson's size check with an array difference:

 h1.size == h2.size and (h1.keys - h2.keys).empty? 
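The size comparison is doing real work here: a one-sided key difference alone misses keys that exist only in h2. A quick sketch of the failure mode (example mine, not from the answer):

```ruby
h1 = { a: 1 }
h2 = { a: 1, b: 2 }

# Without the size guard, the one-sided difference reports a match:
one_sided = (h1.keys - h2.keys).empty?
# => true, even though the key sets differ

# The size check catches the extra key in h2:
guarded = h1.size == h2.size && (h1.keys - h2.keys).empty?
# => false
```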
+6

Just to have at least a reference benchmark for this question ...

    require 'securerandom'
    require 'benchmark'

    a = {}
    b = {}

    # Use UUIDs to get unique random keys
    (0..1_000).each do |i|
      key = SecureRandom.uuid
      a[key] = i
      b[key] = i
    end

    Benchmark.bmbm do |x|
      x.report("#-") do
        1_000.times do
          # Subtract in both directions (the original subtracted the same
          # direction twice, which was a typo)
          (a.keys - b.keys).empty? and (b.keys - a.keys).empty?
        end
      end
      x.report("#&") do
        1_000.times do
          computed = a.keys & b.keys
          computed.size == a.size
        end
      end
      x.report("#all?") do
        1_000.times do
          a.keys.all? { |key| !!b[key] }
        end
      end
      x.report("#sort") do
        1_000.times do
          a_sorted = a.keys.sort
          b_sorted = b.keys.sort
          # Compare the sorted key arrays (the original compared a == b)
          a_sorted == b_sorted
        end
      end
    end

Results:

    Rehearsal -----------------------------------------
    #-      1.000000   0.000000   1.000000 (  1.001348)
    #&      0.560000   0.000000   0.560000 (  0.563523)
    #all?   0.240000   0.000000   0.240000 (  0.239058)
    #sort   0.850000   0.010000   0.860000 (  0.854839)
    -------------------------------- total: 2.660000sec

                user     system      total        real
    #-      0.980000   0.000000   0.980000 (  0.976698)
    #&      0.560000   0.000000   0.560000 (  0.559592)
    #all?   0.250000   0.000000   0.250000 (  0.251128)
    #sort   0.860000   0.000000   0.860000 (  0.862857)

I have to agree with @akuhn that this would be a better benchmark if we had more information about the dataset you are using. But, as they say, this question really needed some hard numbers.

+4

It depends on your data.

In fact, there is no general answer. For example, retrieving the whole set of keys at once is usually faster than checking the inclusion of each key separately. However, if in your dataset the key sets differ more often than not, then a solution that fails fast can win even if it is generally slower. For instance:

    h1.size == h2.size and h1.keys.all? { |k| h2.include?(k) }
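To make the fail-fast point concrete, here is a small illustration (example mine) that counts how few keys `all?` actually inspects when the sets differ early:

```ruby
h1 = { a: 1, b: 2, c: 3, d: 4 }
h2 = { z: 1, b: 2, c: 3, d: 4 }  # same size, but the first key differs

checked = 0
result = h1.size == h2.size && h1.keys.all? { |k| checked += 1; h2.include?(k) }
# all? short-circuits at the first missing key, so only one key is inspected
# result  => false
# checked => 1
```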

Another factor to consider is the size of your hashes. A solution with a higher setup cost, such as calling Set.new, may pay off for large hashes but not for small ones:

    require 'set'

    h1.size == h2.size and Set.new(h1.keys) == Set.new(h2.keys)

And if you happen to compare the same immutable hashes over and over again, caching the results will surely pay off.
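A hypothetical sketch of that caching idea (the method and cache names here are made up, not from the answer). It assumes both hashes are frozen, so a cached answer can never go stale:

```ruby
# Cache of previously computed key-set comparisons, keyed by object identity.
SAME_KEYS_CACHE = {}

def same_keys_cached?(h1, h2)
  raise ArgumentError, 'hashes must be frozen' unless h1.frozen? && h2.frozen?
  cache_key = [h1.object_id, h2.object_id]
  SAME_KEYS_CACHE.fetch(cache_key) do
    # Cache miss: compute once and remember the answer.
    SAME_KEYS_CACHE[cache_key] = (h1.size == h2.size && h1.keys.all? { |k| h2.key?(k) })
  end
end
```

Using `object_id` as the cache key is only safe while both hashes stay alive; a production version would need to think about cache eviction.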

In the end, only a benchmark will tell, but to write one we would need to know more about your use case. And of course, testing a solution with synthetic data (for example, randomly generated keys) may not be representative.

+3

This is my attempt:

 (h1.keys - h2.keys).empty? and (h2.keys - h1.keys).empty? 
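Because this variant subtracts in both directions, it needs no separate size check. A quick sanity check (example mine, with an illustrative method name):

```ruby
def same_key_set?(h1, h2)
  (h1.keys - h2.keys).empty? and (h2.keys - h1.keys).empty?
end

same_key_set?({ a: 1, b: 2 }, { b: 9, a: 0 })  # => true  (same keys, any order)
same_key_set?({ a: 1 }, { a: 1, b: 2 })        # => false (extra key in h2)
```

The trade-off versus the size-guarded one-liner is that both differences are always computed, so there is no early exit on hashes of different sizes.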
+1
