Can I make this Ruby code faster and / or use less memory?

I have an Array of String objects in Ruby, each made up of words like the ones below:

animals = ["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"]

I want to convert this to another Array of String objects, but with only one animal per element and with no duplicate elements. I found one way to do this:

class Array
  def process()
    self.join(" ").split().uniq
  end
end

However, if the input array is huge, say millions of records, the performance will be pretty bad: I create a huge string, then a huge array, and then uniq has to process that huge array to remove the duplicate elements. One way I was thinking of speeding this up is to create a Hash with an entry for each word, so that each word is handled in a single pass. Is there a better way?

+4
4 answers

You have the right idea. However, Ruby has a built-in class that is ideal for building collections of unique elements: Set.

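require 'set'

animals = ["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"]

unique_animals = Set.new

# merge adds every word from the split string, skipping ones already present
animals.each do |str|
  unique_animals.merge(str.split)
end

unique_animals.to_a
# => ["cat", "horse", "dog", "bird", "sheep", "chicken", "cow"]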

Or...

unique_animals = animals.reduce(Set.new) do |set, str|
  set.merge(str.split)
end

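Set is built on top of Hash, and it supports the usual enumerable methods (each, map, select, ...). If you really need an Array at the end, use Set#to_a.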
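A minimal sketch of getting a plain Array back out of the Set with to_a. The variable names are only illustrative, and the ordering of the result assumes the Hash-backed Set from the standard library, which keeps insertion order:

```ruby
require 'set'

animals = ["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"]

# Collect each word once; merge accepts any enumerable, here the split words.
unique_animals = animals.each_with_object(Set.new) do |str, set|
  set.merge(str.split)
end

unique_list = unique_animals.to_a        # plain Array of the unique words
known = unique_animals.include?("dog")   # membership test without scanning an Array
```

Keeping the Set around is useful if you later need membership tests; call to_a only at the point where an Array is actually required.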

+6


One option is to skip the intermediate string and array, and the call to uniq, by recording each word as a Hash key as you go:

def process
  distincts = Hash.new
  self.each { |words| words.split.each { |word| distincts[word] = nil }}
  distincts.keys
end

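The same can be done with either a Hash or a Set. A variation that writes to the Hash only when the word has not been seen yet: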

def process
  distincts = Hash.new
  self.each { |words| words.split.each { |word| distincts[word] = :present unless distincts[word] }}
  distincts.keys
end


+3

How about something like this?

for each element in [...]
  if the element does not contain spaces
    insert it into the result array
  else
    split it up and insert its parts in the next position ahead
  end
end

In Ruby:

class Array
  def process
    d = dup
    d.each_with_object([]).each_with_index do |(element, array), index|
      if !element.index " "
        array << element if !array.include? element
      else
        d.insert index+1, *(element.split)
      end
    end
  end
end

["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"].process
=> ["cat", "horse", "dog", "bird", "sheep", "chicken", "cow"]


0

I tried the various methods suggested here and came up with two more. Both are faster than the other suggestions in this thread, but unfortunately neither is as fast as the original.

  # This one moves through the original Array using inject to process
  # each element containing space-separated words and appending them
  # to a new array.  Finally uniq is called to remove duplicate words
  def process_new_4
    inject([]) { |array, words| array.push(*words.split) }.uniq
  end

  # This one uses the flat_map method of Array to flatten itself, each
  # element is split in case it contains more than one word, then the
  # flattened array has duplicate elements removed with uniq
  def process_new_3
    self.flat_map(&:split).uniq
  end
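
To check timing claims like these on your own data, a rough sketch using the standard Benchmark module follows. The test data, the labels and the lambdas are hypothetical and chosen only for illustration; real results will depend on the Ruby version, the input size and how many distinct words there are.

```ruby
require 'benchmark'
require 'set'

# Hypothetical test data: many short strings drawn from a small vocabulary.
vocabulary = %w[cat horse dog bird sheep chicken cow]
rng = Random.new(42)
animals = Array.new(50_000) do
  vocabulary.sample(rng.rand(1..3), random: rng).join(" ")
end

# Each candidate takes the input array and returns an Array of unique words.
candidates = {
  "join/split/uniq" => ->(a) { a.join(" ").split.uniq },
  "flat_map/uniq"   => ->(a) { a.flat_map(&:split).uniq },
  "Set#merge"       => ->(a) { a.each_with_object(Set.new) { |s, set| set.merge(s.split) }.to_a },
  "Hash keys"       => ->(a) { h = {}; a.each { |s| s.split.each { |w| h[w] = nil } }; h.keys }
}

results = {}
Benchmark.bm(16) do |bm|
  candidates.each do |name, fn|
    bm.report(name) { results[name] = fn.call(animals) }
  end
end

# Sanity check: every approach should find the same set of words.
all_agree = results.values.map(&:sort).uniq.size == 1
```

Timing alone does not show memory use; the Set and Hash versions avoid the large intermediate string and array even when they are not the fastest.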
0