I have an efficiency problem when finding the entries that two very large arrays have in common. I hope someone with a good understanding of algorithms can point me in the right direction, since my current implementations take far too long.
Problem:
I have two very large arrays. One contains a list of emails with invalid domain names, and the other contains a mixed list, which I need to check against the first array.
    accounts_with_failed_email_domains = [279,000 records in here]
    unchecked_account_domains = [149,000 records in here]
What I need to do is go through unchecked_account_domains and check whether each entry has a match in accounts_with_failed_email_domains. I need to collect all matches between the two lists in a separate array, which will be processed later.
How can I write something efficient that gets through these accounts quickly? Here is what I have tried so far.
    unchecked_account_domains = [really big array]
    unchecked_account_domains = unchecked_account_domains.sort
    accounts_with_failed_email_domains = [another huge array].sort

    unchecked_account_domains.keep_if do |email|
      accounts_with_failed_email_domains.any? { |failed_email| email == failed_email }
    end
The implementation above runs forever. Here is my second attempt, which turned out to be no better.
    unchecked_account_domains = [really big array]
    unchecked_account_domains = unchecked_account_domains.sort
    accounts_with_failed_email_domains = [another huge array].sort

    unchecked_account_domains.each do |email|
      accounts_with_failed_email_domains.bsearch do |failed_email|
        final_check << email if email == failed_email
      end
    end
bsearch seemed promising, but I'm sure I am not using it correctly. I also looked at the question "comparing large lists", but it is about Python, and I cannot find the Ruby equivalent of Python's set. Does anyone have any ideas on how to solve this?
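For reference, Ruby's standard library does ship a Set class (loaded with require 'set'), so the set-based approach from the Python question can be sketched roughly as below. This is a minimal sketch, not a definitive implementation: the helper names matching_domains and matching_domains_bsearch and the sample addresses are made up for illustration, and both arrays are assumed to hold plain strings. The second helper shows what bsearch in find-any mode would look like, where the block returns the result of <=> rather than doing the comparison and append itself.

```ruby
require 'set'

# Hypothetical helper: keep the entries of `unchecked` that also occur in `failed`.
# The Set is built once, so each membership test is a hash lookup
# instead of a scan over the whole failed list.
def matching_domains(unchecked, failed)
  failed_set = failed.to_set
  unchecked.select { |email| failed_set.include?(email) }
end

# Hypothetical alternative using Array#bsearch in find-any mode.
# `sorted_failed` must already be sorted; the block returns 0 on a match,
# a positive number if the target sorts after the probed element,
# and a negative number if it sorts before it.
def matching_domains_bsearch(unchecked, sorted_failed)
  unchecked.select do |email|
    !sorted_failed.bsearch { |failed_email| email <=> failed_email }.nil?
  end
end

# Made-up sample data standing in for the two large arrays.
unchecked_account_domains = %w[a@ok.com b@bad.com c@bad.org d@ok.org]
accounts_with_failed_email_domains = %w[c@bad.org x@bad.net b@bad.com]

final_check = matching_domains(unchecked_account_domains,
                               accounts_with_failed_email_domains)
final_check_bsearch = matching_domains_bsearch(unchecked_account_domains,
                                               accounts_with_failed_email_domains.sort)
```

If duplicates in the result do not matter, the built-in array intersection (unchecked_account_domains & accounts_with_failed_email_domains) should give the same matches with duplicates removed.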