The effective or fast size of a given intersection of two vectors

I need to return the intersection size of two vectors:

std::vector<int> A_, B_ 

I don't need intersecting values, just the size of the set. This function must be called a very large number of times. This is part of a stronger math / network simulation.

My working conditions:

  • Containers are vectors. To change them is pure pain, but of course, do it if you win it.
  • The size of A_ and B_ has an upper bound of ~ 100. But often less.
  • Elements A_ and B_ represent samples taken from {1,2, ..., M}, where M> 10000.
  • In general, A_ and B_ have similar but unequal sizes.
  • Both vectors are disordered.
  • The contents of A_ and B_ change as part of a โ€œlarger simulationโ€.
  • Each vector contains only unique elements, i.e. not repeated.

My first attempt using a naive loop is given below. But I think this may not be enough. I suggested that std :: set_intersection would be too burdensome due to repeated sorts and distributions.

  int vec_intersect(const std::vector<int>& A_, const std::vector<int>& B_) { int c_count=0; for(std::vector<int>::const_iterator it = A_.begin(); it != A_.end(); ++it){ for(std::vector<int>::const_iterator itb = B_.begin(); itb != B_.end(); ++itb){ if(*it==*itb) ++c_count; } } return c_count; } 

Given my conditions above, how else can I implement this to get speed, relatively easily? Should I think of hash tables or compile with sorts and STLs or different containers?

+7
c ++ performance vector stl intersection
source share
2 answers

Your algorithm is O (n 2 ) in the number of elements (provided that the size of both vectors is approximately equal to n ). Here is the O (n) algorithm:

  • Create std::unordered_set<int>
  • Put all elements of vector A in a set.
  • Go through all the elements of the vector B , checking that they are present in the unordered_set , and increasing the number for each element that is present.
  • Returns the final score.

Here is an implementation in C ++ 11, using lambda for short:

 vector<int> a {2, 3, 5, 7, 11, 13}; vector<int> b {1, 3, 5, 7, 9, 11}; unordered_set<int> s(a.begin(), a.end()); int res = count_if(b.begin(), b.end(), [&](int k) {return s.find(k) != s.end();}); // Lambda above captures the set by reference. count_if passes each element of b // to the lambda. The lambda returns true if there is a match, and false otherwise. 

(this prints 4 ; demo )

+13
source share

Your algorithm is O (n * m), where n and m are the number of elements in the vectors.

If you have no problems when the input is untrusted, you are likely to get the best results with:

  • Put all elements of A in unordered_set
  • For each item in B , if it is in the set, increase the counter.

For example:

 int vec_intersect(const std::vector<int>& A_, const std::vector<int>& B_) { std::unordered_set<int> aSet(A_.cbegin(), A_.cend()); return std::count_if(B_.cbegin(), B_.cend(), [&](int element) { return aSet.find(element) != aSet.end(); }); } 

This will likely give the results O (m + n). (Hash tables are almost always O (1), but if an attacker can cause many collisions in the table, they can cause O (n) behavior to lead to a denial of service)

If you need deterministic results, and the order of the vectors does not matter, sorting a single vector will work, which will only be O (m lg m + m + n). I.e:

  • Sort the first vector
  • For each element in the second vector, use a binary search to determine if the element is in the first vector, and if so, increase your counter.

For example:

 int vec_intersect(std::vector<int>& A_, const std::vector<int>& B_) { std::sort(A_.begin(), A_.end()); return std::count_if(B_.cbegin(), B_.cend(), [&](int element) { return std::binary_search(A_.cbegin(), A_.cend(), element); }); } 

Just for a giggle, here's a <algorithm> -developed version of your algorithm:

 int vec_intersect(const std::vector<int>& A_, const std::vector<int>& B_) { return std::count_if(B_.cbegin(), B_.cend(), [&](int element) { return std::find(A_.cbegin(), A_.cend(), element) != A_.cend(); }); } 
+3
source share

All Articles