Binary search in std :: vector

I am trying to find the position of vector elements in another vector. Here, I am interested in using an implementation as fast as binary search . I have different vectors of 1 million or more in length, so I'm trying to achieve something faster.

The following situations are in my case:

1) vector in which I am looking is sorted.

2) The element I'm looking for will always be there, i.e. I have no case of not found , and I would like to get the index of vector elements faster,

I tried the following code to get the index of vector elements.

 #include <iostream> #include <vector> #include <algorithm> template<class Iter, class T> Iter binary_find(Iter begin, Iter end, T val) { Iter i = std::lower_bound(begin, end, val); return i; } int main() { std::vector<std::string> values = {"AAAAAA","AB", "AD" ,"BCD","CD", "DD" }; std::vector<std::string> tests = {"AB", "CD","AD", "DD"}; for(int i=0 ; i < tests.size(); i++) { int pos = binary_find(values.begin(), values.end(), tests.at(i))- values.begin(); std::cout << tests.at(i) << " found at: " << pos <<std::endl; } return 0; } 

I would like to know if the binary search implementation code matches.

Is there a faster way to get the index of vector elements?

Other suggestions for improving this code.

+5
source share
3 answers

binary_find does not return anything, although it is not declared to return void , so it has undefined behavior.

After correcting it and assuming that you do not have specific knowledge about the contents of the vector, except for sorting it, the binary search is largely optimal.

However, there are other data structures that are faster for predicate search than a vector. If performance is critical, you should take a look at search trees and hash maps. Since your keys are strings, attempts and a directed acyclic graph of words, in particular, can be effective. You might want to measure what works best for your use case.

+4
source

http://www.cpluplus.com says that the behavior of binary_search equivalent:

 template <class ForwardIterator, class T> bool binary_search (ForwardIterator first, ForwardIterator last, const T& val) { first = std::lower_bound(first, last, val); return (first != last && !(val < *first)); } 

So lower_bound is your weapon of choice. But when you take the difference, you have to use distance . The reason is, if there is a faster way to get a position, it will be turned over to this function.

As for the other improvements, I would suggest using C ++ 14 begin and end and not call a function that only serves to wrap lower_bound (and cannot correctly return the value.) Thus, the way I would write this code would look So:

 auto pos = distance(lower_bound(begin(values), end(values), tests[i]), begin(values)); 
+2
source

Q1: I would like to know if the binary search implementation code matches.

Yes, it is ( almost ). Check out std :: lower_bound which says:

Complexity:

On average, the logarithmic distance between the first and last: Performs an approximate comparison of the elements log2 (N) +1 (where N is this distance). On iterators without random access, the iterator advances to produce additional linear complexity on average over N.


Q2: Is there a faster way to get the index of vector elements.

This is a fairly broad question.


Q3: Any additional suggestions for improving this code.

Hello world, code review !


PS - Have you even compiled the code? It gives some messages, for example:

 warning: no return statement in function returning non-void [-Wreturn-type] 

Compile with warnings enabled, for example:

 g++ -Wall main.cpp 
+1
source

All Articles