Bit operation to search for line differences

My next line tried to find the difference between the two lines. But it's terribly slow as it iterates over the length of the string:

#include <string> #include <vector> #include <iostream> using namespace std; int hd(string s1, string s2) { // hd stands for "Hamming Distance" int dif = 0; for (unsigned i = 0; i < s1.size(); i++ ) { string b1 = s1.substr(i,1); string b2 = s2.substr(i,1); if (b1 != b2) { dif++; } } return dif; } int main() { string string1 = "AAAAA"; string string2 = "ATATT"; string string3 = "AAAAA"; int theHD12 = hd(string1,string2); cout << theHD12 << endl; int theHD13 = hd(string1,string3); cout << theHD13 << endl; } 

Is there a quick alternative? In Perl, we can have the following approach:

 sub hd { return ($_[0] ^ $_[1]) =~ tr/\001-\255//; } 

which is much more than an iteration of the position.

Interestingly, what is its equivalent in C ++?

+6
c ++ string
source share
6 answers

Fun with STL:

 #include <numeric> //inner_product #include <functional> //plus, equal_to, not2 #include <string> #include <stdexcept> unsigned int hd(const std::string& s1, const std::string& s2) { // TODO: What should we do if s1.size() != s2.size()? if (s1.size() != s2.size()){ throw std::invalid_argument( "Strings passed to hd() must have the same lenght" ); } return std::inner_product( s1.begin(), s1.end(), s2.begin(), 0, std::plus<unsigned int>(), std::not2(std::equal_to<std::string::value_type>()) ); } 
+8
source share

Try replacing the for loop with:

 for (unsigned i = 0; i < s1.size(); i++ ) { if (b1[i] != b2[i]) { dif++; } } 

This should be much faster because newlines are not created.

+10
source share

Use iterators:

 int GetHammingDistance(const std::string &a, const std::string &b) { // Hamming distance is not defined for strings of different lengths. ASSERT(a.length() == b.length()); std::string::const_iterator a_it = a.begin(); std::string::const_iterator b_it = b.begin(); std::string::const_iterator a_end = a.end(); std::string::const_iterator b_end = b.end(); int distance = 0; while (a_it != a_end && b_it != b_end) { if (*a_it != *b_it) ++distance; ++a_it; ++b_it; } return distance; } 
+3
source share

Option 1: Change your source code as much as possible.

 int hd(string const& s1, string const& s2) { // hd stands for "Hamming Distance" int dif = 0; for (std::string::size_type i = 0; i < s1.size(); i++ ) { char b1 = s1[i]; char b2 = s2[i]; dif += (b1 != b2)?1:0; } return dif; } 

The second option uses some STL algorithms for heavy lifting.

 struct HammingFunc { inline int operator()(char s1,char s2) { return s1 == s2?0:1; } }; int hd(string const& s1, string const& s2) { int diff = std::inner_product(s1.begin(),s1.end(), s2.begin(), 0, std::plus<int>(),HammingFunc() ); return diff; } 
+3
source share

Some obvious points that can make this faster:

  • Pass strings as const references, not by value
  • Use the index operator [] to get characters, not method calls
  • Compile with optimization on
+2
source share

You use strings.

As explained here Hunting for the fastest implementation of Hamming Distance C if you can use char *, my experiments conclude that for Gcc 4.7.2 on Intel Xeon X5650 the fastest function of calculating the distance from hammming for small (char arrays):

 // na = length of both strings unsigned int HammingDistance(const char* a, unsigned int na, const char* b) { unsigned int num_mismatches = 0; while (na) { if (*a != *b) ++num_mismatches; --na; ++a; ++b; } return num_mismatches; } 

If your problem allows you to set the upper limit of the distance, so that you do not need large distances, and this limit is always less than the length of the lines, the above example can be further optimized for:

 // na = length of both strings, dist must always be < na unsigned int HammingDistance(const char* const a, const unsigned int na, const char* const b, const unsigned int dist) { unsigned int i = 0, num_mismatches = 0; while(i <= dist) { if (a[i] != b[i]) ++num_mismatches; ++i; } while(num_mismatches <= dist && i < na) { if (a[i] != b[i]) ++num_mismatches; ++i; } return num_mismatches; } 

I'm not sure that const does anything regarding speed, but I use it anyway ...

+1
source share

All Articles