Determine if A is a permutation of B using ASCII values

I wrote a function to determine if row a is a permutation of row b . The definition is as follows:

 bool isPermutation(std::string a, std::string b){ if(a.length() != b.length()) return false; int a_sum, b_sum; a_sum = b_sum = 0; for(int i = 0; i < a.length(); ++i){ a_sum += a.at(i); b_sum += b.at(i); } return a_sum == b_sum; } 

The problem with my approach is that if a = 600000 and b = 111111 , the function returns true.

Is there a way to keep the general approach to this problem (as opposed to sorting strings that execute strcmp ) and maintain correctness?

+2
c ++ c ++ 11 permutation
source share
3 answers

You can count the characters separately:

 bool isPermutation(std::string a, std::string b) { if(a.length() != b.length()) return false; assert(a.length() <= INT_MAX); assert(b.length() <= INT_MAX); int counts[256] = {}; for (unsigned char ch : a) ++counts[ch]; for (unsigned char ch : b) --counts[ch]; for (int count : counts) if (count) return false; return true; } 
+3
source share

A simple approach if you do not need UTF-8 support

The solution to this problem is surprisingly easy. There is a function in the standard library.

Suppose a and b are two string s:

 return is_permutation(a.begin(), a.end(), b.begin(), b.end()); 

Or, if you don't have access to C ++ 14 yet:

 return a.size() == b.size() && is_permutation(a.begin(), a.end(), b.begin()); 

Note that the complexity of this is guaranteed no worse than a quadratic-sized string. So, if that matters, sorting both strings might really be the best solution:

 string aa(a); sort(aa.begin(), aa.end()); string bb(b); sort(bb.begin(), bb.end()); return (aa == bb); 

And if it also slows down, use John Zwink's answer above, which is linear in complexity.

Documentation link for is_permutation : http://en.cppreference.com/w/cpp/algorithm/is_permutation

Documentation link for sort : http://en.cppreference.com/w/cpp/algorithm/sort

A (slightly) more sophisticated approach if UTF-8 support is required

The above may not work on UTF-8 lines. The problem here is that UTF-8 is a multibyte character encoding, i.e. one character can be encoded in multiple char variables. None of the approaches mentioned above knows this, and everyone assumes that one character is also a sigle char variable. An example of two lines of UTF-8 were the following approaches: http://ideone.com/erfNmC

The solution might be to temporarily copy our UTF-8 string to a fixed-length UTF-32 encoded string. Suppose a and b are two UTF-8 encoded string s:

 u32string a32 = wstring_convert<codecvt_utf8<char32_t>, char32_t>{}.from_bytes(a); u32string b32 = wstring_convert<codecvt_utf8<char32_t>, char32_t>{}.from_bytes(b); 

Then you can correctly use the above functions for these UTF-32 encoded strings:

 return is_permutation(a32.begin(), a32.end(), b32.begin(), b32.end()) << '\n'; 

or

 sort(a32.begin(), a32.end()); sort(b32.begin(), b32.end()); return (aa == bb); 

The downside is that John Zwink's approach is now becoming slightly less practical. You need to declare an array for 1114112 elements, since this number of possible Unicode characters actually exists.

More about conversions in UTF-32: http://en.cppreference.com/w/cpp/locale/wstring_convert/from_bytes

+4
source share
 std::sort( strOne.begin(), strOne.end() ); std::sort( strTwo.begin(), strTwo.end() ); return strOne == strTwo; 

will be sufficient.


My suggestion is to use std::unordered_map

i.e.

 std::unordered_map< char, unsigned > umapOne; std::unordered_map< char, unsigned > umapTwo; for( char c : strOne ) ++umapOne[c]; for( char c : strTwo ) ++umapTwo[c]; return umapOne == umapTwo; 

As an optimization, you can add at the top for the solution

 if( strOne.size() != strTwo.size() ) return false; 

The best solution is std::unordered_map ,

 if( strOne.size() != strTwo.size() ) return false; // required std::unordered_map< char, int > umap; for( char c : strOne ) ++umap[c]; for( char c : strTwo ) if( --umap[c] < 0 ) return false; return true; 

If you just need to solve the problem without knowing how to do it, you can use std::is_permutation

 return std::is_permutation( strOne.begin(), strOne.end(), strTwo.begin(), strTwo.end() ); 
0
source share

All Articles