Check out my pastime interview anagram code

Some time ago, several questions related to the interview were asked, and I gasped so much from the basic syntax that I was unable to advance (as soon as the adrenaline starts, the encoding comes out of the window.)

Given a list of strings, return a list of sets of strings that are anagrams of the input dataset. those. "dog", "god", "foo" must return {"dog", "god"}. After that, I created the code myself, as a health check, and that was already a bit. I would welcome him to see if I missed something or could have done it much more efficiently. Take it as a chance to improve yourself and learn other methods:

void Anagram::doWork(list input, list> &output) { typedef list < pair < string, string>> SortType; SortType sortedInput; 
// sort each string and pair it with the original for(list< string >::iterator i = input.begin(); i != input.end(); ++i) { string tempString(*i); std::sort(tempString.begin(), tempString.end()); sortedInput.push_back(make_pair(*i, tempString)); }
// Now step through the new sorted list for(SortType::iterator i = sortedInput.begin(); i != sortedInput.end();) { set< string > newSet;
// Assume (hope) we have a match and pre-add the first. newSet.insert(i->first);
// Set the secondary iterator one past the outside to prevent // matching the original SortType::iterator j = i; ++j;
while(j != sortedInput.end()) { if(i->second == j->second) { // If the string matches, add it to the set and remove it // so that future searches need not worry about it newSet.insert(j->first); j = sortedInput.erase(j); } else { // else, next element ++j; } }
// If size is bigger than our original push, we have a match // - save it to the output if(newSet.size() > 1) { output.push_back(newSet); }
// erase this element and update the iterator i = sortedInput.erase(i); } }

Here, the second pass at the same time after looking at comments and training a little more:

 void doBetterWork(list input, list> &output) { typedef std::multimap< string, string > SortedInputType; SortedInputType sortedInput; vector< string > sortedNames; 
for(vector< string >::iterator i = input.begin(); i != input.end(); ++i) { string tempString(*i); std::sort(tempString.begin(), tempString.end()); sortedInput.insert(make_pair(tempString, *i)); sortedNames.push_back(tempString); }
for(list< string >::iterator i = sortedNames.begin(); i != sortedNames.end(); ++i) { pair< SortedInputType::iterator,SortedInputType::iterator > bounds; bounds = sortedInput.equal_range(*i);
set< string > newSet; for(SortedInputType::iterator j = bounds.first; j != bounds.second; ++j) { newSet.insert(j->second); }
if(newSet.size() > 1) { output.push_back(newSet); }
sortedInput.erase(bounds.first, bounds.second); } }

+6
c ++ anagram
source share
4 answers

The best way to group anagrams is by matching strings with some kind of histogram representation.

  FUNCTION histogram [input] -> [output] "dog" -> (1xd, 1xg, 1xo) "god" -> (1xd, 1xg, 1xo) "foo" -> (1xf, 2xo) 

Basically, with a linear scan of a string, you can create a histogram view of how much of each letter it contains. The small, finite alphabet makes it even easier (for example, with AZ , you only have an array of 26 numbers, one for each letter).

Now, anagrams are just words that have the same histogram.

Then you can have a multi-map data structure that maps the histogram to a list of words that have this histogram.

 MULTIMAP [key] => [set of values] (1xd, 1xg, 1xo) => { "dog", "god" } (1xf, 2xo) => { "foo" } 

Canonical trick form

Instead of working with histograms, you can also work on the "canonical form" of strings. Basically, you determine for each line what its canonical form is, and two words are anagrams if they have the same canonical form.

One convenient canonical form is to have letters in a string in sorted order.

 FUNCTION canonize [input] -> [output] "dog" -> "dgo" "god" -> "dgo" "abracadabra" -> "aaaaabbcdrr" 

Note that this is just one step after approaching the histogram: you essentially do a sort count to sort the letters.

This is the most practical solution in a programming language for your problem.

Complexity

The product of the histogram / canonical form of the word is practically O(1) (final alphabet size, final maximum word length).

With a good implementation of the hash get and put in the multimagel O(1) .

You can even have multiple multi-shots, one for each word length.

If there are words N , therefore all words in multimars, therefore, O(N) ; then the output of each group of anagrams simply resets the values ​​in the multi-mars. This can also be done in O(N) .

This is certainly better than checking if each pair of words is anagram ( O(N^2) algorithm).

+14
source share

I just see it as a free get_anagrams function, since the class Anagram does nothing else.

Choosing list and set no better than anything else, so while I am in, I will make them like vector . In fact, the output can be only one flat sequence. So name it anagram_sort . I will take the specification of "list" and "install" as the base constructs, not the letter class templates. Also "given ... return" may be poorly interpreted as a change of input in place. The caller can create a copy if he wants.

 struct anagram_less { bool operator()( string const &l, string const &r ) { string sl( l ), sr( r ); sort( sl.begin(), sl.end() ); sort( sr.begin(), sr.end() ); return sl < sr; } }; void anagram_sort( vector< string > &v ) { // "the answer" sort( v.begin(), v.end(), anagram_less ); } // 10 lines // usage: void print_anagrams( vector< string > &&v ) { // destructive => use rvalue ref anagram_sort( v ); for ( vector< string >::iterator i = v.begin(); i != v.end(); /*++i*/ ) { vector< string >::iterator e // find end of run of anagrams = adjacent_find( i, v.end(), anagram_less() ); if ( e != v.end() ) ++ e; // usually need this after adjacent_find :v( cout << "( "; for ( ; i != e; ++ i ) { cout << * i << " "; } cout << ")" << endl; } } 

This is suboptimal as it re-sorts the words. I would rather cut the fat in the interview question than do something quickly "on paper", though.

To compress the last bit of performance, I would probably make a copy of the input with the serial number indices turned on, sort the line characters, sort the lines, and then use the indexes to change the order of vector input using this algorithm . The final run time is a low O (n log n) coefficient, as it should be.

+1
source share

The algorithm for checking whether two lines are anagrams of each other is as follows

  • convert two lines so that it contains only a letter (because the mother-in-law and the Hitler women are also anagrams). Also make sure that the two lines are equal, or both lines are in upper case or lower case.

  • now sort the characters in both lines.

  • compare two lines if they are equal means they are anagrams of each other

Here is the code for this algo

 bool checkanagrams(string first,string second) { string tempfirst,tempsecond; int countfirst = 0; int countsecond = 0; // only storing characters removing other junk things like - for(int i=0;i<first.length();i++) { if(isalpha(first[i]) { tempfirst[countfirst] =first[i]; countfirst++; } } for(int i=0;i<second.length();i++) { if(isalpha(second[i]) { tempsecond[countsecond] =second[i]; countsecond++; } } sort(tempfirst.begin(),tempfirst.end()); sort(tempsecond.begin(),tempsecond.end()); if(!strcmp(tempfirst.c_str(),tempsecond.c_str()) return true; else return false; } 
+1
source share

I would put everything on a hash table. Then for each row I return it and check if it exists in the hash table if it returns both the changed row and itself in the set.

This approach also finds palindromes in the kit.

0
source share

All Articles