Given the word and text, we need to return the occurrences of anagrams

Question

Given the word and text, we need to return the occurrences of anagrams

Given the word and text, return the number of occurrences of the word anagrams in the text. E.g. the word "for", and the text "forxxorfxdofr", the anagrams "for" will be "ofr", "orf", "fro", etc. So for this particular example there will be 3.

I have a brute force approach that gets all the permutations of a word, then compares whether the text contains text and increases the number of occurrences, but this is an O (N ^ 2) approach. I am looking for the best difficulty.

+8

c ++ algorithm

Ahmed saleh 15 Sep '13 at 10:49

source share

4 answers

The TooTone O (n) solution has the need to compare two vectors of 256 elements for each character of the input text. This can be avoided by tracking the number of positions at which the two vectors differ, and registering a match when that number vanishes. In fact, we don’t even need to store two different vectors at all, since we can just save one vector containing their difference.

Here my version implements these optimizations. It is written in plain old C, but should work under C ++ with the appropriate settings:

 #include <stdio.h> #include <limits.h> /* for UCHAR_MAX (usually 255) */ int find_anagrams (char *word, char *text) { int len = 0; /* length of search word */ int bin[UCHAR_MAX+1]; /* excess count of each char in last len chars of text */ int mismatch = 0; /* count of nonzero values in bins[] */ int found = 0; /* number of anagrams found */ int i; /* generic loop counter */ /* initialize bins */ for (i = 0; i <= UCHAR_MAX; i++) bin[i] = 0; for (i = 0; word[i] != '\0'; i++) { unsigned char c = (unsigned char) word[i]; if (bin[c] == 0) mismatch++; bin[c]--; len++; /* who needs strlen()? */ } /* iterate through text */ for (i = 0; text[i] != '\0'; i++) { /* add next char in text to bins, keep track of mismatch count */ unsigned char c = (unsigned char) text[i]; if (bin[c] == 0) mismatch++; if (bin[c] == -1) mismatch--; bin[c]++; /* remove len-th previous char from bins, keep track of mismatch count */ if (i >= len) { unsigned char d = (unsigned char) text[i - len]; if (bin[d] == 0) mismatch++; if (bin[d] == 1) mismatch--; bin[d]--; } /* if mismatch count is zero, we've found an anagram */ if (mismatch == 0) { found++; #ifdef DEBUG /* optional: print each anagram found */ printf("Anagram found at position %d: \"", i-len+1); fwrite(text+i-len+1, 1, len, stdout); printf("\"\n"); #endif } } return found; } int main (int argc, char *argv[]) { if (argc == 3) { int n = find_anagrams(argv[1], argv[2]); printf("Found %d anagrams of \"%s\" in \"%s\".\n", n, argv[1], argv[2]); return 0; } else { fprintf(stderr, "Usage: %s <word> <text>\n", (argc ? argv[0] : "countanagrams")); return 1; } }

+3

Ilmari karonen 15 Sep '13 at 18:08

source share

Essentially, you can shift the window of the length of your word above your input and count the number of each letter in the window. When a letter is counted in your sliding window matching the letters of your word, you have a match.

Let the word length be n , and your current position is curr . Create an array or vector , windCounts length 26. The record windCounts[i] stores the number of occurrences of the letter i ^{th of the} alphabet, visible from the position curr - n - 1 , on curr .

What you do is move curr and update the windCounts array by decreasing the letter dropped out of the back of the sliding window and increasing the number of letters appearing in the front sliding window. (Obviously, to curr > n , you are only increasing, you are simply expanding your sliding window to the length of your word.)

In C ++, you can use vector to count the letters in a word and to count the letters in a sliding window and just use vector::operator== to do the equality.

Edit: O(N) algorithm, where n is the length of the text to search. This can be seen from the code below, where the body of the loop is executed for each letter that you insert into the window.

 #include <string> #include <vector> #include <algorithm> // for_each using std::string; using std::vector; #include <iostream> int main(int argc, char* argv[]) { const string text = "forxxorfxdofr"; const string word = "for"; // Counts of letters in word vector<int> wordCounts(256); // optimization: cut down from 256 to 26 std::for_each(word.begin(), word.end(), [&] (char c) { wordCounts[c]++; } ); // Current position of end of sliding window string::const_iterator curr = text.begin() + word.size(); // Initial sliding window counts vector<int> windCounts(256); std::for_each(text.begin(), curr, [&] (char c) { windCounts[c]++; } ); // Run sliding window over text int numMatches = 0; while (1) { numMatches += wordCounts == windCounts; if (curr == text.end()) { break; } windCounts[*(curr - word.size())]--; windCounts[*curr]++; ++curr; } std::cout << numMatches << "\n"; return 0; }

+2

Tooone 15 Sep '13 at 11:10

source share

I took two lines, namely str and occ. Str is the original string, and occ is the bite for which we must find the score. Using the strncpy function, I copied the length of occ ie n characters into the temp array, and then checked if this is a permutation of the string occ or not.

 #include<iostream.h> #include<conio.h> #include<string.h> int permutate(char str1[],char str2[]); int permutate(char str1[],char str2[]) { int c[256]={0},i,j; for(i=0;i<strlen(str1);i++) c[str1[i]]++; for(i=0;i<strlen(str2);i++) { c[str2[i]]--; if(c[str2[i]]<0) return 1; //not a permutation } return 0; //permutation } int main() { //enter code here char str[]="forxxorfxdofr",occ[]="for",temp[10]; int n,i,x,t=0; n=strlen(occ); for(i=0;i<strlen(str);i++) { strncpy(temp,str+i,n); //copy the n char from str to temp x=permutate(temp,occ); if(x==0) //if string is a permutation t++; } cout<<"Count = " << t; return 0; }

0

Taranjeet 20 sept '13 at 10:30

source share

Karoly Horvath · Accepted Answer · 2013-09-15T11:09:06+0000

You can simply search for the number of characters.

Say, for example, that you are looking for anagrams look . So you are looking for:

4 characters long word,
with 1 l, 2 o and 1 k.

Just process the first 4 letters, save the calculations. Check if you have a match. Add the next character (increment), delete the old character (decrement). Check again. And so on...

Given the word and text, we need to return the occurrences of anagrams

More articles: