To look up a specific word from its string representation, you probably want something like std::map. To accumulate a simple combined set of results, you probably want std::set. This implementation is written more as a demonstration than as a highly desirable final implementation (cf. sloppy phrase).
    #include <vector>
    #include <map>
    #include <set>
    #include <iostream>
    #include <string>
    #include <utility>

    typedef std::string IDdoc;
    typedef int position;
    typedef std::pair<IDdoc,position> Occurrence;
    typedef std::vector<Occurrence> OccurrencesOfWord;
    typedef std::map<std::string /*word*/, OccurrencesOfWord> Dictionary;
    typedef std::set<IDdoc> Matches;

    bool findMatchesForPhrase(const std::string& phrase, const Dictionary& dictionary, Matches& matches)
    {
        size_t pos = 0;
        while (pos < phrase.length()) {
            size_t end = phrase.find(' ', pos);
            size_t len = ((end == phrase.npos) ? phrase.length() : end) - pos;
            std::string word(phrase, pos, len);
            pos += len + 1; // to skip the space.

            // ignore words not in the dictionary.
            auto dictIt = dictionary.find(word);
            if (dictIt == dictionary.end())
                continue;

            auto& occurrences = dictIt->second; // shortcut/alias.
            for (auto& occurIt : occurrences) {
                // Add the IDdoc of each occurrence to the set.
                matches.insert(occurIt.first);
            }
        }
        return !matches.empty();
    }

    void addToDictionary(Dictionary& dict, const char* word, const char* doc, int position)
    {
        dict[word].push_back(std::make_pair(std::string(doc), position));
    }

    int main(int argc, const char** argv)
    {
        std::string phrase("pizza is life");
        Dictionary dict;
        addToDictionary(dict, "pizza", "book1", 10);
        addToDictionary(dict, "pizza", "book2", 30);
        addToDictionary(dict, "life", "book1", 1);
        addToDictionary(dict, "life", "book3", 1);
        addToDictionary(dict, "goat", "book4", 99);

        Matches matches;
        bool result = findMatchesForPhrase(phrase, dict, matches);
        std::cout << "result = " << result << std::endl;
        for (auto& ent : matches) {
            std::cout << ent << std::endl;
        }
        return 0;
    }
Online demo: http://ideone.com/Zlhfua
Follow-up on your edits:
    while(i < SIZE_VECTOR_ONE && j < SIZE_VECTOR_TWO) {
        if (ID_doc_one < ID_doc_two) {
            ID_doc_one = v1[++i].first;
Let's say that SIZE_VECTOR_ONE is 1. This means the vector has exactly one element, element [0]. If ID_doc_one is 0 and ID_doc_two is 1, then
    if (0 < 1) {
        ID_doc_one = v1[1].first;
which reads past the end of the vector. You might be better off using iterators or pointers:
    while (oneIt != v1.end() && twoIt != v2.end()) {
        if (oneIt->first < twoIt->first) {
            ++oneIt;
            continue;
        } else if (*twoIt < *oneIt) {
            ++twoIt;
            continue;
        }
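To make the out-of-range case concrete, here is a minimal sketch. The helper name advanceAndRead is made up for illustration, and it uses std::vector::at() only because at() throws std::out_of_range where operator[] would silently invoke undefined behavior.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Occurrence; // (document ID, position)

// Hypothetical helper mimicking `ID_doc_one = v1[++i].first`.
// Returns false instead of reading past the end of the vector.
bool advanceAndRead(const std::vector<Occurrence>& v, std::size_t& i, std::string& idOut)
{
    try {
        // at() bounds-checks the index; operator[] does not.
        idOut = v.at(++i).first;
        return true;
    } catch (const std::out_of_range&) {
        // i has already been incremented; idOut is left unchanged.
        return false;
    }
}
```

With a one-element vector and i starting at 0, the call returns false, because the pre-increment moves i to 1 before the read. That is exactly the situation the loop condition does not protect against.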
Further, this does not look right:
    else {
    } // To avoid "out of range" errors <-- but also ends the "else"
    if (i < SIZE_VECTOR_ONE - 1) ID_doc_one = v1[++i].first;
    if (j < SIZE_VECTOR_TWO - 1) ID_doc_two = v2[++j].first;
    }
I also wonder what happens when the same document appears with several positions.
This next one is nit-picky, but it took me a long time to parse:
    WordPosition_t pos_one = v1[i].second;
    WordPosition_t pos_two = v2[j].second;
It seems much clearer to name these the way you would say it out loud ("if the second word is in the position right after the first word"):
    WordPosition_t posFirstWord = v1[i].second;
    WordPosition_t posSecondWord = v2[j].second;
This next part was a bit confusing. Both branches seem intended to increment i and j and update ID_doc_one and ID_doc_two, so it would make sense to hoist that into a common section after the if block. But again, with the else {} it was hard to tell what you were actually doing.
    if (pos_one + 1 == pos_two) {
        intersection.push_back(make_pair(ID_doc_one,pos_two));
        ID_doc_one = v1[++i].first;
        ID_doc_two = v2[++j].first;
    }
    else {
    } // To avoid "out of range" errors
    if (i < SIZE_VECTOR_ONE - 1) ID_doc_one = v1[++i].first;
    if (j < SIZE_VECTOR_TWO - 1) ID_doc_two = v2[++j].first;
    }
When the entries in both arrays match, you always want to increment both i and j; that step is not conditional. I'm also not sure why you are storing pos_two, since the phrase was actually found at pos_one?
Here is how I would write it:
#include<iostream>
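As a minimal sketch of the points made above (iterators instead of indices, unconditional advancing on a match, and recording the position of the first word), something along these lines should work. It is not necessarily what the linked demo contains. The names findAdjacent and Occurrences are made up for illustration, and the sketch assumes both occurrence lists are sorted by (document ID, position), which is the default ordering of std::pair.

```cpp
#include <string>
#include <utility>
#include <vector>

typedef std::string IDdoc;
typedef int WordPosition_t;
typedef std::pair<IDdoc, WordPosition_t> Occurrence;
typedef std::vector<Occurrence> Occurrences;

// Returns the occurrences of the first word that are immediately followed,
// in the same document, by an occurrence of the second word.
// Assumption: both inputs are sorted by (document ID, position).
Occurrences findAdjacent(const Occurrences& first, const Occurrences& second)
{
    Occurrences result;
    auto oneIt = first.begin();
    auto twoIt = second.begin();
    while (oneIt != first.end() && twoIt != second.end()) {
        // Where the second word must be to directly follow the first word.
        const Occurrence wanted(oneIt->first, oneIt->second + 1);
        if (*twoIt < wanted) {
            ++twoIt; // second word is in an earlier document or too early a position
        } else if (wanted < *twoIt) {
            ++oneIt; // nothing directly follows this occurrence of the first word
        } else {
            result.push_back(*oneIt); // record where the phrase starts
            ++oneIt;
            ++twoIt;
        }
    }
    return result;
}
```

Because whole (document, position) pairs are compared, a document that contains a word at several positions is handled by the same loop: each iteration advances at least one iterator, and neither iterator is dereferenced once it reaches end().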
Live demo: http://ideone.com/XRfhAI