Let's say I have a set of keywords in an array: {"olympic games", "best sports tennis", "tennis", "tennis rules"}.
Then I have a large list (up to 50 items) of lines (actually tweets), so each one is no more than 140 characters long.
I want to go through each line and find which keywords it contains. When a keyword consists of several words, such as "best sports tennis", the words do not have to appear together in the line, but all of them must appear.
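A minimal sketch of the matching rule itself, in Python (the question names no language, so that choice is an assumption): split the line into a set of words and check that every word of the keyword is in that set. The helper names `words` and `contains_keyword` are hypothetical, and splitting on `\w+` (letters, digits, underscore) is an assumed tokenization; for instance, it treats "it's" as "it" and "s".

```python
import re

def words(text):
    # Assumption: a keyword word must match a whole word in the line,
    # so "tennis" does not match inside "tennisball".
    return set(re.findall(r"\w+", text))

def contains_keyword(line, keyword):
    # A (possibly multi-word) keyword matches when every one of its words
    # occurs somewhere in the line, in any order and not necessarily adjacent.
    return words(keyword) <= words(line)

keywords = ["olympic games", "best sports tennis", "tennis", "tennis rules"]
line = "read the tennis rules before the olympic games"
print([k for k in keywords if contains_keyword(line, k)])
# ['olympic games', 'tennis', 'tennis rules']
```

This checks every keyword against every line, so the work grows with the number of lines times the number of keywords; for a short keyword list that is usually acceptable.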
I am having trouble finding an efficient algorithm for this.
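One common way to avoid testing every keyword against every line is an inverted index: map each word to the keywords that contain it, then, for each line, count how many of each keyword's distinct words were seen. A keyword matches when that count equals its number of distinct words. This sketch assumes the keywords are stored in a dict keyed by some id; the class name `KeywordIndex` and its methods are hypothetical.

```python
import re
from collections import defaultdict

def words(text):
    # Assumed tokenization: runs of letters, digits or underscore.
    return set(re.findall(r"\w+", text))

class KeywordIndex:
    """Inverted index: word -> ids of the keywords that contain that word."""

    def __init__(self, keywords):
        self.size = {}                    # keyword id -> number of distinct words
        self.postings = defaultdict(set)  # word -> set of keyword ids
        for kid, phrase in keywords.items():
            ws = words(phrase)
            self.size[kid] = len(ws)
            for w in ws:
                self.postings[w].add(kid)

    def match(self, line):
        # Count, per keyword, how many of its distinct words occur in the line.
        hits = defaultdict(int)
        for w in words(line):
            for kid in self.postings.get(w, ()):
                hits[kid] += 1
        # A keyword matches only if all of its words were seen.
        return sorted(kid for kid, n in hits.items() if n == self.size[kid])

index = KeywordIndex({1: "olympics", 2: "best sports tennis", 3: "tennis", 4: "tennis rules"})
print(index.match("what are the tennis rules"))  # [3, 4]
```

Only keywords that share at least one word with the line are touched, so the cost per line depends on the line's words and their posting lists rather than on the total number of keywords.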
Do you have any suggestions? Thanks!
Edit: To explain a little better, each keyword has an identifier associated with it, for example: {1: "olympics", 2: "best sports tennis", 3: "tennis", 4: "tennis rules"}
I want to go through the list of lines/tweets and determine which keywords each one matches. The result should be something like "this tweet belongs to keyword #4". A tweet can match several keywords: anything that matches keyword 2 will also match keyword 3, since both contain "tennis".
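A sketch of the per-tweet result described above, assuming the keyword dict from the edit and returning, for each tweet position, the list of every matching keyword id, so one tweet can belong to several keywords. `classify` and the sample tweets are made up for illustration.

```python
import re

KEYWORDS = {1: "olympics", 2: "best sports tennis", 3: "tennis", 4: "tennis rules"}

def words(text):
    # Assumed tokenization: runs of letters, digits or underscore.
    return set(re.findall(r"\w+", text))

def classify(tweets, keywords):
    # Split each keyword once up front instead of once per tweet.
    keyword_words = {kid: words(phrase) for kid, phrase in keywords.items()}
    result = {}
    for i, tweet in enumerate(tweets):
        tw = words(tweet)
        # Every keyword whose words all appear in the tweet is reported.
        result[i] = [kid for kid, kw in keyword_words.items() if kw <= tw]
    return result

tweets = ["new tennis rules announced", "watching the olympics", "lunch"]
print(classify(tweets, KEYWORDS))
# {0: [3, 4], 1: [1], 2: []}
```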
When a keyword has multiple words, for example "best sports tennis", the words do not have to appear together, but they must all appear. For example, this line should match: “I just played tennis, I like sports, its best”. Since it contains all the words of “best sports tennis”, it matches and is associated with that keyword's ID (2 in this example).
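The example tweet also shows why a plain substring test is not enough: `keyword in tweet` only succeeds when the words are adjacent and in the same order, while comparing sets of words ignores both. Splitting with `\w+` is an assumption about what counts as a word.

```python
import re

tweet = "I just played tennis, I like sports, its best"
keyword = "best sports tennis"

# A plain substring test requires the words to be adjacent and in order.
substring_match = keyword in tweet

# Comparing word sets ignores order and adjacency.
tweet_words = set(re.findall(r"\w+", tweet))
set_match = set(keyword.split()) <= tweet_words

print(substring_match, set_match)  # False True
```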
Edit 2: the matching should be case-insensitive.
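For case-insensitive matching, one option is to normalize both the keyword and the line the same way before splitting them into words. This sketch uses Python's `str.casefold()`, which is intended for caseless comparison and handles some cases `lower()` does not (for example the German "ß"). The function names are hypothetical.

```python
import re

def normalized_words(text):
    # casefold() is a more aggressive lower() meant for caseless matching,
    # e.g. "Straße".casefold() == "strasse".
    return set(re.findall(r"\w+", text.casefold()))

def contains_keyword(line, keyword):
    # Normalize both sides identically so "Tennis" matches "tennis".
    return normalized_words(keyword) <= normalized_words(line)

print(contains_keyword("Best SPORTS day, Tennis!", "best sports tennis"))  # True
```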