From Google Analytics, I have a (long) list of search phrases that people typed into search engines to find my site. I want to extract the actual "keywords" from them. A hypothetical example:
java online training
learning java
scala training
training for java
online training java
learn scala programming
Ideal result: "java", "online training", "training" and "scala".
The difficulty, apparently, lies in finding complete phrases, ignoring common words (for), and handling word variants (training).
Is there a library that can do this (preferably for the JVM)? Or is there a suitable algorithm that I can implement myself?
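To make the question concrete, here is a naive baseline of the kind I have in mind: drop stop words, then count every one-word and two-word sequence across all phrases. This is only a sketch under my own assumptions. The class name `KeywordCounter`, the method `countNgrams`, and the one-word stop list are made up for illustration, and it does not merge variants such as "learn" and "learning", which would need a stemming step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not taken from any library.
public class KeywordCounter {

    // Counts every n-gram of 1..maxN words across all queries.
    // Stop words are dropped first, so an n-gram may span a removed word
    // ("training for java" yields the bigram "training java").
    public static Map<String, Integer> countNgrams(
            List<String> queries, Set<String> stopWords, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        for (String query : queries) {
            List<String> tokens = new ArrayList<>();
            for (String token : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                if (!token.isEmpty() && !stopWords.contains(token)) {
                    tokens.add(token);
                }
            }
            for (int n = 1; n <= maxN; n++) {
                for (int i = 0; i + n <= tokens.size(); i++) {
                    String ngram = String.join(" ", tokens.subList(i, i + n));
                    counts.merge(ngram, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> queries = List.of(
            "java online training", "learning java", "scala training",
            "training for java", "online training java", "learn scala programming");
        // The stop list is an assumption; a real one would be much longer.
        Map<String, Integer> counts = countNgrams(queries, Set.of("for"), 2);
        // Print the most frequent n-grams first.
        counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

On the example above, "java" and "training" come out on top and "online training" appears as a repeated phrase, but so do less useful bigrams such as "training java", so plain frequency counting is clearly not enough on its own.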