Ah ... but "I really love dogs" and "I really hate dogs" are very similar ;) Both express a feeling about dogs. It seems that you are missing a step:
- Run your algorithm and get the common topic groups (e.g. "feelings for dogs").
- Run your algorithm again, this time on each previously "discovered" group, and let it further classify that group into subgroups (e.g. "I hate dogs" / "I love dogs").
If your algorithm adapts based on its experience (i.e. some learning is involved), then make sure that you run one instance of the algorithm for the first classification and a new, separate instance for each subgroup. If you do not, you may end up in a situation where you find a few groups, and every time you rerun the algorithm on those same groups the results are almost identical or nothing changes at all.
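To make the two passes concrete, here is a minimal sketch in Python. I don't know which algorithm you are actually using, so `GreedyClusterer` is a hypothetical toy stand-in (word-set Jaccard similarity with a threshold), and the `coarse` / `fine` thresholds are made-up values. The only point is the shape: one instance for the topic pass, then a fresh instance for each discovered group.

```python
from typing import FrozenSet, List


def tokens(text: str) -> FrozenSet[str]:
    """Lowercased set of words, ignoring order and repeats."""
    return frozenset(text.lower().split())


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Shared words divided by all words; 1.0 means identical word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class GreedyClusterer:
    """Hypothetical toy stand-in for your algorithm, not a real library."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.clusters: List[List[str]] = []  # state owned by this instance

    def fit(self, texts: List[str]) -> List[List[str]]:
        for text in texts:
            for cluster in self.clusters:
                # Compare against the cluster's first member (its seed).
                if jaccard(tokens(text), tokens(cluster[0])) >= self.threshold:
                    cluster.append(text)
                    break
            else:
                # No existing cluster was close enough: start a new one.
                self.clusters.append([text])
        return self.clusters


def two_stage(texts: List[str], coarse: float, fine: float) -> List[List[List[str]]]:
    # Pass 1: one instance finds the broad topic groups.
    topics = GreedyClusterer(coarse).fit(texts)
    # Pass 2: a brand-new instance per group, so no state leaks between them.
    return [GreedyClusterer(fine).fit(topic) for topic in topics]


if __name__ == "__main__":
    sample = [
        "I really love dogs",
        "I really hate dogs",
        "I really love dogs too",
        "stock prices fell today",
        "stock prices rose today",
    ]
    for topic in two_stage(sample, coarse=0.5, fine=0.7):
        print(topic)
```

With these sample sentences and thresholds, the first pass puts all three dog sentences into one group, and the second pass splits that group into the two "love" sentences and the single "hate" sentence. A real algorithm would need better features than raw word overlap, but the two-pass structure stays the same.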
Update
Apache Mahout provides many useful algorithms and examples: clustering, classification, genetic programming, decision forests, recommendations. Here are some examples of text classification from Mahout:
I'm not sure which one works best for your problem, but looking through them should help you decide which is most suitable for your specific application.
Kiril