This is the basis of an area of natural language processing called sentiment analysis. Although your question is general, it is certainly not stupid - this kind of research is done by Amazon on the text of product reviews, for example.
If you are serious about this, then a simple version can be achieved with the following steps:
Acquire a corpus of positive/negative sentiment. If this were a professional project, you might spend some time manually annotating a corpus yourself, but if you are in a hurry or just want to experiment first, I would suggest looking at the sentiment polarity corpus from the research of Bo Pang and Lillian Lee. The problem with using that corpus is that it is not tailored to your domain (specifically, it uses movie reviews), but it should still be applicable.
Split your dataset into sentences that are either Positive or Negative. For the sentiment polarity corpus, you can split each review into its component sentences and then apply the review's overall polarity tag (positive or negative) to all of those sentences. Split this corpus into two parts - 90% for training, 10% for testing. If you use Weka, it can handle the splitting of the corpus for you.
Apply a machine learning algorithm (e.g., SVM, Naive Bayes, Maximum Entropy) to the training corpus at the word level. This is called the bag-of-words model, which simply represents a sentence as the words it is composed of. It is the same model that many spam filters run on. For a nice introduction to machine learning algorithms, there is an application called Weka that implements a range of them and gives you a graphical interface for playing with them. You can then measure the performance of the learned model from the errors it makes when classifying your test corpus.
Apply the learned model to your user posts. For each user post, split the post into sentences and then classify each sentence using the machine-learned model.
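To make the steps above concrete, here is a minimal sketch in Python using only the standard library. It is not the Weka workflow mentioned above: it assumes a hand-rolled Naive Bayes with add-one smoothing, and the names `TOY_CORPUS`, `split_corpus`, `train_naive_bayes`, `classify` and `classify_post` are hypothetical, chosen only for illustration. The four-sentence corpus is a placeholder for a real corpus such as the sentiment polarity one, so the accuracy it prints means nothing.

```python
import math
import random
import re
from collections import Counter

# Placeholder data (an assumption); a real run would load a labelled corpus.
TOY_CORPUS = [
    ("I love this movie", "pos"),
    ("What a wonderful and great film", "pos"),
    ("I hate this movie", "neg"),
    ("What a terrible and awful film", "neg"),
]

def tokenize(sentence):
    # Bag of words: lowercase the sentence and keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", sentence.lower())

def split_corpus(labelled_sentences, train_fraction=0.9, seed=0):
    # Shuffle a copy, then cut it into training and test parts.
    shuffled = labelled_sentences[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train_naive_bayes(labelled_sentences):
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training sentences
    for sentence, label in labelled_sentences:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(sentence))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(model, sentence):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior plus add-one smoothed log likelihood of each known word.
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(sentence):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, test_set):
    correct = sum(classify(model, s) == label for s, label in test_set)
    return correct / len(test_set)

def classify_post(model, post):
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", post.strip()) if s]
    return [(s, classify(model, s)) for s in sentences]

train_set, test_set = split_corpus(TOY_CORPUS)
model = train_naive_bayes(train_set)
print("test accuracy:", accuracy(model, test_set))
for sentence, label in classify_post(model, "I love this film. I hate that awful movie!"):
    print(label, "-", sentence)
```

With a real corpus, the 10% held out by `split_corpus` is what tells you whether the word-level model is good enough before you point it at user posts.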
So yes, if you are serious about this, then it is achievable - even without past experience in computational linguistics. It will be a fair amount of work, but even with word-based models, good results can be achieved.
If you need more help, feel free to contact me - I am always happy to help others interested in NLP =]
Small notes -
- Merely splitting a segment of text into sentences is an NLP field in its own right, called sentence boundary detection. There are a number of tools, OSS or free, available for this, but for your task a simple split on whitespace and punctuation should be fine.
- SVMlight is another machine learner to consider, and in fact its inductive SVM example performs a task similar to the one we are looking at - classifying Reuters articles as being about "corporate acquisitions" or not, with 1000 positive and 1000 negative examples.
- Turning sentences into features for classification can take some work. In this model, each word is a feature - this requires tokenizing the sentence, which means separating words and punctuation marks from each other. Another tip is to lowercase all the individual word tokens so that “I HATE YOU” and “I hate YOU” are both considered the same. With more data, you could also try to find out whether capitalization helps in classifying whether someone is angry, but I believe words should be enough, at least for an initial effort.
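As a rough illustration of the notes above, the sketch below does a naive sentence split, separates words from punctuation, and optionally lowercases the tokens before counting them as features. The function names (`split_sentences`, `tokenize`, `features`) and the regular expressions are my own assumptions for this example, not taken from any particular toolkit, and a real sentence boundary detector would handle abbreviations and similar cases that this one does not.

```python
import re
from collections import Counter

def split_sentences(text):
    # Naive boundary detection: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # A word (with internal apostrophes) or a single punctuation mark per token.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)

def features(sentence, lowercase=True):
    # Each distinct token is a feature; its value is the token's count.
    tokens = tokenize(sentence)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    return Counter(tokens)

print(split_sentences("I hate this. Do you? Yes!"))
print(tokenize("I HATE you!"))
print(features("I HATE YOU") == features("I hate YOU"))
```

Passing `lowercase=False` keeps the original casing, which is one simple way to experiment later with whether capitalization carries any signal.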
Edit
I just discovered LingPipe, which actually has a sentiment analysis tutorial using the Bo Pang and Lillian Lee sentiment polarity corpus that I talked about. If you use Java, it may be an excellent tool to use, and even if not, it goes through all of the steps described above.
Smerity Jun 06 '09 at 7:12