Determining the "mood" of text phrases through lexical analysis

I want to assign ratings (positive, negative, or neutral) to short phrases of text. Apart from parsing emoticons and making assumptions based on their use, I'm not sure what else to try. Can anyone point me to examples, research papers, articles, etc. that take a more lexical approach to this problem?

I think that things like adverb use, misuse or repetition of punctuation, and spelling or grammar errors could be worthwhile indicators of an author's mood in an almost binary sense (good or bad).

+6
text parsing lexer
3 answers

This looks like a fairly straightforward binary classification problem: simplify the task to positive versus negative, and then label as neutral the highest-entropy decisions, that is, those whose probability mass does not reach a certainty threshold.
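As a rough sketch of that idea, assuming you already have some binary classifier that outputs P(positive) for a phrase: the function names and the 0.75 threshold below are illustrative assumptions, not part of any particular toolkit.

```python
import math


def binary_entropy(p_positive: float) -> float:
    """Shannon entropy (in bits) of a two-class distribution."""
    if p_positive <= 0.0 or p_positive >= 1.0:
        return 0.0
    q = 1.0 - p_positive
    return -(p_positive * math.log2(p_positive) + q * math.log2(q))


def label_with_neutral(p_positive: float, certainty: float = 0.75) -> str:
    """Map a binary classifier's P(positive) to three labels.

    `certainty` is an assumed threshold; tune it on held-out data.
    Decisions where neither class reaches it fall back to "neutral".
    """
    if p_positive >= certainty:
        return "positive"
    if 1.0 - p_positive >= certainty:
        return "negative"
    return "neutral"
```

A probability near 0.5 has entropy close to 1 bit and ends up neutral; raising `certainty` widens the neutral band.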

Your biggest hurdle will be getting training data for a statistical machine learning method. Once you have it, you can train a readily available maximum entropy model with a toolkit such as the Toolkit for Advanced Discriminative Modeling (TADM) or MALLET. The features you described simply need to be formatted as the inputs these toolkits expect.
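To make the feature step concrete, here is a minimal sketch that turns a phrase into a name-to-value feature dictionary using the cues from the question (punctuation repetition and so on). The feature names are made up for illustration, and the dictionary would still need to be converted to whatever input format your chosen toolkit expects.

```python
import re
from typing import Dict


def extract_features(phrase: str) -> Dict[str, float]:
    """Turn a short phrase into a sparse feature dictionary.

    All feature names here are hypothetical; pick whatever cues
    turn out to be useful on your own data.
    """
    tokens = re.findall(r"[A-Za-z']+", phrase)
    features = {
        "n_exclaim": float(phrase.count("!")),
        "n_question": float(phrase.count("?")),
        # Two or more ! or ? characters in a row.
        "has_repeated_punct": float(bool(re.search(r"[!?]{2,}", phrase))),
        "has_ellipsis": float("..." in phrase),
        # Words written entirely in capitals (ignoring single letters).
        "n_allcaps_words": float(
            sum(1 for t in tokens if len(t) > 1 and t.isupper())
        ),
    }
    # Simple bag-of-words counts alongside the punctuation cues.
    for t in tokens:
        key = "word=" + t.lower()
        features[key] = features.get(key, 0.0) + 1.0
    return features
```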

To get training data, you can use paid crowdsourcing such as Amazon Mechanical Turk, or label it yourself, perhaps with a friend's help. You will need a lot of data. If data is scarce, you can improve the model's predictive power with approaches such as active learning, ensembles, or boosting, but it is important to test these as thoroughly as you can against real data and choose whichever performs best in practice.
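One common flavour of active learning is uncertainty sampling: have the current model score the unlabeled phrases and hand-label the ones it is least sure about first. A minimal sketch, assuming you already have (phrase, P(positive)) pairs from some model; the function name is hypothetical.

```python
from typing import List, Tuple


def most_uncertain(scored: List[Tuple[str, float]], k: int) -> List[str]:
    """Return the k phrases whose P(positive) is closest to 0.5.

    `scored` holds (phrase, p_positive) pairs produced by whatever
    model you currently have; these are the ones worth labeling next.
    """
    ranked = sorted(scored, key=lambda item: abs(item[1] - 0.5))
    return [phrase for phrase, _ in ranked[:k]]
```

After labeling that batch you would retrain and repeat, checking against a held-out set that the extra labels actually help.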

If you are looking for papers on this, search for the term "sentiment analysis" in Google Scholar. The Association for Computational Linguistics has many free and useful papers from its conferences and journals that approach the problem from both a linguistic and an algorithmic point of view. I would browse their archives as well. Good luck!

+3

Well, latent semantic analysis (there is a paper on it) seems like the closest well-established research field to what you are describing. It is less sentiment-oriented and geared more toward larger documents, but it may still have some bearing on your problem.

+2

It sounds like a really interesting idea - I would be interested to know what came of it.

I would say that punctuation is one indicator that you could use ...

  • ? - question
  • !?!? (or some variation) - Disbelief
  • ! with phrases like silly, idiotic, etc. - anger
  • ... - Stress, sarcasm
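
The punctuation cues above could be sketched roughly like this. The cue names and the tiny word list are only the examples from the list, not a tested lexicon, so treat it as a starting point.

```python
import re
from typing import List

# Example words from the list above; a real lexicon would be much larger.
ANGER_WORDS = {"silly", "idiotic"}


def punctuation_cues(phrase: str) -> List[str]:
    """Return the rough mood cues suggested by a phrase's punctuation."""
    cues = []
    if re.search(r"!\?|\?!", phrase):
        # Mixed runs such as "!?" or "?!?!" read as disbelief.
        cues.append("disbelief")
    elif "?" in phrase:
        cues.append("question")
    words = set(re.findall(r"[a-z']+", phrase.lower()))
    if "!" in phrase and words & ANGER_WORDS:
        cues.append("anger")
    if re.search(r"\.{3,}", phrase):
        cues.append("stress_or_sarcasm")
    return cues
```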

You can also try and pick up common abbreviations such as ...

  • LOL - Laughter (positive)
  • WTF, OMG - Disbelief, shock
  • IMO - Thinking, Explaining
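
Picking up abbreviations like these can start as a plain lookup table. A minimal sketch, with the table holding only the examples from this list (you would extend it for real use):

```python
import re
from typing import List, Tuple

# Only the abbreviations mentioned above; the cue labels are informal.
ABBREVIATION_CUES = {
    "lol": "laughter (positive)",
    "wtf": "disbelief, shock",
    "omg": "disbelief, shock",
    "imo": "thinking, explaining",
}


def abbreviation_cues(phrase: str) -> List[Tuple[str, str]]:
    """Return (abbreviation, cue) pairs in the order they appear."""
    tokens = re.findall(r"[a-z]+", phrase.lower())
    return [(t, ABBREVIATION_CUES[t]) for t in tokens if t in ABBREVIATION_CUES]
```

Matching on whole tokens avoids false hits inside longer words such as "lollipop".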

This is clearly a difficult thing you want to do, but it sounds very interesting.

0