Algorithm for determining how positive or negative a sentence / text is

I need an algorithm to determine whether a sentence, paragraph, or article is negative or positive in tone... or, even better, how negative or positive.

For example:

Jason is the worst SO user I've ever seen (-10)

Jason is an SO user (0)

Jason is the best SO user I've ever seen (+10)

Jason is the best at sucking with SO (-10)

While okay at SO, Jason is the worst at doing bad (+10)

Not easy, right? :)

I don't expect anyone to explain this algorithm to me, but I assume something similar already exists in academia to some extent. If you can point me to some articles or research, I would appreciate it.

Thanks.

+58
algorithm nlp
Nov 15 '08 at 20:14
13 answers

There is a subfield of natural language processing called sentiment analysis that deals specifically with this problem domain. A significant amount of commercial work is being done in the area because consumer products are so heavily reviewed in online user forums (UGC, or user-generated content). There is also a prototype text-analytics platform called GATE from the University of Sheffield, and a Python project called NLTK. Both are considered flexible, but not high-performance. One or the other may be useful for developing your own ideas.

+45
Nov 15 '08 at 20:50

At my company, we have a product that does this, and it works well. I did most of the work on it. I can give a brief introduction:

You need to split the paragraph into sentences, and then split each sentence into smaller sub-sentences: splitting on commas, hyphens, semicolons, colons, "and", "or", etc. In some cases, each sub-sentence will exhibit a completely different sentiment.

Some sentences, however, should be kept together even though they contain separators.

For example: The product is amazing, great and fantastic.

We developed a comprehensive set of rules for which types of sentences should be split and which should not (based on the POS tags of the words).
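A rough sketch of this splitting step, using only the Python standard library; the regex separators here are a crude stand-in for the POS-based rules the answer describes, not the product's actual logic:

```python
import re

def split_into_subsentences(paragraph):
    """Split a paragraph into sentences, then split each sentence
    into clause-like chunks on commas, semicolons, colons, spaced
    hyphens, 'and', and 'or'."""
    sentences = re.split(r'(?<=[.!?])\s+', paragraph.strip())
    chunks = []
    for sentence in sentences:
        parts = re.split(r'[,;:]|\s-\s|\band\b|\bor\b', sentence)
        chunks.extend(p.strip() for p in parts if p.strip())
    return chunks

print(split_into_subsentences(
    "The screen is great, but the battery is awful; I want a refund."))
# ['The screen is great', 'but the battery is awful', 'I want a refund.']
```

Note that this naive version would also split "amazing, great and fantastic" apart, which is exactly the case the rules above are meant to prevent.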

At the first level, you can use a bag-of-words approach: keep a list of positive and negative words/phrases and check each sub-sentence against it. At the same time, watch for negation words such as "not", "no", etc., which flip the polarity of the sentence.
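A minimal sketch of this first level, assuming made-up word lists and a simple rule that flips polarity when a negation appears within the two preceding words:

```python
# illustrative word lists only; a real lexicon would be much larger
POSITIVE = {"good", "great", "best", "amazing", "fantastic", "love"}
NEGATIVE = {"bad", "worst", "awful", "sucks", "hate", "terrible"}
NEGATIONS = {"not", "no", "never"}

def score_clause(clause):
    """Bag-of-words scoring: +1/-1 per sentiment word, with the
    polarity flipped when a negation word occurs within the two
    preceding tokens."""
    words = clause.lower().split()
    score = 0
    for i, word in enumerate(words):
        polarity = 0
        if word in POSITIVE:
            polarity = 1
        elif word in NEGATIVE:
            polarity = -1
        if polarity and any(w in NEGATIONS for w in words[max(0, i - 2):i]):
            polarity = -polarity
        score += polarity
    return score

print(score_clause("this is not a good product"))  # -1
print(score_clause("the best phone ever"))         # 1
```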

Even then, if you can't determine the sentiment, you can fall back to Naive Bayes. This approach is not very accurate (about 60%), but if you apply it only to sentences that do not match the first set of rules, you can easily reach 80-85% accuracy.
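The Naive Bayes fallback can be sketched as follows. This is a minimal self-contained multinomial Naive Bayes with add-one smoothing, illustrating the idea only; it is not the answerer's actual implementation, and the training examples are invented:

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Minimal multinomial Naive Bayes classifier with add-one
    (Laplace) smoothing over word counts."""

    def train(self, labeled_docs):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for words, label in labeled_docs:
            self.label_counts[label] += 1
            for w in words:
                self.word_counts[label][w] += 1
                self.vocab.add(w)
        self.total_docs = sum(self.label_counts.values())

    def classify(self, words):
        best_label, best_logp = None, float("-inf")
        for label in self.label_counts:
            # log prior + sum of smoothed log likelihoods
            logp = math.log(self.label_counts[label] / self.total_docs)
            total = sum(self.word_counts[label].values())
            for w in words:
                count = self.word_counts[label][w]
                logp += math.log((count + 1) / (total + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

nb = TinyNaiveBayes()
nb.train([
    (["great", "amazing", "love"], "pos"),
    (["best", "fantastic"], "pos"),
    (["worst", "awful"], "neg"),
    (["sucks", "hate", "bad"], "neg"),
])
print(nb.classify(["amazing", "best"]))  # pos
print(nb.classify(["awful", "sucks"]))   # neg
```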

The important parts are the list of positive/negative words and the way you split things up. If you want, you can go one level further with an HMM (hidden Markov model) or CRF (conditional random fields). But I'm not an expert in NLP, and someone else will have to fill you in on that part.

For the curious: we implemented all of this in Python with NLTK and the Reverend Bayes module.

It is pretty simple and handles most sentences. You may run into problems, though, when trying to tag content from the web, since most people do not write grammatically correct sentences there. Sarcasm is also very hard to handle.

+29
Dec 24 '08 at 13:41

This falls under the umbrella of natural language processing, so reading up on that is probably a good place to start.

If you don't want to tackle a very hard problem, you can simply create lists of "positive" and "negative" words (and weight them if you want) and do word counts over sections of text. Obviously this is not a "smart" solution, but it gets you some information with very little work, where doing serious NLP would be very time-consuming.

One of your examples could be marked positive when it is actually negative using this approach ("Jason is the best at sucking with SO") if you don't happen to weight "sucking" more heavily than "best"... But this is also a small sample of text; if you look at paragraphs or more, the weighting becomes more reliable, unless someone is purposely trying to fool your algorithm.
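A sketch of the weighted word-count idea, with hypothetical weights chosen so that "sucking" outweighs "best" and the tricky example nets out negative:

```python
# hypothetical weights, purely for illustration
WEIGHTS = {"best": 2, "worst": -2, "great": 1, "sucking": -3, "bad": -1}

def weighted_score(text):
    """Sum the weights of known sentiment words; unknown words
    contribute 0."""
    return sum(WEIGHTS.get(w.strip(".,!?").lower(), 0) for w in text.split())

print(weighted_score("Jason is the best at sucking with SO"))  # -1
```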

+8
Nov 15 '08 at 20:21

As indicated, this comes under sentiment analysis in natural language processing.
AFAIK, GATE does not ship with a component that does sentiment analysis.
In my experience, I implemented an algorithm that is an adaptation of the one described in the paper "Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis" by Theresa Wilson, Janyce Wiebe, and Paul Hoffmann, as a GATE plugin, and it gives good results. It may help you if you want to get started.

+5
Nov 15 '08 at 21:26

Depending on your application, you could do this with a Bayesian filtering algorithm (the kind often used in spam filters).

One way to do this is to have two filters: one for positive documents and one for negative documents. You would seed the positive filter with positive documents (by whatever criteria you use) and the negative filter with negative documents. The trick is finding those documents; perhaps you could set things up so that your users effectively rate documents for you.

The positive filter (once seeded) would look for positive words; perhaps it would end up with words like love, peace, etc. The negative filter would be seeded in the same way.

Once your filters are set up, you run the test text through them to get positive and negative scores. Based on those scores and some weighting, you can come up with your own numeric rating.
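A minimal sketch of the two-filter idea: two word-frequency models seeded with example documents, scored via a smoothed log-likelihood ratio. The ratio is one simple way to combine the two filter outputs into a single number; the answer does not prescribe a specific formula, and the seed documents here are invented:

```python
import math
from collections import Counter

class TwoFilterScorer:
    """Two spam-filter-style word models (one positive, one
    negative); score() returns the smoothed log-likelihood ratio
    between them: > 0 leans positive, < 0 leans negative."""

    def __init__(self):
        self.pos = Counter()
        self.neg = Counter()

    def seed_positive(self, doc):
        self.pos.update(doc.lower().split())

    def seed_negative(self, doc):
        self.neg.update(doc.lower().split())

    def score(self, text):
        vocab = len(set(self.pos) | set(self.neg))
        pos_total = sum(self.pos.values())
        neg_total = sum(self.neg.values())
        s = 0.0
        for w in text.lower().split():
            p = (self.pos[w] + 1) / (pos_total + vocab)  # add-one smoothing
            n = (self.neg[w] + 1) / (neg_total + vocab)
            s += math.log(p / n)
        return s

scorer = TwoFilterScorer()
scorer.seed_positive("love peace great wonderful")
scorer.seed_negative("hate awful terrible worst")
print(scorer.score("a great and wonderful day") > 0)  # True
print(scorer.score("an awful day") < 0)               # True
```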

Bayesian filters, although simple, are surprisingly effective.

+5
Nov 15 '08 at 22:08

You can do the following:

  Jason is the worst SO user I have ever witnessed (-10) 

worst (-), the rest is (+). So it would be (-) + (+) = (-)

  Jason is an SO user (0) 

( ) + ( ) = ( )

  Jason is the best SO user I have ever seen (+10) 

best (+), the rest is ( ). So it would be (+) + ( ) = (+)

  Jason is the best at sucking with SO (-10) 

best (+), sucking (-). So it would be (+) + (-) = (-)

  While, okay at SO, Jason is the worst at doing bad (+10) 

worst (-), doing bad (-). So it would be (-) + (-) = (+)
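The sign arithmetic above can be sketched as follows (the word lists are illustrative only; multiplying the signs makes two negatives cancel to a positive, as in the last example):

```python
POSITIVE = {"best", "good", "great"}
NEGATIVE = {"worst", "bad", "sucking", "sucks"}

def sign_combine(sentence):
    """Combine the signs of the sentiment words found: one negative
    word gives (-), two negatives cancel to (+), none gives (0)."""
    sign = None
    for word in sentence.lower().split():
        word = word.strip(".,!?")
        if word in POSITIVE:
            sign = 1 if sign is None else sign
        elif word in NEGATIVE:
            sign = -1 if sign is None else -sign
    return {1: "+", -1: "-", None: "0"}[sign]

print(sign_combine("Jason is the best at sucking with SO"))  # -
print(sign_combine("Jason is the worst at doing bad"))       # +
print(sign_combine("Jason is an SO user"))                   # 0
```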

+3
Feb 21 '12 at 12:33

There are many machine-learning approaches to this kind of sentiment analysis. I used most of the machine-learning algorithms that are already implemented. In my case I used

Weka classification algorithms

  • SVM
  • Naive Bayes
  • J48

All you need to do is train the model on your own context, add the feature vectors, and tune the rules. In my case I only got about 61% accuracy. So we moved to Stanford CoreNLP (they trained their model on movie reviews), used their training set, and added our own training set. We could achieve 80-90% accuracy.

+1
Dec 19 '14 at 4:59

This is an old question, but I found it while looking for a tool that could analyze the tone of an article, and came across IBM Watson's Tone Analyzer. It allows 1,000 API calls per month for free.

+1
Jul 14 '16 at 21:47

It is all about context, I think. If you are searching for the people who are best at sucking with SO, then sucking the best can be a positive. To determine what is good or bad, and by how much, I would recommend looking into fuzzy logic.

It is a bit like being tall. Someone who is 1.95 m can be considered tall. But if you put that person in a group of people who are all over 2.10 m, he looks short.

0
Nov 15 '08 at 20:29

Maybe essay-grading software could be used to estimate tone? WIRED article.
Possible link. (I could not read it.)
This report compares writing skill to the Flesch-Kincaid grade level required to read it!
Page 4 of e-rater says that they look at misspellings and the like. (Maybe a bad post is also misspelled!)

Slashdot

You could also use an email filter of some sort to catch negativity instead of spam.

0
Nov 15 '08 at 21:12

How about sarcasm:

  • Jason is the best SO user I've ever seen, NOT
  • Jason is the best SO user I've ever seen, right
0
Dec 24 '08 at 13:01

Ah, I remember one Java library for this: LingPipe (commercial license), which we evaluated. It worked well for the example corpus available on the site, but for real data it performed quite poorly.

0
Dec 24 '08 at 13:43
  use Algorithm::NaiveBayes;

  my $nb = Algorithm::NaiveBayes->new;

  $nb->add_instance(
      attributes => { foo => 1, bar => 1, baz => 3 },
      label      => 'sports',
  );

  $nb->add_instance(
      attributes => { foo => 2, blurp => 1 },
      label      => ['sports', 'finance'],
  );

  # ... repeat for several more instances, then:
  $nb->train;

  # Find results for unseen instances
  my $result = $nb->predict(attributes => { bar => 3, blurp => 2 });
-4
Aug 22
