Twitter sentiment analysis in Python

I am looking for an open source implementation, preferably in Python, of textual sentiment analysis (http://en.wikipedia.org/wiki/Sentiment_analysis). Is anyone familiar with such an open source implementation that I can use?

I am writing an application that searches Twitter for some search term, say “youtube”, and counts “happy” tweets versus “sad” tweets. I am using Google App Engine, so this is in Python. I would like to be able to classify the search results returned from Twitter. So far I have not been able to find such a sentiment analyzer, particularly not in Python. Are you familiar with such an open source implementation that I can use? Preferably it is already in Python, but if not, I hope I can translate it to Python.

Note that the texts I analyze are very short: they are tweets. So ideally, the classifier is optimized for such short texts.

By the way, Twitter does support the ":)" and ":(" operators in search, which aim to do just this, but unfortunately the classification they provide isn't that great, so I figured I might give it a try myself.

Thanks!

By the way, an early demo is here, and the code I have so far is here; I'd love to open-source it with any interested developer.

+86
python open-source machine-learning nlp sentiment-analysis
Feb 21 '09 at 21:20
12 answers

In most of these kinds of applications, you'll have to roll much of your own code for the statistical classification task. As Lucka suggested, NLTK is the perfect tool for natural language manipulation in Python, so long as your goal doesn't conflict with the non-commercial nature of its license. However, I would suggest other software packages for the modeling. I haven't found many strong machine learning models available for Python, so I'm going to suggest some standalone binaries that work easily with it.

You may be interested in the Toolkit for Advanced Discriminative Modeling, which can easily be interfaced with Python. It has been used for classification tasks in various areas of natural language processing, and you have a choice of several different models. I'd suggest starting with Maximum Entropy classification, as long as you're already familiar with implementing a Naive Bayes classifier. If not, you may want to look into Naive Bayes and code one up yourself to get a decent understanding of statistical classification as a machine learning task.
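For two classes with word-indicator features, a maximum entropy classifier is the same model as logistic regression. The sketch below is not TADM's API; it trains such a model in plain Python with stochastic gradient ascent, just to show what a toolkit like that fits. The whitespace tokenizer, learning rate, epoch count and toy data are illustrative assumptions, and there is no regularization.

```python
import math
from collections import defaultdict


def features(text):
    """Bag-of-words indicator features: the set of lower-cased tokens."""
    return set(text.lower().split())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BinaryMaxEnt:
    """Two-class maximum entropy model (i.e. logistic regression) over binary
    word features, trained by stochastic gradient ascent on the log-likelihood."""

    def __init__(self, learning_rate=0.5, epochs=50):
        self.lr = learning_rate
        self.epochs = epochs
        self.weights = defaultdict(float)
        self.bias = 0.0

    def prob_happy(self, text):
        # .get() so that scoring unseen words does not add them to the model
        z = self.bias + sum(self.weights.get(f, 0.0) for f in features(text))
        return sigmoid(z)

    def train(self, examples):
        """examples: list of (text, label) pairs, label 1 = happy, 0 = sad."""
        for _ in range(self.epochs):
            for text, label in examples:
                error = label - self.prob_happy(text)  # per-example gradient
                self.bias += self.lr * error
                for f in features(text):
                    self.weights[f] += self.lr * error
        return self

    def classify(self, text):
        return "happy" if self.prob_happy(text) >= 0.5 else "sad"


examples = [
    ("i love this video", 1),
    ("great fun", 1),
    ("i hate this video", 0),
    ("awful and boring", 0),
]
model = BinaryMaxEnt().train(examples)
print(model.classify("love it"), model.classify("hate it"))
```

A real toolkit adds regularization and better optimizers, which matter once you have thousands of sparse word features.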

The computational linguistics groups at the University of Texas at Austin have held classes in which most of the resulting projects used this great tool. You can look at the course page for Computational Linguistics II to get an idea of how to make it work and what previous applications it has served.

Another great tool that works in the same vein is Mallet. The difference with Mallet is that there's a bit more documentation and some more models available, such as decision trees, and it's in Java, which, in my opinion, makes it a little slower. Weka is a whole suite of different machine learning models in one big package that includes some graphical tools, but it's really mostly meant for pedagogical purposes and isn't something I'd put into production.

Good luck with your task. The really hard part will probably be the amount of manual labeling needed up front to build the “seed set” your model will learn from. It needs to be fairly sizeable, depending on whether you're doing binary classification (happy vs. sad) or a whole range of emotions (which will require even more). Make sure to hold out some of this labeled data for testing, or run tenfold or leave-one-out cross-validation, to make sure you're actually doing a good job predicting before you put it out there. And most of all, have fun! This is the best part of NLP and AI, in my opinion.
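Holding out data and k-fold cross-validation are easy to do by hand. Below is a minimal, library-free sketch; `train_fn` stands for whatever training routine you use (a hypothetical interface that must return a `predict(text)` function), and the majority-class baseline is only there to make the example runnable. Any real model should beat that baseline before you trust it.

```python
import random


def k_fold_splits(examples, k=10, seed=0):
    """Yield (train, test) pairs for k-fold cross-validation.
    Needs at least k examples so that no test fold is empty."""
    data = list(examples)  # copy so the caller's list is not shuffled
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, folds[i]


def accuracy(predict, test):
    """Fraction of (text, label) pairs for which predict(text) == label."""
    return sum(predict(text) == label for text, label in test) / len(test)


def cross_validate(train_fn, examples, k=10):
    """Mean held-out accuracy over k folds; k = len(examples) is leave-one-out."""
    scores = [accuracy(train_fn(train), test)
              for train, test in k_fold_splits(examples, k)]
    return sum(scores) / len(scores)


def majority_baseline(train):
    """Trivial model: always predict the most frequent training label."""
    labels = [label for _, label in train]
    most_common = max(set(labels), key=labels.count)
    return lambda text: most_common


data = [("tweet %d" % i, "happy" if i % 3 else "sad") for i in range(30)]
print(cross_validate(majority_baseline, data, k=10))
print(cross_validate(majority_baseline, data, k=len(data)))  # leave-one-out
```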

+42
Feb 22 '09 at 0:26

Good luck with that.

Sentiment is enormously contextual, and tweeting culture makes the problem worse because you aren't given the context for most tweets. The whole point of Twitter is that you can leverage the huge amount of shared “real world” context to pack meaningful communication into a very short message.

If they say the video is bad, does that mean bad, or “bad” as in good?

A linguistics professor was lecturing to her class one day. “In English,” she said, “a double negative forms a positive. In some languages, though, such as Russian, a double negative is still a negative. However, there is no language wherein a double positive can form a negative.”

A voice from the back of the room piped up, “Yeah, right...”

+75
Mar 03 '09 at 19:54

Thanks, everybody, for your suggestions; they were indeed very useful! In the end I used a Naive Bayes classifier, which I borrowed from here. I started by feeding it a list of good/bad keywords and then added a “learn” feature based on user feedback. It turned out to work pretty nicely.
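The borrowed classifier isn't included, so here is a hedged sketch of the idea described: a two-class multinomial Naive Bayes whose word counts are seeded from good/bad keyword lists, plus a `learn()` method that folds in user feedback. The class name, the `seed_weight` parameter, the whitespace tokenizer and the add-one smoothing are all assumptions for illustration.

```python
import math
from collections import Counter


class SeededNaiveBayes:
    """Two-class multinomial Naive Bayes whose word counts start from
    hand-picked keyword lists and grow as users correct its verdicts."""

    LABELS = ("happy", "sad")

    def __init__(self, good_words, bad_words, seed_weight=1):
        self.word_counts = {label: Counter() for label in self.LABELS}
        # One pseudo-document per class so the priors are never log(0).
        self.doc_counts = Counter({label: 1 for label in self.LABELS})
        for w in good_words:
            self.word_counts["happy"][w] += seed_weight
        for w in bad_words:
            self.word_counts["sad"][w] += seed_weight

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def learn(self, text, label):
        """Incorporate one piece of user feedback: `text` is `label`."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def classify(self, text):
        vocab = set(self.word_counts["happy"]) | set(self.word_counts["sad"])
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.LABELS:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(vocab)  # add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            for w in self.tokenize(text):
                if w in vocab:  # words never seen in either class are ignored
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = SeededNaiveBayes(good_words=["love", "great"], bad_words=["hate", "awful"])
print(nb.classify("I love it"), nb.classify("awful"))
nb.learn("so boring", "sad")  # a user flags this tweet as sad
print(nb.classify("so boring"))
```

Note that a tweet containing no known words falls back to the priors, and on an exact tie this sketch picks "happy"; you may prefer to report such tweets as unclassified.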

The full details of my work are in a blog post.

Again, your help was very useful, so thank you!

+18
Mar 19 '09 at 13:04

I built a word list labeled with sentiment scores. You can access it here:

http://www2.compute.dtu.dk/pubdb/views/edoc_download.php/6010/zip/imm6010.zip

On my blog you will find a short Python program:

http://finnaarupnielsen.wordpress.com/2011/06/20/simplest-sentiment-analysis-in-python-with-af/

This post shows how to use the word list with individual sentences, as well as with Twitter.
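The blog post's own program isn't reproduced here, but the core of a word-list approach fits in a few lines. This sketch assumes the list is a UTF-8 text file with one `term<TAB>score` entry per line (check that against the file you download); the four-entry lexicon in the usage lines is made up for illustration and is not taken from the real list. In Python 3, opening the file with `encoding="utf-8"` replaces the Python 2 `unicode(s, 'utf-8')` decoding step.

```python
import re


def load_wordlist(lines):
    """Parse 'term<TAB>score' lines into a {term: int score} dict."""
    scores = {}
    for line in lines:
        line = line.strip()
        if line:
            term, score = line.rsplit("\t", 1)
            scores[term] = int(score)
    return scores


def sentiment(text, scores):
    """Sum the scores of the known words in text; unknown words count as 0.
    Multi-word entries in the list are not matched by this word-by-word lookup."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(scores.get(w, 0) for w in words)


# Illustrative entries only, not copied from the real word list.
lexicon = load_wordlist(["good\t3", "bad\t-3", "happy\t3", "sad\t-2"])
print(sentiment("Good video, happy days", lexicon))

# With the downloaded file (file name assumed):
# with open("AFINN-111.txt", encoding="utf-8") as f:
#     lexicon = load_wordlist(f)
```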

Word-list approaches have their limitations. You will find an investigation of the limitations of my word list in the article “A new ANEW: Evaluation of a word list for sentiment analysis in microblogs”. The article is available from my homepage.

Please note that the code is missing a unicode(s, 'utf-8') call (for pedagogical reasons).

+13
Jul 18 2018-11-18T00:

Many research papers indicate that a good starting point for sentiment analysis is looking at adjectives, e.g. whether they are positive or negative adjectives. For a short block of text this is pretty much your only option... There are papers that look at whole documents or do sentence-level analysis, but, as you say, tweets are quite short... There is no real magic approach to understanding the sentiment of a sentence, so I think your best bet would be to hunt down one of these research papers and try to get their dataset of positively/negatively oriented adjectives.

That having been said, sentiment is domain-specific, and you might find it difficult to get a high level of accuracy with a general-purpose dataset.

Good luck.

+9
Feb 21 '09 at 23:04

I think you'll find it hard to find what you're after. The closest thing I know of is LingPipe, which has some sentiment analysis functionality and is available under a limited kind of open source license, but it is written in Java.

Also, sentiment analysis systems are usually developed by training on product or movie review data, which is significantly different from the average tweet. They will be optimized for text with several sentences, all about the same topic. I suspect you would do better coming up with a rule-based system yourself, perhaps based on a lexicon of sentiment terms such as the one the University of Pittsburgh provides.
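A rule-based scorer built on a subjectivity lexicon can be quite small. The sketch below assumes a one-entry-per-line `key=value` layout such as `type=strongsubj len=1 word1=... priorpolarity=negative`; verify that against the file you actually get. The weights (2 for strong clues, 1 for weak), the negator list and the three-token negation window are arbitrary illustrative rules, not part of any lexicon. A word listed more than once (e.g. under different parts of speech) simply keeps its last entry here.

```python
import re

# Illustrative negator list; extend it for real use.
NEGATORS = {"not", "no", "never", "don't", "isn't", "wasn't", "can't"}


def load_lexicon(lines):
    """Map each word to a signed weight: +/-2 for strong clues, +/-1 for weak,
    0 for neutral or 'both' (an illustrative weighting)."""
    lexicon = {}
    for line in lines:
        entry = dict(pair.split("=", 1) for pair in line.split() if "=" in pair)
        word = entry.get("word1")
        if word is None:
            continue  # blank or malformed line
        sign = {"positive": 1, "negative": -1}.get(entry.get("priorpolarity"), 0)
        strength = 2 if entry.get("type") == "strongsubj" else 1
        lexicon[word] = sign * strength
    return lexicon


def rule_based_score(text, lexicon, window=3):
    """Sum word weights, flipping the sign of any clue that falls within
    `window` tokens after a negator such as "not"."""
    score = 0
    since_negator = None  # tokens seen since the last negator, or None
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATORS:
            since_negator = 0
            continue
        weight = lexicon.get(token, 0)
        if since_negator is not None and since_negator < window:
            weight = -weight
        score += weight
        if since_negator is not None:
            since_negator += 1
    return score


# Two made-up lines in the assumed format, for illustration only.
lexicon = load_lexicon([
    "type=strongsubj len=1 word1=great pos1=adj stemmed1=n priorpolarity=positive",
    "type=weaksubj len=1 word1=boring pos1=adj stemmed1=n priorpolarity=negative",
])
print(rule_based_score("this video is great", lexicon))
print(rule_based_score("this video is not great", lexicon))
```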

Check out We Feel Fine for an implementation of a similar idea with a really beautiful interface (and twitrratr).

+4
Feb 21 '09 at 10:50

Check out the Twitter sentiment analysis tool. It is written in Python and uses a Naive Bayes classifier with semi-supervised machine learning. The source can be found here.
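The answer doesn't say which semi-supervised scheme that tool uses. Self-training (pseudo-labelling) is one common choice: train on a small labelled set, let the model label the unlabelled tweets it is most confident about, add those, and retrain. Here is a hedged, self-contained sketch with a deliberately minimal Naive Bayes inside; the function names, the confidence threshold and the round limit are illustrative assumptions.

```python
import math
from collections import Counter


def train_naive_bayes(examples):
    """Minimal multinomial Naive Bayes with add-one smoothing.
    Returns predict(text) -> (best_label, posterior probability)."""
    labels = sorted({label for _, label in examples})
    word_counts = {label: Counter() for label in labels}
    doc_counts = Counter(label for _, label in examples)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())

    def predict(text):
        log_scores = {}
        for label in labels:
            denom = sum(word_counts[label].values()) + len(vocab)
            s = math.log(doc_counts[label] / len(examples))
            for w in text.lower().split():
                if w in vocab:
                    s += math.log((word_counts[label][w] + 1) / denom)
            log_scores[label] = s
        top = max(log_scores.values())  # subtract the max to avoid underflow
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())

    return predict


def self_train(labelled, unlabelled, threshold=0.9, max_rounds=5):
    """Repeatedly add confidently pseudo-labelled texts and retrain."""
    labelled, pool = list(labelled), list(unlabelled)
    model = train_naive_bayes(labelled)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for text in pool:
            label, prob = model(text)
            if prob >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to learn from
        labelled.extend(confident)
        pool = remaining
        model = train_naive_bayes(labelled)
    return model, labelled


seed = [("love love great", "happy"), ("hate awful awful", "sad")]
model, grown = self_train(seed, ["love it", "awful stuff", "meh"], threshold=0.7)
print(grown)
```

The threshold trades coverage for noise: set it too low and early mistakes get reinforced in later rounds.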

+2
Jul 13 '11 at 9:23

I recently came across the Natural Language Toolkit. You could probably use it as a starting point. It also has many modules and add-ons, so maybe they already have something similar.

+1
Feb 21 '09 at 21:53

Maybe TextBlob (based on NLTK and pattern) is the sentiment analysis tool for you.

+1
Aug 13 '14 at 7:59

A somewhat wacky thought: you could use the Twitter API to download a large set of tweets, and then classify a subset of that set using emoticons: one positive group for ":)", ":]", ":D", etc., and another negative group for ":(", etc.

Once you have that crude classification, you could look for more clues with frequency or n-gram analysis or something along those lines.
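A hedged sketch of that idea: label tweets by the emoticons they contain (skipping tweets with none or with both kinds), strip the emoticons so they don't dominate what you learn, and count n-grams per class. The emoticon lists and the tokenizer are illustrative assumptions, and the sample tweets stand in for whatever you download through the API.

```python
import re
from collections import Counter

POSITIVE = (":)", ":-)", ":]", ":D", "=)")
NEGATIVE = (":(", ":-(", ":[", "=(")


def emoticon_label(tweet):
    """Return 'positive', 'negative', or None (no emoticon, or mixed signals)."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        return None
    return "positive" if has_pos else "negative"


def strip_emoticons(tweet):
    for e in POSITIVE + NEGATIVE:
        tweet = tweet.replace(e, " ")
    return tweet


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def label_and_count(tweets, n=1):
    """Weakly label tweets by emoticon, then count n-grams per class."""
    counts = {"positive": Counter(), "negative": Counter()}
    for tweet in tweets:
        label = emoticon_label(tweet)
        if label is None:
            continue
        # Strip before lower-casing so ":D" is removed rather than left as "d".
        tokens = re.findall(r"[a-z']+", strip_emoticons(tweet).lower())
        counts[label].update(ngrams(tokens, n))
    return counts


tweets = [
    "loving the new video :)",
    "this video is great :D",
    "worst video ever :(",
    "no emoticon here",
    "mixed :) :(",
]
print(label_and_count(tweets)["positive"].most_common(3))
```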

It may seem silly, but serious research has been done on this (search for "sentiment analysis" and emoticon). Worth a look.

0
Mar 16 '09 at 6:22

TweetFeel has a Twitter sentiment API that does advanced linguistic analysis of tweets and can retrieve positive/negative tweets. See http://www.webservius.com/corp/docs/tweetfeel_sentiment.htm

0
Mar 13 '10 at 2:07

For those interested in coding Twitter sentiment analysis from scratch, there is a Coursera course, "Data Science", with Python code on GitHub (as part of assignment 1 - link). The sentiment scores come from AFINN-111.

You can find working solutions, for example here. In addition to the AFINN-111 sentiment list, there is a simple implementation of building a dynamic term list based on the frequency of terms in tweets that have a positive/negative score (see here).
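The linked solution isn't reproduced here. As a hedged sketch of the same idea: score each tweet with the existing word list, then give every term that is not in the list a score based on how often it appears in positively versus negatively scored tweets. The smoothed log-ratio below is one simple choice among many, and the two-word lexicon is illustrative.

```python
import math
import re
from collections import Counter


def derive_term_scores(tweets, lexicon):
    """Estimate a score for every term NOT in the lexicon:
    log((pos + 1) / (neg + 1)), where pos/neg count the positively/negatively
    scored tweets containing the term (each tweet counted once per term)."""
    pos, neg = Counter(), Counter()
    for tweet in tweets:
        tokens = re.findall(r"[a-z']+", tweet.lower())
        score = sum(lexicon.get(t, 0) for t in tokens)
        unknown = {t for t in tokens if t not in lexicon}
        if score > 0:
            pos.update(unknown)
        elif score < 0:
            neg.update(unknown)
    return {t: math.log((pos[t] + 1) / (neg[t] + 1)) for t in set(pos) | set(neg)}


lexicon = {"good": 3, "bad": -3}  # illustrative, not real AFINN-111 values
tweets = ["good vibes today", "good vibes only", "bad traffic today"]
print(sorted(derive_term_scores(tweets, lexicon).items()))
```

Positive scores suggest a term leans positive, negative scores the opposite, and terms seen equally often on both sides end up at 0.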

0
Mar 17 '14 at 11:12


