How to classify multiclasses with NLTK?

Question

So, I'm trying to classify text into multiple classes. I have read a lot of old questions and blog posts, but I still don't fully understand the concept.

I also tried the example from this blog post: http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-python-and-nltk/

But when it comes to multiclass classification, I don't quite understand it. Let's say I want to classify text into several languages: French, English, Italian and German. And I want to use Naive Bayes, which I think is the easiest to start with. From what I read in old questions, the simplest solution would be one-vs-all (one-vs-rest). So each language would have its own model, i.e. four models, one each for French, English, Italian and German. Then I would run the text against each model and pick the one that gives the highest probability. Am I right?
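
The one-vs-all scheme described here can be sketched with NLTK's NaiveBayesClassifier roughly as follows. This is a minimal sketch under stated assumptions: the toy sentences, the word_features helper and the "other" label are hypothetical, and the data is far too small for real language identification. Each language gets its own binary model (that language vs. "other"), and the text goes to the language whose own model is most confident.

```python
from nltk.classify import NaiveBayesClassifier

# Hypothetical toy data; real language identification needs far more text.
labeled_texts = [
    ("bonjour je m'appelle Marie", "French"),
    ("merci beaucoup mon ami", "French"),
    ("hello my name is Mary", "English"),
    ("thank you very much my friend", "English"),
    ("buongiorno mi chiamo Maria", "Italian"),
    ("grazie mille amico mio", "Italian"),
    ("guten Tag ich heiße Maria", "German"),
    ("vielen Dank mein Freund", "German"),
]


def word_features(text):
    """Hypothetical helper: bag-of-words featureset, e.g. {'contains(merci)': True}."""
    return {f"contains({w})": True for w in text.lower().split()}


languages = sorted({lang for _, lang in labeled_texts})

# One-vs-rest: train one binary model per language ("<language>" vs. "other").
binary_models = {}
for lang in languages:
    train_set = [
        (word_features(text), lang if label == lang else "other")
        for text, label in labeled_texts
    ]
    binary_models[lang] = NaiveBayesClassifier.train(train_set)


def classify_one_vs_rest(text):
    """Return the language whose own binary model is most confident, plus all scores."""
    feats = word_features(text)
    scores = {
        lang: model.prob_classify(feats).prob(lang)
        for lang, model in binary_models.items()
    }
    return max(scores, key=scores.get), scores


best, scores = classify_one_vs_rest("merci mon ami")
print(best, scores)
```

The four binary models are trained independently, so their scores do not form a single probability distribution and need not sum to 1.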

But when it comes to coding, the example above has tweets that are classified as either positive or negative:

    pos_tweets = [('I love this car', 'positive'),
                  ('This view is amazing', 'positive'),
                  ('I feel great this morning', 'positive'),
                  ("I am so excited about tonight's concert", 'positive'),
                  ('He is my best friend', 'positive')]
    neg_tweets = [('I do not like this car', 'negative'),
                  ('This view is horrible', 'negative'),
                  ('I feel tired this morning', 'negative'),
                  ("I am not looking forward to tonight's concert", 'negative'),
                  ('He is my enemy', 'negative')]
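
In this format the second element of each pair is just a string label; nothing restricts it to exactly two values. A minimal sketch, assuming a hypothetical bag-of-words helper word_features (not the blog post's own extractor), of turning such pairs into the (featureset, label) pairs that NLTK classifiers are trained on:

```python
def word_features(text):
    """Hypothetical helper: map each lowercased word to True, e.g. {'contains(car)': True}."""
    return {f"contains({word})": True for word in text.lower().split()}


# A subset of the tweets above; the label is just a string.
pos_tweets = [("I love this car", "positive"), ("This view is amazing", "positive")]
neg_tweets = [("I do not like this car", "negative"), ("This view is horrible", "negative")]

# NLTK classifiers are trained on (featureset, label) pairs like these.
train_set = [(word_features(text), label) for text, label in pos_tweets + neg_tweets]
print(train_set[0])
```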

So each tweet is labeled either positive or negative. When I need to prepare a model for French, how do I tag the text? Would it be like this, i.e. is French the positive class?

    [('Bonjour', 'French'), ("je m'appelle", 'French')]

And would this be the negative class?

    [('Hello', 'English'), ('My name', 'English')]
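
One way to set up the data for a single model, sketched under assumptions: every example goes into one list and carries its language as the label, so there is no positive/negative split at all. The Italian and German phrases and the character-trigram helper char_ngram_features are hypothetical illustrations; character n-grams are a common choice for language identification, but nothing in the question prescribes them.

```python
def char_ngram_features(text, n=3):
    """Hypothetical extractor: every character n-gram of the padded, lowercased text."""
    padded = f" {text.lower()} "
    return {padded[i:i + n]: True for i in range(len(padded) - n + 1)}


# A single training list: each example is labeled with its language.
labeled_texts = [
    ("Bonjour", "French"),
    ("je m'appelle", "French"),
    ("Hello", "English"),
    ("My name", "English"),
    ("Buongiorno", "Italian"),
    ("mi chiamo", "Italian"),
    ("Guten Tag", "German"),
    ("ich heiße", "German"),
]
train_set = [(char_ngram_features(text), lang) for text, lang in labeled_texts]

print(sorted({label for _, label in train_set}))
print(char_ngram_features("Hello"))
```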

But would this mean that I could just add Italian and German and have only one model for all four languages? Or do I not need a negative class at all?

So the question is: what is the correct approach to multiclass classification with NLTK?

python machine-learning nltk

toy Nov 23 '12 at 0:50

2 answers

Classifiers in NLTK ( http://www.nltk.org/api/nltk.classify.html ) come in several variants, and it is important to understand the subtle differences between them.

The simplest case is distinguishing between two categories, e.g. positive vs. negative sentiment, or male vs. female. ( http://www.nltk.org/api/nltk.classify.html#module-nltk.classify.positivenaivebayes )

The second case is when you have several categories (two or more), e.g. the text is in French, German, or English, and you assume that each text uses only one language. Note that NLTK's terminology does not call this "multiclass", which can be understandably confusing when you are new to this. Think of it this way: the classifier will not assign one text to several classes (e.g. both German and French), but only to one class.
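
A minimal sketch of this single-label case with NLTK's NaiveBayesClassifier, assuming hypothetical toy sentences and a hypothetical word_features helper: classify() returns exactly one label, while prob_classify() returns a probability distribution over all the labels the model was trained on.

```python
from nltk.classify import NaiveBayesClassifier


def word_features(text):
    """Hypothetical helper: bag-of-words featureset."""
    return {f"contains({w})": True for w in text.lower().split()}


# Hypothetical toy training data: exactly one language label per text.
labeled_texts = [
    ("bonjour je m'appelle Marie", "French"),
    ("hello my name is Mary", "English"),
    ("buongiorno mi chiamo Maria", "Italian"),
    ("guten Tag ich heiße Maria", "German"),
]
classifier = NaiveBayesClassifier.train(
    [(word_features(text), lang) for text, lang in labeled_texts]
)

feats = word_features("bonjour Marie")
label = classifier.classify(feats)      # a single label
dist = classifier.prob_classify(feats)  # probabilities for every known label
print(label)
for lang in sorted(classifier.labels()):
    print(lang, round(dist.prob(lang), 3))
```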

Finally, there is the multiclassifier (the MultiClassifierI interface in NLTK), which differs in that a given input can be assigned to more than one class, e.g. 50% French and 50% German, or 40% English, 30% German and 30% French.
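
NLTK defines a MultiClassifierI interface for classifiers that return several labels, but the sketch below does not use it. It swaps in a common, simple technique called binary relevance: one independent yes/no Naive Bayes model per label, keeping every label whose probability reaches a threshold. The mixed-language training texts, the word_features helper and the 0.5 threshold are all hypothetical.

```python
from nltk.classify import NaiveBayesClassifier


def word_features(text):
    """Hypothetical helper: bag-of-words featureset."""
    return {f"contains({w})": True for w in text.lower().split()}


# Hypothetical multi-label data: each text may carry more than one language label.
labeled_texts = [
    ("bonjour merci", {"French"}),
    ("danke hallo", {"German"}),
    ("hello thanks", {"English"}),
    ("bonjour danke", {"French", "German"}),
]
all_labels = sorted(set().union(*(labels for _, labels in labeled_texts)))

# Binary relevance: one independent yes/no (True/False) model per label.
models = {
    lab: NaiveBayesClassifier.train(
        [(word_features(text), lab in labels) for text, labels in labeled_texts]
    )
    for lab in all_labels
}


def classify_multi(text, threshold=0.5):
    """Return every label whose own model gives P(True) >= threshold."""
    feats = word_features(text)
    return {
        lab
        for lab, model in models.items()
        if model.prob_classify(feats).prob(True) >= threshold
    }


print(sorted(classify_multi("bonjour merci danke")))
```

Because each label has its own model, the result can contain zero, one, or several labels.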

Robert Jul 28 '16 at 9:06

Fred foo · Accepted Answer · 2012-11-23T01:39:50+0000

There is no need for a one-vs-all scheme with Naive Bayes: it is a multiclass model out of the box. Just pass a list of (sample, label) pairs to the classifier's trainer, where the label denotes the language.
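
A minimal sketch of that approach: one NaiveBayesClassifier trained on a single list of (featureset, label) pairs where the label is the language, then evaluated with nltk.classify.accuracy. The sentences, the held-out examples and the features helper are hypothetical; with this little data the result says nothing about real-world accuracy.

```python
from nltk.classify import NaiveBayesClassifier, accuracy


def features(text):
    """Hypothetical helper: bag of lowercased words."""
    return {f"contains({w})": True for w in text.lower().split()}


train_texts = [
    ("je m'appelle Marie", "French"),
    ("merci beaucoup", "French"),
    ("my name is Mary", "English"),
    ("thank you very much", "English"),
    ("mi chiamo Maria", "Italian"),
    ("grazie mille", "Italian"),
    ("ich heiße Maria", "German"),
    ("vielen Dank", "German"),
]
test_texts = [("merci Marie", "French"), ("vielen Dank Maria", "German")]

# A single multiclass model: the label is simply the language name.
classifier = NaiveBayesClassifier.train([(features(t), lang) for t, lang in train_texts])
test_set = [(features(t), lang) for t, lang in test_texts]

print(classifier.classify(features("thank you")))
print(accuracy(classifier, test_set))
```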
