I'm trying to classify text into multiple classes. I've read a lot of older questions and blog posts, but I still can't fully grasp the concept.
I also tried the example from this blog post: http://www.laurentluce.com/posts/twitter-sentiment-analysis-using-python-and-nltk/
But when it comes to classifying multiple classes, I don't quite understand it. Let's say I want to classify text into several languages: French, English, Italian and German. And I want to use Naive Bayes, which I think is the easiest to start with. From what I read in the old questions, the easiest solution would be one-vs-all, so each language gets its own binary model. With four languages I would have four models: one each for French, English, Italian and German. Then I would run the text against each model and pick the language whose model gives the highest probability. Am I right?
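To make sure I'm describing the one-vs-all idea correctly, here is a sketch of what I imagine it looks like. The `word_features` extractor and the training sentences are things I made up for illustration; only `nltk.NaiveBayesClassifier` itself is from the library:

```python
import nltk

def word_features(text):
    # simple bag-of-words: every word present maps to True
    return {word: True for word in text.lower().split()}

# made-up labeled data: (text, language) pairs
train = [('Bonjour je m\'appelle Marie', 'French'),
         ('Hello my name is Marie', 'English'),
         ('Ciao mi chiamo Maria', 'Italian'),
         ('Hallo ich heisse Maria', 'German')]

languages = ['French', 'English', 'Italian', 'German']

# one-vs-all: train one binary classifier per language,
# relabeling the data as 'pos' (this language) vs 'neg' (any other)
models = {}
for lang in languages:
    labeled = [(word_features(text), 'pos' if label == lang else 'neg')
               for text, label in train]
    models[lang] = nltk.NaiveBayesClassifier.train(labeled)

# classify by asking every model and picking the highest 'pos' probability
text = 'Bonjour Marie'
scores = {lang: m.prob_classify(word_features(text)).prob('pos')
          for lang, m in models.items()}
print(max(scores, key=scores.get))
```

Is this the structure people mean by "one against all"?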
But when it comes to the code, the example above classifies tweets as positive or negative:
```python
pos_tweets = [('I love this car', 'positive'),
              ('This view is amazing', 'positive'),
              ('I feel great this morning', 'positive'),
              ('I am so excited about tonight\'s concert', 'positive'),
              ('He is my best friend', 'positive')]

neg_tweets = [('I do not like this car', 'negative'),
              ('This view is horrible', 'negative'),
              ('I feel tired this morning', 'negative'),
              ('I am not looking forward to tonight\'s concert', 'negative'),
              ('He is my enemy', 'negative')]
```
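If I've understood the blog post correctly, the two-class case is trained roughly like this. The `word_features` extractor is my own minimal stand-in for the blog's feature extraction, not the blog's actual code:

```python
import nltk

pos_tweets = [('I love this car', 'positive'),
              ('This view is amazing', 'positive'),
              ('I feel great this morning', 'positive'),
              ('I am so excited about tonight\'s concert', 'positive'),
              ('He is my best friend', 'positive')]
neg_tweets = [('I do not like this car', 'negative'),
              ('This view is horrible', 'negative'),
              ('I feel tired this morning', 'negative'),
              ('I am not looking forward to tonight\'s concert', 'negative'),
              ('He is my enemy', 'negative')]

def word_features(text):
    # simple bag-of-words: every word present maps to True
    return {word: True for word in text.lower().split()}

# NLTK expects a list of (feature_dict, label) pairs
train_set = [(word_features(text), label)
             for text, label in pos_tweets + neg_tweets]
classifier = nltk.NaiveBayesClassifier.train(train_set)

print(classifier.classify(word_features('I love this view')))
```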
Everything is labeled either positive or negative. So when I need to build the one-vs-all model for French, how do I label the text? Would it be like this, with French playing the role of 'positive':
```python
[('Bon jour', 'French'), ('je m\'appelle', 'French')]
```
And everything else playing the role of 'negative':
```python
[('Hello', 'English'), ('My name', 'English')]
```
But wouldn't that mean I could just add Italian and German and end up with a single model for all four languages? Or do I not actually need a negative class at all?
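To make that concrete, here is what I imagine the single-model version would look like. The training sentences and the `word_features` extractor are made up; as far as I can tell, `nltk.NaiveBayesClassifier` accepts any label strings, not just two:

```python
import nltk

def word_features(text):
    # simple bag-of-words: every word present maps to True
    return {word: True for word in text.lower().split()}

# made-up training data with four labels instead of two
train = [('Bonjour je m\'appelle Marie', 'French'),
         ('Hello my name is Marie', 'English'),
         ('Ciao mi chiamo Maria', 'Italian'),
         ('Hallo ich heisse Maria', 'German')]

classifier = nltk.NaiveBayesClassifier.train(
    [(word_features(text), label) for text, label in train])

# no explicit 'negative' class anywhere
print(classifier.classify(word_features('Hello my name is Anna')))
```

Is this valid, or is the one-vs-all setup still preferable?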
So the question is: what is the correct approach to multiclass classification with NLTK?