Scikit learn - How to use SVM and Random Forest to classify text?

I have a set of trainFeatures and a set of testFeatures with positive, neutral and negative labels:

trainFeats = negFeats + posFeats + neutralFeats
testFeats  = negFeats + posFeats + neutralFeats

For example, one entry inside trainFeats is:

(['blue', 'yellow', 'green'], 'POSITIVE') 

The same holds for the list of test features, so each set carries its own labels. My question is: how can I use the scikit-learn implementations of the Random Forest classifier and SVM to get the overall accuracy of the classifier, together with precision and recall for each class? The problem is that I am currently using words as features, and from what I have read, these classifiers require numbers. Can I achieve my goal without changing the features? Many thanks!

1 answer

You can look at the scikit-learn tutorial, and especially the section on learning and predicting, to see how to create and use a classifier. The example there uses SVM, but it is easy to use RandomForestClassifier instead, since all classifiers implement the fit and predict methods.
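As a minimal sketch of that shared fit/predict interface, the snippet below trains an SVM and a random forest on the same data with the same two calls. The numeric dataset and the hyperparameters (kernel="linear", n_estimators=50) are made up for illustration and are not from the question.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Made-up numeric data: two features per sample, three classes.
X_train = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y_train = ["NEGATIVE", "NEGATIVE", "NEUTRAL", "NEUTRAL", "POSITIVE", "POSITIVE"]
X_test = [[0, 0.5], [5, 5.5], [10, 0.5]]

classifiers = (
    SVC(kernel="linear"),
    RandomForestClassifier(n_estimators=50, random_state=0),
)

predictions = {}
for clf in classifiers:
    # Every scikit-learn classifier exposes the same two methods.
    clf.fit(X_train, y_train)
    predictions[type(clf).__name__] = list(clf.predict(X_test))

for name, labels in predictions.items():
    print(name, labels)
```

Swapping one classifier for the other only changes the constructor line; the training and prediction code stays the same.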

When working with text features, you can use CountVectorizer or DictVectorizer to turn the words into numeric feature vectors. Take a look at the feature extraction documentation, especially section 4.1.3.
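A rough sketch of how this could look for data shaped like the question's (token list, label) pairs is given below. The sample entries, the names train_feats, test_feats and identity, and the choice of LinearSVC are assumptions made for illustration. Passing a callable as analyzer tells CountVectorizer to use the token lists as they are, and classification_report from sklearn.metrics prints precision and recall for each class.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.svm import LinearSVC

# Hypothetical data in the same (tokens, label) shape as the question.
train_feats = [
    (["blue", "yellow", "green"], "POSITIVE"),
    (["sunny", "green", "bright"], "POSITIVE"),
    (["grey", "dull", "rain"], "NEGATIVE"),
    (["dark", "rain", "cold"], "NEGATIVE"),
    (["table", "chair", "door"], "NEUTRAL"),
    (["door", "window", "wall"], "NEUTRAL"),
]
test_feats = [
    (["green", "bright"], "POSITIVE"),
    (["rain", "cold"], "NEGATIVE"),
    (["chair", "wall"], "NEUTRAL"),
]


def identity(tokens):
    """Return the token list unchanged; the documents are already tokenized."""
    return tokens


# Split the (tokens, label) pairs into parallel lists.
train_tokens = [tokens for tokens, _ in train_feats]
y_train = [label for _, label in train_feats]
test_tokens = [tokens for tokens, _ in test_feats]
y_test = [label for _, label in test_feats]

# Learn the vocabulary on the training set only, then reuse it for the test set.
vectorizer = CountVectorizer(analyzer=identity)
X_train = vectorizer.fit_transform(train_tokens)
X_test = vectorizer.transform(test_tokens)

clf = LinearSVC()  # RandomForestClassifier() can be used here in the same way
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print("accuracy:", accuracy)
print(report)
```

The words themselves stay as the features; the vectorizer only maps each word to a column of counts, so the original (tokens, label) lists do not need to change.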
