Scikit learn - How to use SVM and Random Forest to classify text?

Question

I have a training set trainFeats and a test set testFeats, each with positive, neutral and negative labels:

trainFeats = negFeats + posFeats + neutralFeats
testFeats  = negFeats + posFeats + neutralFeats

For example, one entry in trainFeats is

(['blue', 'yellow', 'green'], 'POSITIVE')

The entries in the test set have the same form, so the labels are specified for each set. My question is: how can I use the scikit-learn implementations of the Random Forest classifier and SVM to get the overall accuracy of each classifier, along with precision and recall for each class? The problem is that I am currently using words as features, and from what I have read, these classifiers require numbers. Can I achieve my goal without changing the features? Many thanks!

python scikit-learn machine-learning classification

Crista23 Feb 23 '14 at 20:00

1 answer

dnll · Accepted Answer · 2014-02-23T23:23:44+0000

You can work through the scikit-learn tutorial, especially the section on learning and predicting, to see how to create and use a classifier. The example there uses an SVM, but it is easy to use RandomForestClassifier instead, since all classifiers implement the fit and predict methods.
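To illustrate the shared fit/predict interface, here is a minimal sketch. The toy numeric data and the variable names are made up for this example and are not from the question; the point is only that swapping one classifier for the other does not change the surrounding code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Hypothetical toy data: two numeric features per sample, two classes.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["NEGATIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "POSITIVE", "POSITIVE"]
X_test = [[0, 0.5], [5, 5.5]]

classifiers = [
    SVC(kernel="linear"),
    RandomForestClassifier(n_estimators=50, random_state=0),
]

predictions = {}
for clf in classifiers:
    clf.fit(X_train, y_train)  # same call for both estimators
    predictions[type(clf).__name__] = list(clf.predict(X_test))

print(predictions)
```

Any other scikit-learn classifier could be added to the list in the same way.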

When working with text features, you can use CountVectorizer or DictVectorizer. Take a look at the feature extraction documentation, especially section 4.1.3.
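As a rough sketch of how this could look for data shaped like the question's (word list, label) pairs: the word lists are turned into count dictionaries, DictVectorizer maps them to a numeric matrix, and accuracy_score plus classification_report give the overall accuracy and the per-class precision and recall. The toy data, the helper to_counts, and the choice of LinearSVC as the SVM are assumptions made for this example, so the words themselves stay unchanged as features.

```python
from collections import Counter

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.svm import LinearSVC

# Hypothetical toy data in the question's format: (list of words, label).
train_feats = [
    (["blue", "yellow", "green"], "POSITIVE"),
    (["green", "blue"], "POSITIVE"),
    (["red", "black"], "NEGATIVE"),
    (["black", "red", "brown"], "NEGATIVE"),
    (["grey", "white"], "NEUTRAL"),
    (["white", "grey", "silver"], "NEUTRAL"),
]
test_feats = [
    (["blue", "green"], "POSITIVE"),
    (["red", "brown"], "NEGATIVE"),
    (["grey", "silver"], "NEUTRAL"),
]


def to_counts(words):
    """Hypothetical helper: turn a word list into a {word: count} dict."""
    return dict(Counter(words))


# Learn the vocabulary on the training set only, then reuse it for the test set.
vectorizer = DictVectorizer()
X_train = vectorizer.fit_transform([to_counts(w) for w, _ in train_feats])
X_test = vectorizer.transform([to_counts(w) for w, _ in test_feats])
y_train = [label for _, label in train_feats]
y_test = [label for _, label in test_feats]

results = {}
for name, clf in [
    ("LinearSVC", LinearSVC()),
    ("RandomForest", RandomForestClassifier(n_estimators=100, random_state=0)),
]:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    results[name] = accuracy_score(y_test, y_pred)
    print(name, "accuracy:", results[name])
    # Per-class precision, recall and F1.
    print(classification_report(y_test, y_pred))
```

Words in the test set that never appeared in the training set are silently ignored by DictVectorizer.transform, which is usually what you want here.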
