Sklearn: how to feed data to sklearn RandomForestClassifier

Question

I have this data:

    print training_data
    print labels

    # prints
    [[1, 0, 1, 1], [1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 1, 1]]
    ['a', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'b']

I'm trying to pass it to RandomForestClassifier from the Python sklearn library.

    classifier = RandomForestClassifier(n_estimators=10)
    classifier.fit(training_data, labels)

But I get this error:

    Traceback (most recent call last):
      File "learn.py", line 52, in <module>
        main()
      File "learn.py", line 48, in main
        classifier = train_classifier()
      File "learn.py", line 33, in train_classifier
        classifier.fit(training_data, labels)
      File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-intel.egg/sklearn/ensemble/forest.py", line 348, in fit
        y = np.ascontiguousarray(y, dtype=DOUBLE)
      File "/Library/Python/2.7/site-packages/numpy-1.8.0.dev_bbcfcf6_20130307-py2.7-macosx-10.8-intel.egg/numpy/core/numeric.py", line 419, in ascontiguousarray
        return array(a, dtype, copy=False, order='C', ndmin=1)
    ValueError: could not convert string to float: a

I assume that I am not formatting this data correctly for fitting, but I can't work out why from the documentation.

This seems like a pretty simple question. Does anyone know the answer?

python scikit-learn random-forest

David Williams Apr 7 '13 at 19:31

2 answers

Matt · Answer 1 · 2013-04-07T19:44:47+0000

Try encoding your labels beforehand with LabelEncoder.
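A minimal sketch of what that could look like on the data from the question. The variable names `encoder`, `encoded_labels`, `predicted_codes`, and `predicted_labels` are made up for illustration, and `random_state=0` is added only to make the run repeatable; neither comes from the original post. `LabelEncoder` maps each distinct string label to an integer, so the traceback's string-to-float conversion no longer has anything to choke on.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Data copied from the question.
training_data = [[1, 0, 1, 1], [1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 0],
                 [1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1],
                 [1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 1, 1]]
labels = ['a', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'b']

# Map the string labels to integer codes ('a' -> 0, 'b' -> 1).
encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(labels)

# Fit on the integer codes instead of the raw strings.
# random_state is an illustrative addition, not part of the question.
classifier = RandomForestClassifier(n_estimators=10, random_state=0)
classifier.fit(training_data, encoded_labels)

# Predictions come back as integer codes; map them back to strings.
predicted_codes = classifier.predict([[1, 0, 1, 1]])
predicted_labels = encoder.inverse_transform(predicted_codes)
print(predicted_labels)
```

Keeping the fitted `encoder` around matters: without `inverse_transform`, the predictions are just integer codes with no obvious link to the original 'a' and 'b'.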

user2750362 · Answer 2 · 2015-05-27T16:01:30+0000

You can use numpy arrays that are automatically recognized by the classifier, as shown below:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    np_training = np.array(training_data)
    np_labels = np.array(labels)

    clf = RandomForestClassifier(n_estimators=20, max_depth=5)
    clf.fit(np_training, np_labels)

This should work
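One way to check this on the question's data is sketched below. It assumes a reasonably recent scikit-learn release, where classifiers accept string labels directly (the 0.14 development build in the traceback did not). The `random_state=0` argument and the sample passed to `predict` are illustrative additions, not part of the answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Data copied from the question, converted to numpy arrays.
np_training = np.array([[1, 0, 1, 1], [1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 0],
                        [1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1],
                        [1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 1, 1]])
np_labels = np.array(['a', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'b'])

# random_state is an illustrative addition for repeatable runs.
clf = RandomForestClassifier(n_estimators=20, max_depth=5, random_state=0)
clf.fit(np_training, np_labels)

# The classifier records the distinct labels it saw during fit.
print(clf.classes_)

# Predictions are returned as the original string labels.
print(clf.predict(np.array([[1, 1, 0, 0]])))
```

If `fit` still raises the same ValueError, the installed scikit-learn is probably too old for this approach, and encoding the labels as integers first is the safer route.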
