I have this data:

    print training_data
    print labels

    # prints
    [[1, 0, 1, 1], [1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 0],
     [1, 1, 0, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1],
     [1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 1, 1]]
    ['a', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'b']

I'm trying to pass it to RandomForestClassifier from the Python sklearn library:

    from sklearn.ensemble import RandomForestClassifier

    classifier = RandomForestClassifier(n_estimators=10)
    classifier.fit(training_data, labels)

But I get this error:

    Traceback (most recent call last):
      File "learn.py", line 52, in <module>
        main()
      File "learn.py", line 48, in main
        classifier = train_classifier()
      File "learn.py", line 33, in train_classifier
        classifier.fit(training_data, labels)
      File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-intel.egg/sklearn/ensemble/forest.py", line 348, in fit
        y = np.ascontiguousarray(y, dtype=DOUBLE)
      File "/Library/Python/2.7/site-packages/numpy-1.8.0.dev_bbcfcf6_20130307-py2.7-macosx-10.8-intel.egg/numpy/core/numeric.py", line 419, in ascontiguousarray
        return array(a, dtype, copy=False, order='C', ndmin=1)
    ValueError: could not convert string to float: a

I assume that I am not formatting this data correctly for the classifier, but I cannot tell from the documentation what the correct format is.
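Reading the traceback, the failing call is `np.ascontiguousarray(y, dtype=DOUBLE)`, so my guess is that this build of sklearn tries to cast the labels to floats and chokes on `'a'`. Below is a minimal sketch of a workaround under that assumption: it maps each string label to an integer code in plain Python. The names `label_to_code`, `encoded_labels`, and `decode` are my own, not part of sklearn. I believe `sklearn.preprocessing.LabelEncoder` does the same job, but I have not confirmed that this is the intended fix.

```python
# Labels copied from the output above.
labels = ['a', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'b']

# The float cast the traceback points at fails on a string label.
try:
    float(labels[0])
    cast_failed = False
except ValueError:
    cast_failed = True

# Assumed workaround: give each distinct label an integer code.
classes = sorted(set(labels))
label_to_code = dict((label, code) for code, label in enumerate(classes))
encoded_labels = [label_to_code[label] for label in labels]


def decode(codes):
    """Map integer codes (e.g. predictions) back to the original string labels."""
    return [classes[int(code)] for code in codes]
```

With this, I would call `classifier.fit(training_data, encoded_labels)` and pass the result of `classifier.predict(...)` through `decode` to get the string labels back.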
This seems like a pretty simple question. Does anyone know the answer?