TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'

Question

I am trying to calculate the RMS prediction error of y_train_actual from my scikit-learn training model against the original salaries.

Problem: mean_squared_error(y_train_actual, salaries) raises TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'. Passing list(salaries) instead of salaries as the second argument gives the same error.

With mean_squared_error(y_train_actual, y_valid_actual) I get ValueError: Found array with dim 40663. Expected 244768

How can I convert these to the correct array types for sklearn.metrics.mean_squared_error()?

code

    import numpy as np
    from sklearn.metrics import mean_squared_error

    y_train_actual = [ np.exp(float(row)) for row in y_train ]
    print mean_squared_error(y_train_actual, salaries)

Error

    TypeError                                 Traceback (most recent call last)
    <ipython-input-144-b6d4557ba9c5> in <module>()
          3 y_valid_actual = [ np.exp(float(row)) for row in y_valid ]
          4
    ----> 5 print mean_squared_error(y_train_actual, salaries)
          6 print mean_squared_error(y_train_actual, y_valid_actual)

    C:\Python27\lib\site-packages\sklearn\metrics\metrics.pyc in mean_squared_error(y_true, y_pred)
       1462     """
       1463     y_true, y_pred = check_arrays(y_true, y_pred)
    -> 1464     return np.mean((y_pred - y_true) ** 2)
       1465
       1466

    TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
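Both operands are reported as numpy.ndarray even though the inputs were a list and a tuple, because the traceback shows check_arrays converting them to arrays before the subtraction. A minimal reproduction with NumPy alone, assuming a few of the salary strings printed later in the question; older NumPy produced the exact message above, while recent versions word it differently but still raise a subclass of TypeError:

```python
import numpy as np

# Numeric targets, as in y_train_actual
y_true = np.asarray([25000.0, 30000.0, 30000.0])
# String values, as in the question's salaries tuple; NumPy stores them
# with a string dtype ('U' on Python 3), not as numbers
y_pred = np.asarray(('25000', '30000', '30000'))

# mean_squared_error computes (y_pred - y_true); with a string array this fails
try:
    y_pred - y_true
    err = None
except TypeError as exc:  # NumPy's ufunc type errors subclass TypeError
    err = exc

print(y_pred.dtype)
print(repr(err))
```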

code

    y_train_actual = [ np.exp(float(row)) for row in y_train ]
    y_valid_actual = [ np.exp(float(row)) for row in y_valid ]
    print mean_squared_error(y_train_actual, y_valid_actual)

Error

    ValueError                                Traceback (most recent call last)
    <ipython-input-146-7fcd0367c6f1> in <module>()
          4
          5 #print mean_squared_error(y_train_actual, salaries)
    ----> 6 print mean_squared_error(y_train_actual, y_valid_actual)

    C:\Python27\lib\site-packages\sklearn\metrics\metrics.pyc in mean_squared_error(y_true, y_pred)
       1461
       1462     """
    -> 1463     y_true, y_pred = check_arrays(y_true, y_pred)
       1464     return np.mean((y_pred - y_true) ** 2)
       1465

    C:\Python27\lib\site-packages\sklearn\utils\validation.pyc in check_arrays(*arrays, **options)
        191         if size != n_samples:
        192             raise ValueError("Found array with dim %d. Expected %d"
    --> 193                              % (size, n_samples))
        194
        195         if not allow_lists or hasattr(array, "shape"):

    ValueError: Found array with dim 40663. Expected 244768

code

    print type(y_train)
    print type(y_train_actual)
    print type(salaries)

Result

    <type 'list'>
    <type 'list'>
    <type 'tuple'>

print y_train[:10]

[10.126631103850338, 10.308952660644293, 10.308952660644293, 10.221941283654663, 10.126631103850338, 10.126631103850338, 11.225243392518447, 9.9987977323404529, 10.043249494911286, 11.350406535472453]

print salaries[:10]

('25000', '30000', '30000', '27500', '25000', '25000', '75000', '22000', '23000', '85000')

print list(salaries)[:10]

['25000', '30000', '30000', '27500', '25000', '25000', '75000', '22000', '23000', '85000']
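The first ten values of y_train match the natural logarithm of the corresponding salaries (ln 25000 ≈ 10.1266), which would explain why the question applies np.exp to y_train. The question does not say how y_train was built, so treat "y_train is log(salary)" as an assumption; this sketch only checks it against the ten values printed above, converting the salary strings to floats first:

```python
import numpy as np

# First ten values printed in the question
y_train_head = [10.126631103850338, 10.308952660644293, 10.308952660644293,
                10.221941283654663, 10.126631103850338, 10.126631103850338,
                11.225243392518447, 9.9987977323404529, 10.043249494911286,
                11.350406535472453]
salaries_head = ('25000', '30000', '30000', '27500', '25000', '25000',
                 '75000', '22000', '23000', '85000')

# Convert the string salaries to floats before doing any arithmetic
salaries_float = np.array([float(s) for s in salaries_head])

# np.exp undoes a natural log, so this should recover the salaries
recovered = np.exp(y_train_head)

print(np.allclose(recovered, salaries_float))
```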

print len(y_train)

print len(salaries)


python python-2.7 numpy scipy scikit-learn

Nyxynyx May 02, '13 at 4:32


1 answer

fgb · Accepted Answer · 2013-05-02T04:49:05+0000

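The TypeError occurs because salaries is a tuple of strings while y_train_actual is a list of floats. Once both are converted to NumPy arrays, strings cannot be subtracted from floats, so convert salaries to numbers first (for example with [float(s) for s in salaries]).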
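A minimal sketch of that fix, assuming salaries holds the targets for the same rows as the predictions: convert the strings to floats, compute the quantity mean_squared_error evaluates in the traceback, np.mean((y_pred - y_true) ** 2), and take its square root for the RMS error the question asks about. The helper name rmse_from_strings and the three example values are hypothetical placeholders, not the question's data:

```python
import numpy as np

def rmse_from_strings(y_true_strings, y_pred):
    """RMS error between string-valued targets and numeric predictions.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    y_true = np.array([float(s) for s in y_true_strings])  # str -> float
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_pred - y_true) ** 2)  # same formula as mean_squared_error
    return np.sqrt(mse)

salaries = ('25000', '30000', '27500')      # placeholder targets (strings)
predictions = [26000.0, 29000.0, 27500.0]   # placeholder predictions
result = rmse_from_strings(salaries, predictions)
print(result)
```

With scikit-learn installed, passing the converted float array to mean_squared_error and applying np.sqrt to the result should give the same value.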

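For the second error, both arrays must have the same number of elements. The message means check_arrays expected 244768 elements (the length of y_train_actual) but found 40663 (y_valid_actual), so it refused to subtract them. The training and validation sets are different rows anyway, so compare each set of predictions with the targets for the same rows.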
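A minimal sketch that makes the length requirement explicit before subtracting, using a hypothetical helper checked_mse (scikit-learn performs its own check with different wording depending on the version):

```python
import numpy as np

def checked_mse(y_true, y_pred):
    """Mean squared error that rejects mismatched lengths (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape[0] != y_pred.shape[0]:
        raise ValueError("length mismatch: y_true has %d rows, y_pred has %d"
                         % (y_true.shape[0], y_pred.shape[0]))
    return np.mean((y_pred - y_true) ** 2)

# Matching lengths: the error is computed normally
ok = checked_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

# Mismatched lengths, like y_train_actual vs y_valid_actual: rejected
try:
    checked_mse([1.0, 2.0, 3.0], [1.0, 2.0])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```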
