Predict missing values with scikit-learn Imputer module

Question

I am writing a very basic program that predicts missing values in a dataset using the scikit-learn Imputer class.

I created a NumPy array, created an Imputer object with strategy='mean', and called fit_transform() on the array.

When I print the array after calling fit_transform(), the NaN values are still there and I don't get any prediction.

What am I doing wrong here? How can I predict missing values?

import numpy as np
from sklearn.preprocessing import Imputer
X = np.array([[23.56],[53.45],['NaN'],[44.44],[77.78],['NaN'],[234.44],[11.33],[79.87]])
print X
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
imp.fit_transform(X)
print X

python numpy scikit-learn prediction

xennygrimmato Jul 29 '14 at 14:16

1 answer

jonrsharpe · Accepted Answer · 2014-07-29T14:20:30+0000

Per the documentation, sklearn.preprocessing.Imputer.fit_transform returns a new array; it does not modify the array passed as its argument. The minimal fix is therefore to assign the result back:

 X = imp.fit_transform(X)
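
For readers on a recent scikit-learn: the Imputer class was deprecated in version 0.20 and removed in 0.22, with sklearn.impute.SimpleImputer as its replacement. The sketch below is not the code from the question but a rough equivalent under that assumption. It uses np.nan instead of the string 'NaN', stores the result under a new name (X_imputed, chosen here for illustration), and shows that the input array is left untouched.

```python
import numpy as np
from sklearn.impute import SimpleImputer  # assumes scikit-learn >= 0.20

# Same values as in the question, with np.nan marking the missing entries
X = np.array([[23.56], [53.45], [np.nan], [44.44], [77.78],
              [np.nan], [234.44], [11.33], [79.87]])

# SimpleImputer works column-wise; there is no axis argument
imp = SimpleImputer(missing_values=np.nan, strategy='mean')

# fit_transform returns a new array, so the result must be captured
X_imputed = imp.fit_transform(X)

print(X)          # the original array still contains NaN
print(X_imputed)  # NaN entries replaced by the column mean
```

Each missing entry is filled with the mean of the observed values in its column, which is the same value np.nanmean(X, axis=0) would give.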
