Impute the entire DataFrame (all columns) with Scikit-learn (sklearn) without iterating over the columns

Question

I want to impute all the columns in a pandas DataFrame. The only way I can do this is column by column, as shown below.

Is there an operation that imputes an entire DataFrame without iterating through the columns?

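    #!/usr/bin/python
    # sklearn.preprocessing.Imputer was removed in scikit-learn 0.22;
    # sklearn.impute.SimpleImputer replaces it and always imputes column-wise
    from sklearn.impute import SimpleImputer
    import numpy as np
    import pandas as pd

    # Imputer
    fill_NaN = SimpleImputer(missing_values=np.nan, strategy='mean')

    # Model 1
    DF = pd.DataFrame([[0, 1, np.nan], [2, np.nan, 3], [np.nan, 2, 5]])
    DF.columns = "c1.c2.c3".split(".")
    DF.index = "i1.i2.i3".split(".")

    # Impute each Series; copy() so that DF itself is not overwritten
    imputed_DF = DF.copy()
    for col in DF.columns:
        # DF[[col]] is a 2-D (n_rows, 1) frame, the shape the imputer expects
        imputed_column = fill_NaN.fit_transform(DF[[col]]).ravel()
        # Fill in Series on DataFrame
        imputed_DF[col] = imputed_column

    # DF
    #      c1   c2   c3
    # i1    0    1  NaN
    # i2    2  NaN    3
    # i3  NaN    2    5

    # imputed_DF
    #      c1   c2   c3
    # i1    0  1.0    4
    # i2    2  1.5    3
    # i3    1  2.0    5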

python scikit-learn machine-learning dataframe

O.rka Nov 11 '15 at 10:12

3 answers

If you don't specifically need scikit-learn's Imputer, a simpler option is to use pandas directly:

 df = df.fillna(df.mean())
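A minimal sketch applying this to the question's DataFrame: `DF.mean()` returns a Series of per-column means (NaN values are skipped by default), and `fillna` given a Series fills each column with the value stored under that column's name.

```python
import numpy as np
import pandas as pd

# Same data as in the question
DF = pd.DataFrame([[0, 1, np.nan], [2, np.nan, 3], [np.nan, 2, 5]],
                  columns=["c1", "c2", "c3"], index=["i1", "i2", "i3"])

# DF.mean() -> Series indexed by column name: c1=1.0, c2=1.5, c3=4.0
# fillna(Series) matches on column names and returns a new DataFrame
imputed_DF = DF.fillna(DF.mean())
print(imputed_DF)
```

This reproduces the `imputed_DF` values shown in the question without any loop, and `DF` itself is not modified.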

Biggus Apr 10 '18 at 9:13

Can I impute multiple columns?

Prateek ramsinghani May 08 '19 at 17:39
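Regarding the comment above: `fillna(df.mean())` already fills every column in one call. To restrict imputation to a chosen subset of columns, one possible sketch is to select them, fill them, and assign the result back. The list `cols` here is a hypothetical choice for illustration.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame([[0, 1, np.nan], [2, np.nan, 3], [np.nan, 2, 5]],
                  columns=["c1", "c2", "c3"], index=["i1", "i2", "i3"])

# Hypothetical subset of columns to impute; c3 is deliberately left alone
cols = ["c1", "c2"]

# Means are computed only over the selected columns and written back to them
DF[cols] = DF[cols].fillna(DF[cols].mean())
```

After this, `c1` and `c2` have no missing values, while the NaN in `c3` is still present.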

O.rka · Accepted Answer · 2015-11-11T22:26:51+0000

If you want the mean or median, you can impute the whole DataFrame in one call:

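    from sklearn.impute import SimpleImputer
    import numpy as np
    import pandas as pd

    # SimpleImputer fills each column's NaNs with that column's mean
    # (use strategy='median' for medians)
    fill_NaN = SimpleImputer(missing_values=np.nan, strategy='mean')
    # fit_transform returns a NumPy array, so restore the column names and index
    imputed_DF = pd.DataFrame(fill_NaN.fit_transform(DF), columns=DF.columns, index=DF.index)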

If you want to fill them with 0 or another constant, you can always do:

 DF[DF.isnull()] = 0
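`DataFrame.fillna` with a scalar gives the same values without modifying `DF` in place, because it returns a new DataFrame. The short sketch below compares the two approaches on the question's data, applying the boolean-mask assignment to a copy.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame([[0, 1, np.nan], [2, np.nan, 3], [np.nan, 2, 5]],
                  columns=["c1", "c2", "c3"], index=["i1", "i2", "i3"])

# fillna(0) returns a new DataFrame with every NaN replaced by 0
zero_filled = DF.fillna(0)

# The boolean-mask assignment, applied to a copy so DF stays untouched
masked = DF.copy()
masked[masked.isnull()] = 0
```

Both produce identical frames. `fillna` is the more common idiom when the original data should be kept.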
