How to remove columns that have the same values in all rows using pandas or a Spark DataFrame?

Question


Suppose I have data similar to the following:

    index   id   name  value  value2  value3  data1  val5
        0  345  name1      1      99      23      3    66
        1   12  name2      1      99      23      2    66
        5    2  name6      1      99      23      7    66

How can we discard all columns like (value, value2, value3), where every row has the same value, in one command or in several commands using Python?

Consider that we have many such columns: value, value2, value3 ... value200.

Output:

    index   id   name  data1
        0  345  name1      3
        1   12  name2      2
        5    2  name6      7

+7

python pandas duplicates multiple-columns spark-dataframe

CYAN CEVI Sep 23 '16 at 10:30


2 answers

Another solution: call set_index on the columns that should not be compared, then compare the first row (selected with iloc) against the whole DataFrame using eq, and finally filter the columns with boolean indexing:

    df1 = df.set_index(['index', 'id', 'name'])
    print(~df1.eq(df1.iloc[0]).all())
    value     False
    value2    False
    value3    False
    data1      True
    val5      False
    dtype: bool

    # .loc is used here because .ix was removed in pandas 1.0
    print(df1.loc[:, ~df1.eq(df1.iloc[0]).all()].reset_index())
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7
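To try the set_index / eq approach end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and the names `varies` and `result` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "index": [0, 1, 5],
    "id": [345, 12, 2],
    "name": ["name1", "name2", "name6"],
    "value": [1, 1, 1],
    "value2": [99, 99, 99],
    "value3": [23, 23, 23],
    "data1": [3, 2, 7],
    "val5": [66, 66, 66],
})

# Move the key columns into the index so they are never compared or dropped.
df1 = df.set_index(["index", "id", "name"])

# True for every column where at least one row differs from the first row.
varies = ~df1.eq(df1.iloc[0]).all()

# Keep only the varying columns, then restore the key columns.
result = df1.loc[:, varies].reset_index()
print(result)
```

Because eq treats NaN as unequal to NaN, a column that is entirely NaN counts as varying here and is kept.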

+3

jezrael Sep 23 '16 at 10:45


Edchum · Accepted Answer · 2016-09-23T10:35:00+0000

What we can do is apply nunique to count the unique values in each column of df, then drop the columns that have only one unique value:

    In [285]:
    cols = list(df)
    nunique = df.apply(pd.Series.nunique)
    cols_to_drop = nunique[nunique == 1].index
    df.drop(cols_to_drop, axis=1)

    Out[285]:
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7
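One detail that may matter with nunique is how missing values are counted. The sketch below uses a made-up frame (the column names `const` and `gappy` are hypothetical) and calls `DataFrame.nunique` directly instead of going through apply. By default NaN is ignored, so a column holding one value plus NaN looks constant. Passing `dropna=False` counts NaN as a value of its own.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "const" is constant, "gappy" holds one value plus a NaN.
df = pd.DataFrame({
    "id": [345, 12, 2],
    "const": [1, 1, 1],
    "gappy": [7.0, np.nan, 7.0],
})

# Default behaviour: NaN is ignored, so "gappy" has one unique value and is dropped.
kept_default = df.loc[:, df.nunique() != 1]

# dropna=False counts NaN as a distinct value, so "gappy" is kept.
kept_with_nan = df.loc[:, df.nunique(dropna=False) != 1]

print(list(kept_default.columns))
print(list(kept_with_nan.columns))
```

Which behaviour is wanted depends on whether a missing entry should count as a different value for your data.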

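Another way is to diff the numeric columns and sum the absolute differences (a plain sum of the diffs telescopes to the last row minus the first row, so it can be zero for a column that still varies):

    In [298]:
    cols = df.select_dtypes([np.number]).columns
    diff = df[cols].diff().abs().sum()
    df.drop(diff[diff == 0].index, axis=1)

    Out[298]:
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7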
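One thing to keep in mind with the diff approach: summing signed differences collapses to the last value minus the first, so a column that changes and then returns to its starting value also sums to zero. A small sketch with made-up columns (`flat` and `bump` are hypothetical names) shows the difference between the signed and the absolute sum:

```python
import pandas as pd

# Hypothetical data: "flat" is constant, "bump" varies but starts and ends on 1.
df = pd.DataFrame({"flat": [5, 5, 5], "bump": [1, 2, 1]})

# Signed sum of differences: equals last value minus first value.
plain = df.diff().sum()

# Sum of absolute differences: zero only if no consecutive rows differ.
absolute = df.diff().abs().sum()

print(plain.to_dict())
print(absolute.to_dict())
```

With the signed sum both columns look constant. With the absolute sum only `flat` does.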

Another approach uses the fact that the standard deviation is zero for a column whose values are all the same:

    In [300]:
    cols = df.select_dtypes([np.number]).columns
    std = df[cols].std()
    cols_to_drop = std[std == 0].index
    df.drop(cols_to_drop, axis=1)

    Out[300]:
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7

Actually, this can be done as a one-liner:

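    In [306]:
    # numeric_only=True skips the string column; pandas 2.x raises without it
    df.drop(df.std(numeric_only=True)[(df.std(numeric_only=True) == 0)].index, axis=1)

    Out[306]:
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7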
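The standard-deviation variants only look at numeric columns, so a constant string column would survive them. The sketch below illustrates this on a made-up frame (`num_const` and `str_const` are hypothetical names). Falling back to nunique for the remaining columns is an assumption about what one might want, not part of the answer.

```python
import pandas as pd

# Hypothetical frame with one constant numeric and one constant string column.
df = pd.DataFrame({
    "id": [345, 12, 2],
    "num_const": [99, 99, 99],
    "str_const": ["x", "x", "x"],
})

# std() is computed on the numeric columns only.
std = df.select_dtypes("number").std()
numeric_constant = std[std == 0].index

# Drops "num_const" but leaves "str_const" untouched.
after_std = df.drop(numeric_constant, axis=1)

# Assumed fallback: nunique() also catches constant non-numeric columns.
after_both = after_std.loc[:, after_std.nunique() != 1]

print(list(after_std.columns))
print(list(after_both.columns))
```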
