How to remove columns that have the same value in all rows with pandas or a Spark DataFrame?

Suppose I have data similar to the following:

    index   id   name  value  value2  value3  data1  val5
        0  345  name1      1      99      23      3    66
        1   12  name2      1      99      23      2    66
        5    2  name6      1      99      23      7    66

How can I drop all columns like value, value2 and value3, in which every row holds the same value, using one or more commands in Python?

Assume there are many such columns, similar to value, value2, value3 ... value200.

Desired output:

    index   id   name  data1
        0  345  name1      3
        1   12  name2      2
        5    2  name6      7
Tags: python, pandas, duplicates, multiple-columns, spark-dataframe
2 answers

One option is to apply nunique to count the unique values in each column of df and drop the columns that have only one unique value:

    In [285]: cols = list(df)
         ...: nunique = df.apply(pd.Series.nunique)
         ...: cols_to_drop = nunique[nunique == 1].index
         ...: df.drop(cols_to_drop, axis=1)
    Out[285]:
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7
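A self-contained sketch of the nunique approach, rebuilding the question's sample data by hand and using DataFrame.nunique, the built-in equivalent of df.apply(pd.Series.nunique). Unlike the numeric-only variants, it also covers text columns such as name. Note that nunique ignores NaN by default, so a column holding one value plus NaN counts as constant; pass dropna=False if NaN should count as a distinct value.

```python
import pandas as pd

# Sample data from the question; "index" is an ordinary column here.
df = pd.DataFrame({
    "index": [0, 1, 5],
    "id": [345, 12, 2],
    "name": ["name1", "name2", "name6"],
    "value": [1, 1, 1],
    "value2": [99, 99, 99],
    "value3": [23, 23, 23],
    "data1": [3, 2, 7],
    "val5": [66, 66, 66],
})

# Number of distinct values per column (NaN is ignored by default).
nunique = df.nunique()

# Keep only the columns with more than one distinct value.
result = df.loc[:, nunique > 1]
print(result)
```

Selecting with a boolean mask over the columns avoids building a separate list of names to drop.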

Another way is to diff the numeric columns and sum the absolute differences, which is zero only for a constant column:

    In [298]: cols = df.select_dtypes([np.number]).columns
         ...: diff = df[cols].diff().abs().sum()
         ...: df.drop(diff[diff == 0].index, axis=1)
    Out[298]:
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7
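Why absolute values matter: the sum of consecutive differences telescopes to last value minus first value, so summing plain differences also drops a column that merely starts and ends on the same value. A quick check on a small hypothetical frame (the column names const, bounce and label are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: "bounce" is NOT constant, but its first and last values match.
df = pd.DataFrame({
    "const": [7, 7, 7],
    "bounce": [1, 2, 1],
    "label": ["a", "b", "c"],
})

cols = df.select_dtypes([np.number]).columns

# Plain sum of differences: equals last - first for each column.
plain = df[cols].diff().sum()
# Sum of absolute differences: zero only if every step is zero.
absolute = df[cols].diff().abs().sum()

wrongly_dropped = df.drop(plain[plain == 0].index, axis=1)
correctly_dropped = df.drop(absolute[absolute == 0].index, axis=1)

print(list(wrongly_dropped.columns))    # ['label']
print(list(correctly_dropped.columns))  # ['bounce', 'label']
```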

Another approach uses the fact that the standard deviation of a column holding a single repeated value is zero:

    In [300]: cols = df.select_dtypes([np.number]).columns
         ...: std = df[cols].std()
         ...: cols_to_drop = std[std == 0].index
         ...: df.drop(cols_to_drop, axis=1)
    Out[300]:
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7
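A sketch showing one limitation of the std approach: select_dtypes keeps only numeric columns, so a constant text column is never examined and survives. The source column below is a hypothetical addition to the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "index": [0, 1, 5],
    "id": [345, 12, 2],
    "name": ["name1", "name2", "name6"],
    "value": [1, 1, 1],
    "value2": [99, 99, 99],
    "value3": [23, 23, 23],
    "data1": [3, 2, 7],
    "val5": [66, 66, 66],
    # Hypothetical extra column: constant, but not numeric.
    "source": ["x", "x", "x"],
})

# Standard deviation is only computed for numeric columns.
cols = df.select_dtypes([np.number]).columns
std = df[cols].std()
result = df.drop(std[std == 0].index, axis=1)

print(list(result.columns))
# ['index', 'id', 'name', 'data1', 'source']
```

If text columns matter, a count of distinct values per column is the more general check.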

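This can also be written as a one-liner. Pass numeric_only=True so that std skips the string column name; recent pandas versions raise a TypeError otherwise:

    In [306]: df.drop(df.std(numeric_only=True)[df.std(numeric_only=True) == 0].index, axis=1)
    Out[306]:
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7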

Another solution: move the columns that should not be compared into the index with set_index, compare every row with the first row (selected with iloc) using eq, reduce each column with all, and keep the columns that differ using boolean indexing:

    df1 = df.set_index(['index', 'id', 'name'])
    print(~df1.eq(df1.iloc[0]).all())
    value     False
    value2    False
    value3    False
    data1      True
    val5      False
    dtype: bool

    # .ix was removed in pandas 1.0; .loc takes the same boolean column mask
    print(df1.loc[:, ~df1.eq(df1.iloc[0]).all()].reset_index())
       index   id   name  data1
    0      0  345  name1      3
    1      1   12  name2      2
    2      5    2  name6      7
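A self-contained version of the set_index/eq approach on the question's data, followed by one caveat: NaN never compares equal to NaN, so an all-NaN column is reported as varying and kept. The nan_df frame is a hypothetical example, not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "index": [0, 1, 5],
    "id": [345, 12, 2],
    "name": ["name1", "name2", "name6"],
    "value": [1, 1, 1],
    "value2": [99, 99, 99],
    "value3": [23, 23, 23],
    "data1": [3, 2, 7],
    "val5": [66, 66, 66],
})

# Columns that must never be dropped are moved into the index.
df1 = df.set_index(["index", "id", "name"])

# True for every column that differs from the first row somewhere.
varies = ~df1.eq(df1.iloc[0]).all()

# Boolean column selection with .loc, then restore the index columns.
result = df1.loc[:, varies].reset_index()
print(result)

# Caveat: NaN != NaN, so an all-NaN column counts as "varying".
nan_df = pd.DataFrame({"empty": [np.nan, np.nan]})
nan_varies = ~nan_df.eq(nan_df.iloc[0]).all()
print(nan_varies["empty"])  # True
```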
