Python - Drop row if two columns are NaN

This is an extension of this question, where the OP wanted to know how to drop rows where the value in a single column is NaN.

I am wondering how I can drop rows where the values in 2 (or more) columns are both NaN. Using the DataFrame created in the second answer:

    In [1]: df = pd.DataFrame(np.random.randn(10,3))

    In [2]: df.loc[::2,0] = np.nan; df.loc[::4,1] = np.nan; df.loc[::3,2] = np.nan;

    In [3]: df
    Out[3]:
              0         1         2
    0       NaN       NaN       NaN
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    4       NaN       NaN  0.050742
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    8       NaN       NaN  0.637482
    9 -0.310130  0.078891       NaN

If I use the dropna() method, specifically dropna(subset=[1,2]), it performs an "or" type of drop and leaves:

    In [4]: df.dropna(subset=[1,2])
    Out[4]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    5 -1.250970  0.030561 -2.678622
    7  0.049896 -0.308003  0.823295

What I want is an "and" type of drop, which drops only the rows that have NaN in both column 1 and column 2. This would leave:

              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    4       NaN       NaN  0.050742
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    8       NaN       NaN  0.637482
    9 -0.310130  0.078891       NaN

where only the first row is dropped.

Any ideas?

EDIT: Modified the data frame values for consistency.

+11
3 answers

Either of the following:

 df.dropna(subset=[1, 2], how='all') 

or

 df.dropna(subset=[1, 2], thresh=1) 
+13
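To check that the two calls really agree on the question's data, here is a minimal sketch. It rebuilds the same NaN pattern as the question (the random values are different, and the seeded generator is my own addition for repeatability), then compares which rows survive each call.

```python
import numpy as np
import pandas as pd

# Rebuild the NaN pattern from the question; the random values themselves differ.
rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
df = pd.DataFrame(rng.standard_normal((10, 3)))
df.loc[::2, 0] = np.nan
df.loc[::4, 1] = np.nan
df.loc[::3, 2] = np.nan

# how='all': drop a row only when every column listed in subset is NaN.
by_how = df.dropna(subset=[1, 2], how='all')

# thresh=1: keep a row when at least one column listed in subset is non-NaN.
by_thresh = df.dropna(subset=[1, 2], thresh=1)

print(by_how.index.tolist())     # rows kept by how='all'
print(by_thresh.index.tolist())  # rows kept by thresh=1
```

With only two columns in subset, "at least one non-NaN" and "not all NaN" are the same condition, which is why both calls drop row 0 and nothing else.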

Pass how='all' to the dropna() method:

 df.dropna(subset=[1,2], how='all') 
+4
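If it helps to see what how='all' is doing, the same "and" drop can be written as an explicit boolean mask. This is only a sketch on a small made-up frame (the values are illustrative, not the question's data), and the names both_nan and kept are mine.

```python
import numpy as np
import pandas as pd

# Small made-up frame: row 0 has NaN in both columns 1 and 2,
# row 3 has NaN in column 1 only.
df = pd.DataFrame({
    0: [np.nan, 2.0, np.nan, 0.5],
    1: [np.nan, -1.5, 0.8, np.nan],
    2: [np.nan, -0.75, -0.9, 0.05],
})

# True for rows where column 1 AND column 2 are both NaN.
both_nan = df[[1, 2]].isna().all(axis=1)

# Keep everything else.
kept = df[~both_nan]
print(kept)
```

The mask form is more verbose, but it makes the "and" condition visible and is easy to adapt to other conditions.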

This code compares my columns two at a time; since I have 4 columns, there are 6 possible pairs. I tried passing more than two columns to a single dropna() call, but thresh=1 does not do what I need with more than two columns, because it then removes a row only if all of the listed columns are NaN, just as how='all' does.

This answer drops the rows that have 2 NaN values (more precisely, any row in which at least one pair of columns is NaN).

    import numpy as np
    import pandas as pd

    # Build a 30x4 frame of random integers in [0, 10).
    a = np.random.randint(0, 10, (30, 4))
    b = pd.DataFrame(a, columns=['aa', 'bb', 'cc', 'dd'])

    # Mark 6 random rows in columns aa/bb with the sentinel value 311.
    c = b.sample(6)
    c.aa = 311
    c.bb = 311
    b.update(c)

    # Mark 2 random rows in columns cc/dd with the same sentinel.
    d = b.sample(2)
    d.cc = 311
    d.dd = 311
    b.update(d)

    # Turn every sentinel into NaN.
    b = b.replace(311, np.nan)

    print('*' * 30)
    print('ORIGINAL DATA FRAME IS : ')
    print('*' * 30)
    print(b)
    b.to_csv('C:\\Users\\HP\\Desktop\\CSV\\ORIGNAL_DATA.csv')

    # Drop a row when both columns of any of the 6 pairs are NaN.
    b = b.dropna(subset=['aa', 'bb'], thresh=1)
    b = b.dropna(subset=['aa', 'cc'], thresh=1)
    b = b.dropna(subset=['aa', 'dd'], thresh=1)
    b = b.dropna(subset=['bb', 'cc'], thresh=1)
    b = b.dropna(subset=['bb', 'dd'], thresh=1)
    b = b.dropna(subset=['cc', 'dd'], thresh=1)

    print('*' * 30)
    print('REQUIRED DATA FRAME IS : ')
    print('*' * 30)
    print(b)
    print('*' * 30)
    print(b.count())
    b.to_csv('C:\\Users\\HP\\Desktop\\CSV\\MANIPULATED_DATA.csv')
0
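As far as I can tell, the six pairwise calls above amount to "drop any row with at least 2 NaN values", which can be expressed in one step by counting NaNs per row. The sketch below uses a small hand-made frame with the same column names (the values are made up) and compares three ways of getting that result; by_count, by_thresh and pairwise are names I introduced for the comparison.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hand-made frame: rows 1, 2 and 4 contain two or more NaN values.
b = pd.DataFrame({
    'aa': [1.0, np.nan, np.nan, 4.0, np.nan],
    'bb': [2.0, np.nan, 5.0, 6.0, np.nan],
    'cc': [3.0, 7.0, np.nan, np.nan, np.nan],
    'dd': [4.0, 8.0, 9.0, 1.0, np.nan],
})

# 1) Count NaNs per row and keep rows with fewer than 2.
nan_count = b.isna().sum(axis=1)
by_count = b[nan_count < 2]

# 2) Same idea with thresh: require at least (number of columns - 1) non-NaN values.
by_thresh = b.dropna(thresh=b.shape[1] - 1)

# 3) The pairwise approach, looping over all column pairs.
pairwise = b
for pair in combinations(b.columns, 2):
    pairwise = pairwise.dropna(subset=list(pair), thresh=1)

print(by_count.index.tolist())
print(by_thresh.index.tolist())
print(pairwise.index.tolist())
```

The counting version does not grow with the number of columns, whereas the pairwise version needs one dropna() call per pair.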
