Select rows where at least one value from the column list is not null

I have a large dataframe with many columns (e.g. 1000), and a list of columns generated by a script (~10 of them). I would like to select all the rows of the original dataframe where at least one of those columns is not null.

So, if I knew the number of my columns in advance, I could do something like this:

 list_of_cols = ['col1', ...]
 df[df[list_of_cols[0]].notnull() |
    df[list_of_cols[1]].notnull() |
    ... |
    df[list_of_cols[6]].notnull()]

I could also iterate over the column list and build a mask to apply to df , but that feels too tedious. Given how powerful pandas is at working with NaN, I would expect there is a simpler way to get what I want.
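For reference, the iterative version I'd like to avoid looks something like this (a sketch with made-up data; the real `list_of_cols` is assumed to hold valid column names):

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'col1': [np.nan, 1.0, np.nan],
                   'col2': [np.nan, np.nan, 2.0],
                   'col3': [np.nan, np.nan, np.nan]})
list_of_cols = ['col1', 'col2']

# Build the OR-mask column by column
mask = pd.Series(False, index=df.index)
for col in list_of_cols:
    mask = mask | df[col].notnull()

result = df[mask]  # rows with at least one non-null value in list_of_cols
```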

3 answers

Use the thresh parameter of the dropna() method. Setting thresh=1 means: keep a row as long as it has at least one non-null value.

 import numpy as np
 import pandas as pd

 df = pd.DataFrame(np.random.choice((1., np.nan), (1000, 1000), p=(.3, .7)))
 list_of_cols = list(range(10))
 df[list_of_cols].dropna(thresh=1).head()
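Note that `df[list_of_cols].dropna(thresh=1)` returns only those ten columns. If you want every column of the original frame back, you can reselect by the surviving index, or (an equivalent sketch, not part of the answer above) use dropna's subset/how parameters directly:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # hypothetical seed, for reproducibility only
df = pd.DataFrame(np.random.choice((1., np.nan), (1000, 50), p=(.3, .7)))
list_of_cols = list(range(10))

# Reselect the full-width rows by the surviving index...
kept = df.loc[df[list_of_cols].dropna(thresh=1).index]

# ...or let dropna do it in one call: drop a row only when ALL
# of the listed columns are null
kept2 = df.dropna(subset=list_of_cols, how='all')
```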



Starting from this:

 import numpy as np
 import pandas as pd

 data = {'a': [np.nan, 0, 0, 0, 0, 0, np.nan, 0, 0, 0, 0, 0, 9, 9],
         'b': [np.nan, np.nan, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 7],
         'c': [np.nan, np.nan, 1, 1, 2, 2, 3, 3, 3, 1, 1, 1, 1, 1],
         'd': [np.nan, np.nan, 7, 9, 6, 9, 7, np.nan, 6, 6, 7, 6, 9, 6]}
 df = pd.DataFrame(data, columns=['a', 'b', 'c', 'd'])
 df
       a    b    c    d
 0   NaN  NaN  NaN  NaN
 1   0.0  NaN  NaN  NaN
 2   0.0  1.0  1.0  7.0
 3   0.0  1.0  1.0  9.0
 4   0.0  1.0  2.0  6.0
 5   0.0  1.0  2.0  9.0
 6   NaN  1.0  3.0  7.0
 7   0.0  1.0  3.0  NaN
 8   0.0  1.0  3.0  6.0
 9   0.0  2.0  1.0  6.0
 10  0.0  2.0  1.0  7.0
 11  0.0  2.0  1.0  6.0
 12  9.0  1.0  1.0  9.0
 13  9.0  7.0  1.0  6.0

Select the rows where not all values are null (this removes row index 0):

 df[~df.isnull().all(axis=1)]
       a    b    c    d
 1   0.0  NaN  NaN  NaN
 2   0.0  1.0  1.0  7.0
 3   0.0  1.0  1.0  9.0
 4   0.0  1.0  2.0  6.0
 5   0.0  1.0  2.0  9.0
 6   NaN  1.0  3.0  7.0
 7   0.0  1.0  3.0  NaN
 8   0.0  1.0  3.0  6.0
 9   0.0  2.0  1.0  6.0
 10  0.0  2.0  1.0  7.0
 11  0.0  2.0  1.0  6.0
 12  9.0  1.0  1.0  9.0
 13  9.0  7.0  1.0  6.0
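Since the question asks about a column list rather than the whole frame, the same idea can be restricted to a subset (a minimal sketch; the data and `list_of_cols` here are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 0.0, 9.0],
                   'b': [np.nan, np.nan, 7.0],
                   'd': [np.nan, np.nan, 6.0]})
list_of_cols = ['b', 'd']

# Keep rows where at least one of the listed columns is non-null;
# row 1 is dropped even though column 'a' has a value there,
# because only 'b' and 'd' are checked
result = df[~df[list_of_cols].isnull().all(axis=1)]
```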

You can use boolean indexing:

 df[~pd.isnull(df[list_of_cols]).all(axis=1)] 

Explanation:

The expression ~pd.isnull(df[list_of_cols]).all(axis=1) returns a boolean array that is used as a filter on the dataframe:

  • isnull() applied to df[list_of_cols] creates a boolean mask over df[list_of_cols] , with True for null elements and False otherwise

  • all() returns True if all elements in a row are True (row-wise with axis=1 )

Thus the negation ~ (not all null = at least one not null) gives a mask for all rows that contain at least one non-null value in the given column list.
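By De Morgan's law, "not (all null)" is the same as "any non-null", so the mask can also be built without the negation (a sketch with made-up data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [11, 22, np.nan],
                   'B': ['x', np.nan, np.nan]})
list_of_cols = ['A', 'B']

# Negated form: NOT (every listed column is null in this row)
mask_neg = ~pd.isnull(df[list_of_cols]).all(axis=1)

# Equivalent positive form: at least one listed column is non-null
mask_any = df[list_of_cols].notnull().any(axis=1)
# The two masks are identical
```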

Example:

Dataframe:

 >>> df = pd.DataFrame({'A': [11, 22, 33, np.nan],
 ...                    'B': ['x', np.nan, np.nan, 'w'],
 ...                    'C': ['2016-03-13', np.nan, '2016-03-14', '2016-03-15']})
 >>> list_of_cols = ['B', 'C']
 >>> df
      A    B           C
 0   11    x  2016-03-13
 1   22  NaN         NaN
 2   33  NaN  2016-03-14
 3  NaN    w  2016-03-15
The mask ~isnull :

 >>> ~pd.isnull(df[list_of_cols])
        B      C
 0   True   True
 1  False  False
 2  False   True
 3   True   True

Apply all(axis=1) row by row, then negate:

 >>> ~pd.isnull(df[list_of_cols]).all(axis=1)
 0     True
 1    False
 2     True
 3     True
 dtype: bool

Boolean selection from the dataframe:

 >>> df[~pd.isnull(df[list_of_cols]).all(axis=1)]
      A    B           C
 0   11    x  2016-03-13
 2   33  NaN  2016-03-14
 3  NaN    w  2016-03-15
