Why does testing NaN == NaN not work for removing NaN from a pandas DataFrame?

Question

Please explain how NaN is handled in pandas, because the following logic seems "broken" to me. I tried various methods (shown below) to discard empty values.

My DataFrame, which I load from a CSV file using read_csv, has a comments column that is empty in most rows.

The marked_results.comments column looks like this, and every other entry in the column is NaN. So pandas loads empty entries as NaN; so far, so good:

    0      VP
    1      VP
    2      VP
    3    TEST
    4     NaN
    5     NaN
    ...
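As a minimal sketch of this loading behaviour, assuming a small inline CSV in place of the real file: the id column is hypothetical and only there because read_csv skips completely blank lines by default, so a one-column file with empty fields would lose those rows instead of turning them into NaN.

```python
import io

import pandas as pd

# Hypothetical CSV mirroring the question: rows 4 and 5 have an empty comment.
# The extra "id" column keeps those lines from being skipped as blank lines.
csv_text = "id,comments\n0,VP\n1,VP\n2,VP\n3,TEST\n4,\n5,\n"

df = pd.read_csv(io.StringIO(csv_text))

# Empty fields are parsed as NaN in an object-dtype column
n_missing = df["comments"].isnull().sum()
print(n_missing)  # 2
```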

Now I'm trying to remove these entries, and only this works:

marked_results.comments.isnull()
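Worth noting: isnull() on its own only returns a boolean mask, and rows are removed only once that mask is used to index the frame. A short sketch, using a hypothetical stand-in for marked_results built in memory:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's marked_results
marked_results = pd.DataFrame(
    {"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]}
)

# isnull() builds a True/False mask; it does not remove anything by itself
mask = marked_results.comments.isnull()

# Index with the inverted mask (~) to keep only rows that have a comment
kept = marked_results[~mask]
print(len(kept))  # 4
```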

None of these work:

marked_results.comments.dropna() only gives back the same column; nothing gets removed, which is confusing.
marked_results.comments == NaN gives only a Series of all False values, as if nothing were NaN. Confusing.
The same happens with marked_results.comments == nan.
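The all-False result can be reproduced in a few lines. This is a sketch on a hypothetical in-memory Series rather than the real CSV: under IEEE 754 rules NaN is unequal to every value, itself included, so an element-wise == against NaN can never be True, while isnull() does find the missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical copy of the comments column
comments = pd.Series(["VP", "VP", "VP", "TEST", np.nan, np.nan])

# Element-wise equality with NaN is False everywhere, even on the NaN rows
by_equality = comments == np.nan

# isnull() is the supported way to locate missing values
by_isnull = comments.isnull()

print(by_equality.any())  # False
print(by_isnull.sum())    # 2
```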

I also tried:

    comments_values = marked_results.comments.unique()
    # array(['VP', 'TEST', nan], dtype=object)
    # Ah, gotcha! So now I've tried:
    marked_results.comments == comments_values[2]
    # but still all the results are False!
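Pulling the NaN out of unique() does not help, because even that very object is not equal to itself. A sketch on a hypothetical in-memory column; the x != x check at the end is a well-known (if cryptic) NaN test that follows directly from this rule:

```python
import numpy as np
import pandas as pd

# Hypothetical copy of the comments column
comments = pd.Series(["VP", "VP", "VP", "TEST", np.nan, np.nan])

values = comments.unique()  # array(['VP', 'TEST', nan], dtype=object)
nan_value = values[2]

# NaN never equals anything, not even the identical object
same = nan_value == nan_value
matches = comments == nan_value

# The only value for which x != x is True is NaN
self_unequal = nan_value != nan_value
print(same, matches.any(), self_unequal)  # False False True
```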

python pandas nan dataframe

idoda Jul 31 '13 at 12:03

2 answers

You need to test for NaN using math.isnan() (or numpy.isnan). NaN cannot be detected with the equality operator, because NaN compares unequal to every value, including itself.

    >>> from math import isnan
    >>> a = float('NaN')
    >>> a
    nan
    >>> a == 'NaN'
    False
    >>> isnan(a)
    True
    >>> a == float('NaN')
    False

Help on the function:

    isnan(...)
        isnan(x) -> bool

        Check if float x is not a number (NaN).
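The two functions this answer mentions differ in scope, which matters for a column like the question's. A small sketch: math.isnan checks one real number, numpy.isnan works element-wise on numeric arrays, and math.isnan rejects a string such as "VP", which is exactly what an object-dtype comments column contains.

```python
import math

import numpy as np

a = float("nan")

# math.isnan: a single real number
scalar_check = math.isnan(a)

# numpy.isnan: element-wise over a float array
array_check = np.isnan(np.array([1.0, np.nan, 3.0]))

# math.isnan only accepts real numbers, so a string raises TypeError
try:
    math.isnan("VP")
    string_ok = True
except TypeError:
    string_ok = False

print(scalar_check, array_check.tolist(), string_ok)  # True [False, True, False] False
```

Because of that string limitation, applying math.isnan row by row to the comments column would fail on the first non-empty comment.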

Sukrit kalra Jul 31 '13 at 12:04

Andy hayden · Accepted Answer · 2013-07-31T12:18:21+0000

You should use isnull and notnull to check for NaN (they are more robust with pandas dtypes than the NumPy functions); see "Values considered missing" in the docs.
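One concrete case where the pandas functions are more robust: an object column holding strings plus missing markers. A sketch on a hypothetical Series that mixes NaN and None; isnull treats both as missing, notnull is its exact inverse, and numpy.isnan refuses an object array with a TypeError.

```python
import numpy as np
import pandas as pd

# Hypothetical object column mixing strings, NaN and None
comments = pd.Series(["VP", "TEST", np.nan, None], dtype=object)

missing = comments.isnull()   # NaN and None both count as missing
present = comments.notnull()  # element-wise inverse of isnull()

# numpy.isnan has no loop for object arrays, so this raises TypeError
try:
    np.isnan(comments.to_numpy())
    numpy_ok = True
except TypeError:
    numpy_ok = False

print(missing.tolist(), numpy_ok)  # [False, False, True, True] False
```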

Calling the Series dropna method on a column returns a new Series with the NaN entries removed; it does not modify the original DataFrame, but it does what you want:

    In [11]: df
    Out[11]:
      comments
    0       VP
    1       VP
    2       VP
    3     TEST
    4      NaN
    5      NaN

    In [12]: df.comments.dropna()
    Out[12]:
    0      VP
    1      VP
    2      VP
    3    TEST
    Name: comments, dtype: object
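A short sketch of the "does not modify the original" point, on a hypothetical frame matching the session: the Series returned by dropna has four rows, while df still has six until something is assigned.

```python
import numpy as np
import pandas as pd

# Hypothetical frame matching the session
df = pd.DataFrame({"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]})

# dropna returns a new, shorter Series; the result must be stored somewhere
cleaned = df.comments.dropna()

# df itself is untouched
print(len(cleaned), len(df))  # 4 6
```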

The DataFrame dropna method has a subset argument for dropping only the rows that contain NaN in specific columns; assign the result back to keep it:

    In [13]: df.dropna(subset=['comments'])
    Out[13]:
      comments
    0       VP
    1       VP
    2       VP
    3     TEST

    In [14]: df = df.dropna(subset=['comments'])
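The subset argument only makes a visible difference once the frame has more than one column. A sketch assuming a hypothetical second column, score, that is also partly empty: plain dropna() drops a row if any column is NaN, whereas subset=['comments'] checks only the comments column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "score" is an invented, partly empty second column
df = pd.DataFrame({
    "comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan],
    "score": [1.0, np.nan, 3.0, 4.0, 5.0, np.nan],
})

# Default: a NaN in any column drops the row
any_col = df.dropna()

# subset: only NaN in "comments" drops the row
comments_only = df.dropna(subset=["comments"])

print(any_col.index.tolist())        # [0, 2, 3]
print(comments_only.index.tolist())  # [0, 1, 2, 3]
```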
