Why does testing NaN == NaN not work for dropping rows from a pandas DataFrame?

Please explain how NaN is handled in pandas, because the following logic seems "broken" to me. I tried various methods (shown below) to discard the empty values.

My DataFrame, which I load from a CSV file using read_csv, has a comments column, which is empty in most cases.

The marked_results.comments column looks like this; everything else in the column is NaN, so pandas loads empty entries as NaN. So far so good:

 0      VP
 1      VP
 2      VP
 3    TEST
 4     NaN
 5     NaN
 ....

Now I'm trying to drop these entries. Only this works:

  • marked_results.comments.isnull()

None of these work:

  • marked_results.comments.dropna() just returns the same column; nothing is dropped, which is confusing.
  • marked_results.comments == NaN just gives a series of all Falses. Nothing was NaN ... confusing.
  • likewise marked_results.comments == nan

I also tried:

 comments_values = marked_results.comments.unique()
 array(['VP', 'TEST', nan], dtype=object)
 # Ah, gotcha! So now I've tried:
 marked_results.comments == comments_values[2]
 # but still all the results are False!!!
python pandas nan dataframe
2 answers

You should use isnull and notnull to test for NaN (they handle pandas dtypes more reliably than the numpy equivalents); see "values considered missing" in the docs.
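As a minimal sketch of that advice, assuming a small hand-built frame in place of the CSV data from the question (the values and the explicit object dtype are made up to mirror the output shown there), the following compares the equality test with isnull/notnull:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's marked_results frame.
comments = pd.Series(["VP", "VP", "VP", "TEST", np.nan, np.nan], dtype=object)
marked_results = pd.DataFrame({"comments": comments})

# Equality against NaN is False for every row, including the NaN rows.
eq_mask = marked_results.comments == np.nan

# isnull() flags the missing entries; notnull() is its complement.
null_mask = marked_results.comments.isnull()

# Boolean indexing with notnull() keeps only the rows that have a comment.
kept = marked_results[marked_results.comments.notnull()]

print(eq_mask.any())
print(null_mask.tolist())
print(kept.comments.tolist())
```

Filtering with the notnull() mask is one way to discard the empty rows; it builds a new frame and leaves marked_results itself unchanged.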

Calling the Series dropna method on a column will not affect the original DataFrame, but it returns what you want:

 In [11]: df
 Out[11]:
   comments
 0       VP
 1       VP
 2       VP
 3     TEST
 4      NaN
 5      NaN

 In [12]: df.comments.dropna()
 Out[12]:
 0      VP
 1      VP
 2      VP
 3    TEST
 Name: comments, dtype: object

The DataFrame dropna method has a subset argument (to drop rows that contain NaN in specific columns):

 In [13]: df.dropna(subset=['comments'])
 Out[13]:
   comments
 0       VP
 1       VP
 2       VP
 3     TEST

 In [14]: df = df.dropna(subset=['comments'])
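To make the "does not affect the original" point concrete, here is a rough sketch that uses the same six-row example, rebuilt by hand (the frame is an assumption, not the asker's real data). It checks the row counts before and after, and shows that the result has to be assigned back to be kept:

```python
import numpy as np
import pandas as pd

# Hand-built copy of the example frame (assumed data).
comments = pd.Series(["VP", "VP", "VP", "TEST", np.nan, np.nan], dtype=object)
df = pd.DataFrame({"comments": comments})

# dropna returns a new frame; df still holds all six rows afterwards.
dropped = df.dropna(subset=["comments"])
rows_before = len(df)
rows_after = len(dropped)

# Rebinding the name is what makes the change stick.
df = df.dropna(subset=["comments"])

print(rows_before, rows_after, len(df))
```

If the result is not assigned to anything, the call appears to do nothing, which may be why dropna seemed to fail in the question.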

You need to test for NaN with math.isnan() (or numpy.isnan). NaNs cannot be checked with the equality operator.

 >>> from math import isnan
 >>> a = float('NaN')
 >>> a
 nan
 >>> a == 'NaN'
 False
 >>> isnan(a)
 True
 >>> a == float('NaN')
 False

Help on the function:

 isnan(...)
     isnan(x) -> bool
     Check if float x is not a number (NaN).
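A short standard-library sketch of why equality fails: under IEEE 754 floating-point rules, NaN compares unequal to every value, itself included. Note that math.isnan raises TypeError when given a string, so for a mixed column like the one in the question a guard is needed; the is_missing helper below is hypothetical, written only for this illustration.

```python
import math

nan = float("nan")

# NaN compares unequal to everything, including itself.
equal_to_itself = (nan == nan)
differs_from_itself = (nan != nan)


def is_missing(value):
    """Hypothetical helper: True only for a float NaN, False for anything else."""
    # The isinstance check avoids the TypeError math.isnan raises on strings.
    return isinstance(value, float) and math.isnan(value)


# Mixed values like those returned by unique() in the question.
flags = [is_missing(v) for v in ["VP", "TEST", nan]]

print(equal_to_itself, differs_from_itself, flags)
```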
