Why does collections.Counter treat numpy.nan values as equal?

I am surprised by the following behavior:

>>> import numpy as np
>>> from collections import Counter
>>> my_list = [1,2,2, np.nan, np.nan]
>>> Counter(my_list)
Counter({nan: 2, 2: 2, 1: 1})   # Counter treats np.nan as equal and
                                # tells me that I have two of them
>>> np.nan == np.nan            # However, np.nan values are not equal
False

What's going on here?

When I use float('nan') instead of np.nan, I get the expected behavior:

>>> my_list = [1,2,2, float('nan'), float('nan')]
>>> Counter(my_list)
Counter({2: 2, nan: 1, 1: 1, nan: 1})   # two different nan's
>>> float('nan') == float('nan')
False

I am using Python 2.7.3 and numpy 1.8.1.

Edit:

If I do this:

>>> a = 300
>>> b = 300
>>> a is b
False
>>> Counter([a, b])
Counter({300: 2})

So, Counter (or any Python dict) considers two objects X and Y to be distinct keys only if:

 X == Y -> False and X is Y -> False 

right?
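
As a quick sanity check of that rule, here is a minimal sketch (int('300') is used only to guarantee two distinct int objects, since the interpreter may reuse literal or constant-folded integers):

 >>> from collections import Counter
 >>> a = int('300')              # forces a fresh int object each time
 >>> b = int('300')
 >>> a is b, a == b
 (False, True)
 >>> Counter([a, b])             # equal, so grouped under one key
 Counter({300: 2})
 >>> x = float('nan')
 >>> y = float('nan')
 >>> x is y, x == y
 (False, False)
 >>> Counter([x, y])             # neither identical nor equal: two keys
 Counter({nan: 1, nan: 1})
 >>> z = float('nan')
 >>> z is z, z == z
 (True, False)
 >>> Counter([z, z])             # identity short-circuits the failed ==
 Counter({nan: 2})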

2 answers

This is not about numpy.nan vs. float("nan"); it is about whether you have one float object or two separate float objects.

>>> np.nan is np.nan
True
>>> float("nan") is float("nan")
False

and therefore

>>> Counter([1,2,2, np.nan, np.nan])
Counter({nan: 2, 2: 2, 1: 1})
>>> Counter([1,2,2, float("nan"), float("nan")])
Counter({2: 2, nan: 1, 1: 1, nan: 1})

but

 >>> f = float("nan") >>> Counter([1,2,2, f, f]) Counter({nan: 2, 2: 2, 1: 1}) 

Python dicts (and, by extension, the Counter subclass) normally compare keys by equality (==). BUT as an optimization they assume that if x is y , then x == y . Only when x is not y does the dict fall back to the equality comparison. For most types, x is y does imply x == y ; floating-point NaNs and deliberately constructed counterexamples are essentially the only things that violate this assumption.
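
As a made-up counterexample of that kind (a hypothetical NeverEqual class, written purely for illustration), an object that refuses to compare equal to anything, itself included, behaves just like a NaN key:

 >>> from collections import Counter
 >>> class NeverEqual(object):
 ...     def __eq__(self, other):
 ...         return False          # never equal, not even to itself (like NaN)
 ...     def __hash__(self):
 ...         return 0              # constant hash, so instances share a bucket
 ...
 >>> a = NeverEqual()
 >>> b = NeverEqual()
 >>> Counter([a, a]).values()      # same object twice: the identity shortcut wins
 [2]
 >>> Counter([a, b]).values()      # distinct objects: == is consulted and fails
 [1, 1]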

