I have a simple test case for a function that returns a DataFrame that may contain NaN. I was testing whether the output and the expected output were equal.
>>> output
Out[1]:
      r   t  ts  tt  ttct
0  2048  30   0  90     1
1  4096  90   1  30     1
2     0  70   2  65     1

[3 rows x 5 columns]
>>> expected
Out[2]:
      r   t  ts  tt  ttct
0  2048  30   0  90     1
1  4096  90   1  30     1
2     0  70   2  65     1

[3 rows x 5 columns]
>>> output == expected
Out[3]:
      r     t    ts    tt  ttct
0  True  True  True  True  True
1  True  True  True  True  True
2  True  True  True  True  True
However, I can't rely on the == operator alone because of NaNs. I was under the impression that the appropriate way to handle this is the equals method. From the docs:
pandas.DataFrame.equals
DataFrame.equals(other)
Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.
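To illustrate the documented behavior, here is a minimal sketch with two made-up frames (`a` and `b` are hypothetical, not my real data) that share a NaN in the same position:

```python
import numpy as np
import pandas as pd

# Two hypothetical frames with a NaN in the same cell.
a = pd.DataFrame({"r": [2048.0, np.nan, 0.0], "t": [30, 90, 70]})
b = a.copy()

# Element-wise comparison: NaN != NaN, so that cell compares as False.
elementwise_all_equal = bool((a == b).all().all())

# DataFrame.equals treats NaNs in the same location as equal.
equals_result = bool(a.equals(b))

print(elementwise_all_equal)  # False
print(equals_result)          # True
```

This is why == alone is not enough for my test, and why equals looked like the right tool.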
Nevertheless:
>>> expected.equals(output)
Out[4]: False
A little digging shows the difference in frames:
>>> output._data
Out[5]:
BlockManager
Items: Index([u'r', u't', u'ts', u'tt', u'ttct'], dtype='object')
Axis 1: Int64Index([0, 1, 2], dtype='int64')
FloatBlock: [r], 1 x 3, dtype: float64
IntBlock: [t, ts, tt, ttct], 4 x 3, dtype: int64
>>> expected._data
Out[6]:
BlockManager
Items: Index([u'r', u't', u'ts', u'tt', u'ttct'], dtype='object')
Axis 1: Int64Index([0, 1, 2], dtype='int64')
IntBlock: [r, t, ts, tt, ttct], 5 x 3, dtype: int64
If I force the float block in output to int, or make the int block in expected float, the test passes.
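The same effect can be reproduced with a small sketch. The frames here are hypothetical stand-ins for mine (only two of the columns, with column r stored as float64 on one side and int64 on the other):

```python
import pandas as pd

# Hypothetical stand-ins: same values, but "r" is float64 in `output`
# and int64 in `expected`.
expected = pd.DataFrame({"r": [2048, 4096, 0], "t": [30, 90, 70]})
output = expected.astype({"r": "float64"})

# Element-wise, every value compares equal.
values_match = bool((output == expected).all().all())

# equals() also requires matching dtypes, so it reports a difference.
equals_before_cast = bool(output.equals(expected))

# Casting one side so the dtypes line up makes equals() succeed.
aligned = output.astype(expected.dtypes.to_dict())
equals_after_cast = bool(aligned.equals(expected))

print(values_match)        # True
print(equals_before_cast)  # False
print(equals_after_cast)   # True
```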
Obviously, there are different notions of equality, and in some cases the kind of test that DataFrame.equals performs can be useful. However, the mismatch between == and DataFrame.equals bothers me and seems inconsistent. In pseudocode, I would expect its behavior to match:
(self.index == other.index).all() \
and (self.columns == other.columns).all() \
and (self.fillna(SOME_MAGICAL_VALUE) == other.fillna(SOME_MAGICAL_VALUE)).all().all()
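As a runnable sketch of that pseudocode: `loosely_equal` is a hypothetical helper name of my own, not a pandas function, and it uses isna() to match NaN positions instead of a magic fill value. It assumes both arguments are DataFrames.

```python
import numpy as np
import pandas as pd

def loosely_equal(left, right):
    """Hypothetical helper: same labels and same values, with NaNs in
    matching positions counted as equal and dtype differences ignored."""
    # Labels must match first; == raises on differently labeled frames.
    if not (left.index.equals(right.index)
            and left.columns.equals(right.columns)):
        return False
    # A cell matches if the values compare equal or both sides are NaN.
    both_nan = left.isna() & right.isna()
    return bool(((left == right) | both_nan).all().all())

# Made-up frames: same values, different dtype for column "r".
expected = pd.DataFrame({"r": [2048, 4096, 0], "t": [30, 90, 70]})
output = expected.astype({"r": "float64"})
with_nan = output.copy()
with_nan.loc[1, "r"] = np.nan

print(loosely_equal(output, expected))           # True
print(loosely_equal(with_nan, with_nan.copy()))  # True
print(loosely_equal(with_nan, expected))         # False
```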
However, DataFrame.equals does not behave that way. Am I wrong in my thinking, or is this an inconsistency in the pandas API? Moreover, what test should I be performing for my purposes, given the possible presence of NaN?