Pandas.DataFrame.equals contract

Question

I have a simple test case for a function that returns a DataFrame that could potentially contain NaNs. I was testing whether the output and the expected result were equal.

    >>> output
    Out[1]:
          r   t  ts  tt  ttct
    0  2048  30   0  90     1
    1  4096  90   1  30     1
    2     0  70   2  65     1

    [3 rows x 5 columns]

    >>> expected
    Out[2]:
          r   t  ts  tt  ttct
    0  2048  30   0  90     1
    1  4096  90   1  30     1
    2     0  70   2  65     1

    [3 rows x 5 columns]

    >>> output == expected
    Out[3]:
          r     t    ts    tt  ttct
    0  True  True  True  True  True
    1  True  True  True  True  True
    2  True  True  True  True  True

However, I can't just rely on the == operator because of NaNs. I was under the impression that the appropriate way to handle this is the equals method. From the documentation:

    pandas.DataFrame.equals

    DataFrame.equals(other)
        Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.

Nevertheless:

    >>> expected.equals(log_events)
    Out[4]: False

A little digging reveals the difference between the frames:

    >>> output._data
    Out[5]:
    BlockManager
    Items: Index([u'r', u't', u'ts', u'tt', u'ttct'], dtype='object')
    Axis 1: Int64Index([0, 1, 2], dtype='int64')
    FloatBlock: [r], 1 x 3, dtype: float64
    IntBlock: [t, ts, tt, ttct], 4 x 3, dtype: int64

    >>> expected._data
    Out[6]:
    BlockManager
    Items: Index([u'r', u't', u'ts', u'tt', u'ttct'], dtype='object')
    Axis 1: Int64Index([0, 1, 2], dtype='int64')
    IntBlock: [r, t, ts, tt, ttct], 5 x 3, dtype: int64

If I force the output's float block to int, or make the expected int block float, the test passes.
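The mismatch can be reproduced in a few lines. This is only a minimal sketch: the two-column frames are made-up stand-ins for the frames in the question, and it assumes a reasonably recent pandas.

```python
import pandas as pd

# Illustrative data only: the same values, but column "r" is float64 in one
# frame and int64 in the other, mirroring the block layout shown above.
expected = pd.DataFrame({"r": [2048, 4096, 0], "t": [30, 90, 70]}, dtype="int64")
output = expected.astype({"r": "float64"})

elementwise_equal = bool((output == expected).all().all())
strictly_equal = output.equals(expected)
equal_after_cast = output.astype({"r": "int64"}).equals(expected)

print(elementwise_equal)  # True: == compares values only
print(strictly_equal)     # False: the dtypes of "r" differ
print(equal_after_cast)   # True: same values and same dtypes
```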

Obviously, there are different senses of equality, and in some cases the kind of test that DataFrame.equals performs can be useful. However, the mismatch between == and DataFrame.equals is frustrating and looks like an inconsistency. In pseudocode, I would expect its behavior to match:

    (self.index == other.index).all() \
    and (self.columns == other.columns).all() \
    and (self.fillna(SOME_MAGICAL_VALUE) == other.fillna(SOME_MAGICAL_VALUE)).all().all()

However, it doesn't. Am I wrong in my thinking, or is this an inconsistency in the pandas API? Moreover, what test should I be performing for my purposes, given the possible presence of NaNs?

python pandas

jwilner Oct 24 '14 at 16:23

1 answer

Jeff · Answer 1 · 2014-10-24T17:13:47+0000

.equals() does just what it says. It tests for exact equality among elements, the positioning of NaNs (and NaTs), dtype equality, and index equality. Think of it as a df is df2 type of test, except that the two don't actually have to be the same object; IOW, df.equals(df.copy()) is always True.
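A small sketch of that contract, using an invented frame that holds both a NaN and a NaT (the column names are placeholders, not from the question):

```python
import numpy as np
import pandas as pd

# Invented example frame with one NaN and one NaT.
df = pd.DataFrame({
    "a": [1.0, np.nan],
    "when": pd.to_datetime(["2014-10-24", None]),
})
copy = df.copy()

same_object = df is copy                      # identity: two distinct objects
equal_contents = df.equals(copy)              # NaN/NaT in the same place count as equal
elementwise = bool((df == copy).all().all())  # NaN == NaN is False under ==

print(same_object)     # False
print(equal_contents)  # True
print(elementwise)     # False
```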

Your example fails because different dtypes are not equal (they may be equivalent, though). So you can use com.array_equivalent for this, or (df == df2).all().all() if you don't have NaNs.
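For a test that wants equivalence rather than strict equality, something along these lines may work. It is a sketch, not the internal com.array_equivalent: frames_equivalent is a hypothetical helper written for this illustration, and the sample frames are invented. It also assumes a pandas version that ships pandas.testing.assert_frame_equal with its check_dtype flag.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_equivalent(left, right):
    """Hypothetical helper: True if labels match and values match under ==,
    treating NaNs in the same position as equal. Dtypes are ignored."""
    if not (left.index.equals(right.index) and left.columns.equals(right.columns)):
        return False
    both_missing = left.isna() & right.isna()
    return bool(((left == right) | both_missing).all().all())


# Invented sample data.
int_frame = pd.DataFrame({"r": [2048, 4096, 0], "t": [30, 90, 70]}, dtype="int64")
float_frame = int_frame.astype({"r": "float64"})
nan_frame = pd.DataFrame({"r": [2048.0, np.nan, 0.0], "t": [30, 90, 70]})

print(int_frame.equals(float_frame))                  # False: dtypes differ
print(frames_equivalent(int_frame, float_frame))      # True: values match
print(frames_equivalent(nan_frame, nan_frame.copy())) # True: NaNs line up
print(frames_equivalent(nan_frame, float_frame))      # False: NaN vs 4096.0

# In a test suite, the public testing helper raises AssertionError on a
# mismatch and can be told to ignore dtypes.
assert_frame_equal(int_frame, float_frame, check_dtype=False)
```

The helper returns a plain bool, which suits conditionals; assert_frame_equal is usually the better fit inside tests because its error message describes where the frames differ.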

This is a replacement for np.array_equal, which is broken for NaN positional detection (and for object dtypes).
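The NaN problem with np.array_equal is easy to see on float arrays. In the sketch below, nan_aware_equal is a hypothetical helper for float data only; it illustrates the idea and is not the pandas implementation.

```python
import numpy as np


def nan_aware_equal(x, y):
    """Hypothetical float-only helper: equal values, NaNs equal by position."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        return False
    return bool(((x == y) | (np.isnan(x) & np.isnan(y))).all())


a = np.array([1.0, np.nan, 3.0])
b = a.copy()

print(np.array_equal(a, b))                  # False: NaN != NaN by default
print(nan_aware_equal(a, b))                 # True
print(nan_aware_equal(a, [1.0, 2.0, 3.0]))   # False
```

Newer NumPy releases also accept an equal_nan argument to np.array_equal, which covers the float case without a helper.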

It is mostly used internally. That said, if you'd like an enhancement for equivalence (e.g. the elements are equivalent in the == sense and the NaN positions match), please open an issue on GitHub (and even better, submit a PR!).
