Is it possible to use pandas DataFrame.isin() with a numeric tolerance parameter?

I reviewed the following posts. Is there a way to use DataFrame.isin() with an approximation factor or tolerance value? Or is there another method that can do this?

Filter rows of a DataFrame if the value in a column is in a list of values

Use a list of values to select rows from a pandas DataFrame

Example:

    df = DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})

    In : df
    Out:
         A    B
    0    5    1
    1    6    2
    2  3.3  3.2
    3    4    5

    df[df['A'].isin([3, 6], tol=.5)]

    In : df
    Out:
         A    B
    1    6    2
    2  3.3  3.2
1 answer

You can do something similar with NumPy's isclose:

    df[np.isclose(df['A'].values[:, None], [3, 6], atol=.5).any(axis=1)]
    Out:
         A    B
    1  6.0  2.0
    2  3.3  3.2

np.isclose returns this:

    np.isclose(df['A'].values[:, None], [3, 6], atol=.5)
    Out:
    array([[False, False],
           [False,  True],
           [ True, False],
           [False, False]], dtype=bool)

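This is a pairwise comparison of the elements of df['A'] and [3, 6]: each row corresponds to a value in df['A'] and each column to a value in the list. That is why we need df['A'].values[:, None]; the added axis lets NumPy broadcast the column against the list. Since you want to know whether a value is close to any of the items in the list, we call .any(axis=1) at the end.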
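If you need this in more than one place, the steps above can be wrapped in a small function. This is only a sketch: the name isin_tol and its signature are made up for illustration and are not part of pandas. Note that np.isclose tests |a - b| <= atol + rtol * |b| with a default rtol of 1e-05, so the sketch sets rtol=0 (an assumption on my part, not something the snippet above does) to make the tolerance purely absolute.

```python
import numpy as np
import pandas as pd


def isin_tol(series, values, atol=0.5):
    """Hypothetical helper: True where a value is within atol of any entry in values."""
    col = series.to_numpy()[:, None]            # shape (n, 1)
    targets = np.asarray(values, dtype=float)   # shape (m,)
    # Broadcasting (n, 1) against (m,) gives an (n, m) table of pairwise checks.
    close = np.isclose(col, targets, rtol=0, atol=atol)
    # Keep the original index so the mask aligns with the DataFrame.
    return pd.Series(close.any(axis=1), index=series.index)


df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})
mask = isin_tol(df['A'], [3, 6], atol=0.5)
result = df[mask]  # rows 1 and 2, as in the output above
```

Returning a Series with the original index, rather than a bare array, keeps the mask aligned if the DataFrame has a non-default index.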


For multiple columns, change the slice a bit:

    mask = np.isclose(df[['A', 'B']].values[:, :, None], [3, 6], atol=0.5).any(axis=(1, 2))
    mask
    Out: array([False,  True,  True, False], dtype=bool)

You can use this mask to filter the DataFrame (i.e. df[mask]).
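To see what .any(axis=(1, 2)) is doing, it can help to look at the shapes. The sketch below reproduces the mask above and then shows one possible variation that is not in the snippet above: keeping a row only if every selected column is close to some value in the list. The variable names are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})

vals = df[['A', 'B']].to_numpy()[:, :, None]    # shape (4, 2, 1)
close = np.isclose(vals, [3, 6], atol=0.5)      # shape (4, 2, 2): rows x columns x targets

# Row is kept if ANY column is close to ANY target (same as the mask above).
any_col = close.any(axis=(1, 2))

# Variation (assumption, not from the snippet above): row is kept only if
# EVERY column is close to at least one target.
every_col = close.any(axis=2).all(axis=1)
```

With this data, any_col selects rows 1 and 2, while every_col selects only row 2, because B is 2 in row 1 and that is not within 0.5 of 3 or 6.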


If you want to compare df['A'] and df['B'] (and possibly other columns) against different lists of values, you can create a separate mask for each column:

    mask1 = np.isclose(df['A'].values[:, None], [1, 2, 3], atol=.5).any(axis=1)
    mask2 = np.isclose(df['B'].values[:, None], [4, 5], atol=.5).any(axis=1)
    mask3 = ...

Then slice:

    df[mask1 & mask2]
    # or df[mask1 & mask2 & mask3 & ...]
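With more than a couple of columns, writing out mask1, mask2, ... by hand gets tedious. One way to generalize it, sketched here with a made-up targets dictionary (column name mapped to the values to match), is to build the masks in a loop and combine them with np.logical_and.reduce:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})

# Hypothetical mapping: column name -> values to match within the tolerance.
targets = {'A': [1, 2, 3], 'B': [4, 5]}

# One boolean mask per column, built the same way as mask1 and mask2 above.
masks = [
    np.isclose(df[col].to_numpy()[:, None], vals, atol=0.5).any(axis=1)
    for col, vals in targets.items()
]

all_match = np.logical_and.reduce(masks)  # equivalent to mask1 & mask2 & ...
any_match = np.logical_or.reduce(masks)   # equivalent to mask1 | mask2 | ...
```

Note that with this particular data no row satisfies both conditions, so df[all_match] is empty; df[any_match] returns rows 2 and 3.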