Removing infinite values from data in pandas?

What is the fastest / simplest way to drop nan and inf / -inf values from a pandas DataFrame without resetting mode.use_inf_as_null ? I'd like to be able to use the subset and how arguments of dropna , except with inf values also considered missing, for example:

 df.dropna(subset=["col1", "col2"], how="all", with_inf=True) 

Is it possible? Is there a way to tell dropna to include inf in the definition of missing values?

+178
python numpy scipy pandas
Jul 04 '13 at 20:55
7 answers

The easiest way is to replace the infs with NaN first:

 df.replace([np.inf, -np.inf], np.nan) 

and then use dropna :

 df.replace([np.inf, -np.inf], np.nan).dropna(subset=["col1", "col2"], how="all") 

For example:

 In [11]: df = pd.DataFrame([1, 2, np.inf, -np.inf])

 In [12]: df.replace([np.inf, -np.inf], np.nan)
 Out[12]:
      0
 0    1
 1    2
 2  NaN
 3  NaN

The same method works for a Series.
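A self-contained sketch of the whole pipeline; the col1 / col2 names follow the question, but the sample values are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, np.inf, np.nan, 4.0],
    "col2": [np.inf, 2.0, np.nan, 5.0],
})

# Turn inf/-inf into NaN, then drop only the rows where *all* of
# col1 and col2 are missing (how="all").
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna(
    subset=["col1", "col2"], how="all"
)
print(cleaned)  # only the all-NaN row (index 2) is dropped
```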

+323
Jul 04 '13 at 21:50

With option_context , this is possible without permanently setting use_inf_as_na . For example:

 with pd.option_context('mode.use_inf_as_na', True):
     df = df.dropna(subset=['col1', 'col2'], how='all')

Of course, you can make inf be treated as NaN permanently with

 pd.set_option('use_inf_as_na', True) 



For older versions, replace use_inf_as_na with use_inf_as_null .
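In recent pandas releases this option is deprecated (and eventually removed), so a version-tolerant sketch can fall back to an explicit replace; the sample frame below is illustrative:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, np.inf, np.nan],
                   "col2": [np.nan, 2.0, np.nan]})

try:
    with warnings.catch_warnings():
        # The option is deprecated in pandas >= 2.1; silence the FutureWarning.
        warnings.simplefilter("ignore")
        with pd.option_context("mode.use_inf_as_na", True):
            cleaned = df.dropna(subset=["col1", "col2"], how="all")
except pd.errors.OptionError:
    # The option was removed entirely: replace infs explicitly instead.
    cleaned = df.replace([np.inf, -np.inf], np.nan).dropna(
        subset=["col1", "col2"], how="all")

print(cleaned)  # only the all-missing row (index 2) is dropped
```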

+19
Aug 17 '17 at 23:10

Here is another method, using .loc to replace inf with nan on a Series:

 s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan 
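For example, applied to an illustrative Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.inf, -np.inf, np.nan, 5.0])

# np.isfinite is False for inf, -inf and NaN; the extra notnull()
# check means only the infs are targeted (existing NaNs are left alone).
s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan
print(s.tolist())  # the two infs become NaN, finite values are untouched
```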

So, in response to the original question:

 df = pd.DataFrame(np.ones((3, 3)), columns=list('ABC'))
 for i in range(3):
     df.iat[i, i] = np.inf

 df
           A         B         C
 0       inf  1.000000  1.000000
 1  1.000000       inf  1.000000
 2  1.000000  1.000000       inf

 df.sum()
 A   inf
 B   inf
 C   inf
 dtype: float64

 df.apply(lambda s: s[np.isfinite(s)].dropna()).sum()
 A    2
 B    2
 C    2
 dtype: float64
+15
03 Mar. '16 at 21:52

The above solution will also replace inf values in columns outside the target columns. To restrict the replacement:

 lst = [np.inf, -np.inf]
 to_replace = {v: lst for v in ['col1', 'col2']}
 df.replace(to_replace, np.nan)
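A runnable sketch of the targeted replacement; col3 and the sample values are made up to show that infs outside the target columns survive:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, np.inf],
                   "col2": [-np.inf, 2.0],
                   "col3": [np.inf, 3.0]})  # col3 should keep its inf

lst = [np.inf, -np.inf]
# Dict form of to_replace: column name -> values to replace in that column.
to_replace = {v: lst for v in ["col1", "col2"]}
out = df.replace(to_replace, np.nan)
print(out)  # infs in col1/col2 become NaN; col3's inf is preserved
```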
+7
Aug 10 '14 at 2:27

Another solution is to use the isin method. Use it to determine whether each value is infinite or missing, and then chain the all method to determine whether all values in each row are infinite or missing.

Finally, use the negation of this result to select the rows that are not entirely infinite or missing, via Boolean indexing.

 all_inf_or_nan = df.isin([np.inf, -np.inf, np.nan]).all(axis='columns')
 df[~all_inf_or_nan]
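A self-contained sketch with illustrative data (note that pandas' isin does match NaN, unlike plain equality):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [np.inf, 1.0, np.nan],
                   "col2": [np.nan, np.inf, np.nan]})

# True for rows where every value is inf, -inf or NaN.
all_inf_or_nan = df.isin([np.inf, -np.inf, np.nan]).all(axis="columns")
result = df[~all_inf_or_nan]
print(result)  # only the row with a finite value (index 1) remains
```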
+6
Nov 03 '17 at 18:34

Use (quick and easy):

 df = df[np.isfinite(df).all(1)] 

This answer is based on Dougr's answer to another question. Here is sample code:

 import pandas as pd
 import numpy as np

 df = pd.DataFrame([1, 2, 3, np.nan, 4, np.inf, 5, -np.inf, 6])
 print('Input:\n', df, sep='')

 df = df[np.isfinite(df).all(1)]
 print('\nDropped:\n', df, sep='')

Result:

 Input:
         0
 0  1.0000
 1  2.0000
 2  3.0000
 3     NaN
 4  4.0000
 5     inf
 6  5.0000
 7    -inf
 8  6.0000

 Dropped:
      0
 0  1.0
 1  2.0
 2  3.0
 4  4.0
 6  5.0
 8  6.0
+5
Mar 18 '19 at 18:41

You can use pd.DataFrame.mask with np.isinf . First make sure that all of your columns are of float type. Then use dropna with your existing logic.

 print(df)
        col1      col2
 0 -0.441406       inf
 1 -0.321105      -inf
 2 -0.412857  2.223047
 3 -0.356610  2.513048

 df = df.mask(np.isinf(df))
 print(df)
        col1      col2
 0 -0.441406       NaN
 1 -0.321105       NaN
 2 -0.412857  2.223047
 3 -0.356610  2.513048
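A runnable sketch of this approach; the sample values are shortened versions of the printout above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [-0.44, -0.32, -0.41],
                   "col2": [np.inf, -np.inf, 2.22]})

# mask() sets entries to NaN wherever the condition is True;
# np.isinf needs float dtypes, hence the astype guard.
masked = df.astype(float).mask(np.isinf(df))
cleaned = masked.dropna(subset=["col1", "col2"], how="all")
print(masked)  # both infs in col2 are now NaN
```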
+2
Jun 28 '18 at 15:42


