Removing infinite values from data in pandas?

What is the fastest / simplest way to drop nan and inf / -inf values from a pandas DataFrame without resetting mode.use_inf_as_null ? I'd like to be able to use the subset and how arguments of dropna , except with inf values also considered missing, for example:

 df.dropna(subset=["col1", "col2"], how="all", with_inf=True) 

Is it possible? Is there a way to tell dropna to include inf in the definition of missing values?

+178
python numpy scipy pandas
Jul 04 '13 at 20:55
7 answers

The easiest way is to replace the infs with NaN first:

 df.replace([np.inf, -np.inf], np.nan) 

and then use dropna :

 df.replace([np.inf, -np.inf], np.nan).dropna(subset=["col1", "col2"], how="all") 

For example:

 In [11]: df = pd.DataFrame([1, 2, np.inf, -np.inf])

 In [12]: df.replace([np.inf, -np.inf], np.nan)
 Out[12]:
      0
 0    1
 1    2
 2  NaN
 3  NaN

The same method works for a Series.
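A self-contained sketch of the whole pipeline; the col1 / col2 names follow the question, but the sample values are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, np.inf, np.nan, 4.0],
    "col2": [np.inf, 2.0, np.nan, 5.0],
})

# Turn inf/-inf into NaN, then drop only the rows where *all* of
# col1 and col2 are missing (how="all").
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna(
    subset=["col1", "col2"], how="all"
)
print(cleaned)  # only the all-NaN row (index 2) is dropped
```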

+323
Jul 04 '13 at 21:50

With option_context , this is possible without permanently setting use_inf_as_na . For example:

 with pd.option_context('mode.use_inf_as_na', True):
     df = df.dropna(subset=['col1', 'col2'], how='all')

Of course, you can make inf be treated as NaN permanently with

 pd.set_option('use_inf_as_na', True) 



For older versions, replace use_inf_as_na with use_inf_as_null .
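In recent pandas releases this option is deprecated (and eventually removed), so a version-tolerant sketch can fall back to an explicit replace; the sample frame below is illustrative:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, np.inf, np.nan],
                   "col2": [np.nan, 2.0, np.nan]})

try:
    with warnings.catch_warnings():
        # The option is deprecated in pandas >= 2.1; silence the FutureWarning.
        warnings.simplefilter("ignore")
        with pd.option_context("mode.use_inf_as_na", True):
            cleaned = df.dropna(subset=["col1", "col2"], how="all")
except pd.errors.OptionError:
    # The option was removed entirely: replace infs explicitly instead.
    cleaned = df.replace([np.inf, -np.inf], np.nan).dropna(
        subset=["col1", "col2"], how="all")

print(cleaned)  # only the all-missing row (index 2) is dropped
```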

+19
Aug 17 '17 at 23:10

Here is another method, using .loc to replace inf with nan on a Series:

 s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan 
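For example, applied to an illustrative Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.inf, -np.inf, np.nan, 5.0])

# np.isfinite is False for inf, -inf and NaN; the extra notnull()
# check means only the infs are targeted (existing NaNs are left alone).
s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan
print(s.tolist())  # the two infs become NaN, finite values are untouched
```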

So, in response to the original question:

 df = pd.DataFrame(np.ones((3, 3)), columns=list('ABC'))
 for i in range(3):
     df.iat[i, i] = np.inf

 df
           A         B         C
 0       inf  1.000000  1.000000
 1  1.000000       inf  1.000000
 2  1.000000  1.000000       inf

 df.sum()
 A   inf
 B   inf
 C   inf
 dtype: float64

 df.apply(lambda s: s[np.isfinite(s)].dropna()).sum()
 A    2
 B    2
 C    2
 dtype: float64
+15
03 Mar. '16 at 21:52

The above solution will also replace inf values in columns outside the target columns. To restrict the replacement:

 lst = [np.inf, -np.inf]
 to_replace = {v: lst for v in ['col1', 'col2']}
 df.replace(to_replace, np.nan)
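A runnable sketch of the targeted replacement; col3 and the sample values are made up to show that infs outside the target columns survive:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, np.inf],
                   "col2": [-np.inf, 2.0],
                   "col3": [np.inf, 3.0]})  # col3 should keep its inf

lst = [np.inf, -np.inf]
# Dict form of to_replace: column name -> values to replace in that column.
to_replace = {v: lst for v in ["col1", "col2"]}
out = df.replace(to_replace, np.nan)
print(out)  # infs in col1/col2 become NaN; col3's inf is preserved
```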
+7
Aug 10 '14 at 2:27

Another solution is to use the isin method. Use it to determine whether each value is infinite or missing, and then chain the all method to determine whether all values in each row are infinite or missing.

Finally, use the negation of this result to select the rows that are not entirely infinite or missing, via Boolean indexing.

 all_inf_or_nan = df.isin([np.inf, -np.inf, np.nan]).all(axis='columns')
 df[~all_inf_or_nan]
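A self-contained sketch with illustrative data (note that pandas' isin does match NaN, unlike plain equality):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [np.inf, 1.0, np.nan],
                   "col2": [np.nan, np.inf, np.nan]})

# True for rows where every value is inf, -inf or NaN.
all_inf_or_nan = df.isin([np.inf, -np.inf, np.nan]).all(axis="columns")
result = df[~all_inf_or_nan]
print(result)  # only the row with a finite value (index 1) remains
```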
+6
Nov 03 '17 at 18:34

Use (quick and easy):

 df = df[np.isfinite(df).all(1)] 

This answer is based on Dougr's answer to another question. Here is sample code:

 import pandas as pd
 import numpy as np

 df = pd.DataFrame([1, 2, 3, np.nan, 4, np.inf, 5, -np.inf, 6])
 print('Input:\n', df, sep='')

 df = df[np.isfinite(df).all(1)]
 print('\nDropped:\n', df, sep='')

Result:

 Input:
         0
 0  1.0000
 1  2.0000
 2  3.0000
 3     NaN
 4  4.0000
 5     inf
 6  5.0000
 7    -inf
 8  6.0000

 Dropped:
      0
 0  1.0
 1  2.0
 2  3.0
 4  4.0
 6  5.0
 8  6.0
+5
Mar 18 '19 at 18:41

You can use pd.DataFrame.mask with np.isinf . First make sure that all of your columns are of float type. Then use dropna with your existing logic.

 print(df)
        col1      col2
 0 -0.441406       inf
 1 -0.321105      -inf
 2 -0.412857  2.223047
 3 -0.356610  2.513048

 df = df.mask(np.isinf(df))
 print(df)
        col1      col2
 0 -0.441406       NaN
 1 -0.321105       NaN
 2 -0.412857  2.223047
 3 -0.356610  2.513048
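A runnable sketch of this approach; the sample values are shortened versions of the printout above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [-0.44, -0.32, -0.41],
                   "col2": [np.inf, -np.inf, 2.22]})

# mask() sets entries to NaN wherever the condition is True;
# np.isinf needs float dtypes, hence the astype guard.
masked = df.astype(float).mask(np.isinf(df))
cleaned = masked.dropna(subset=["col1", "col2"], how="all")
print(masked)  # both infs in col2 are now NaN
```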
+2
Jun 28 '18 at 15:42


