Rowwise min() and max() fail for a column with NaN

I am trying to take the rowwise maximum (and minimum) of two columns containing dates:

from datetime import date
import pandas as pd
import numpy as np    

df = pd.DataFrame({'date_a' : [date(2015, 1, 1), date(2012, 6, 1),
                               date(2013, 1, 1), date(2016, 6, 1)],
                   'date_b' : [date(2012, 7, 1), date(2013, 1, 1), 
                               date(2014, 3, 1), date(2013, 4, 1)]})

df[['date_a', 'date_b']].max(axis=1)
Out[46]: 
0    2015-01-01
1    2013-01-01
2    2014-03-01
3    2016-06-01
dtype: object

as expected. However, if the data frame contains one NaN value, the whole operation fails.

df_nan = pd.DataFrame({'date_a' : [date(2015, 1, 1), date(2012, 6, 1),
                               np.nan, date(2016, 6, 1)],
                       'date_b' : [date(2012, 7, 1), date(2013, 1, 1), 
                                   date(2014, 3, 1), date(2013, 4, 1)]})

df_nan[['date_a', 'date_b']].max(axis=1)
Out[49]: 
0   NaN 
1   NaN
2   NaN
3   NaN
dtype: float64

What's going on here? I expected this result:

0    2015-01-01
1    2013-01-01
2    NaN
3    2016-06-01

How can I achieve this?

3 answers

I would say the best solution is to use the appropriate dtype. Pandas provides a very well integrated datetime dtype. Right now you are using object dtypes:

>>> df
       date_a      date_b
0  2015-01-01  2012-07-01
1  2012-06-01  2013-01-01
2         NaN  2014-03-01
3  2016-06-01  2013-04-01
>>> df.dtypes
date_a    object
date_b    object
dtype: object

But, mind you, the problem disappears when you use the proper dtype:

>>> df2 = df.apply(pd.to_datetime)
>>> df2
      date_a     date_b
0 2015-01-01 2012-07-01
1 2012-06-01 2013-01-01
2        NaT 2014-03-01
3 2016-06-01 2013-04-01
>>> df2.min(axis=1)
0   2012-07-01
1   2012-06-01
2   2014-03-01
3   2013-04-01
dtype: datetime64[ns]
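If you would rather have row 2 stay missing (as the question asks) instead of falling back to the other column's date, pandas' skipna parameter can propagate the NaT; a minimal sketch of that idea:

```python
from datetime import date
import numpy as np
import pandas as pd

df = pd.DataFrame({'date_a': [date(2015, 1, 1), date(2012, 6, 1),
                              np.nan, date(2016, 6, 1)],
                   'date_b': [date(2012, 7, 1), date(2013, 1, 1),
                              date(2014, 3, 1), date(2013, 4, 1)]})

# convert the object columns to datetime64 (NaN becomes NaT)
df2 = df.apply(pd.to_datetime)

# skipna=False propagates NaT instead of silently dropping it
result = df2.max(axis=1, skipna=False)
print(result)
```

With skipna=False, any row containing NaT yields NaT, matching the output the question expected.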

The underlying problem is that you are mixing date objects with NaN, which is a float, so the columns end up as object dtype and pandas is asked to compare a date with a float. As far as the reduction is concerned, df_nan behaves no differently from a frame mixing dates with arbitrary floats:

df_float = pd.DataFrame({'date_a' : [date(2015, 1, 1), date(2012, 6, 1),
                                    1.023, date(2016, 6, 1)],
                        'date_b' : [date(2012, 7, 1), 3.14, 
                                    date(2014, 3, 1), date(2013, 4, 1)]})

print(df_float.max(1))

0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

Rather than returning False or raising the TypeError you would get in plain Python, pandas swallows the error and returns NaN, even though the underlying comparison is illegal:

print(date(2015, 1, 1) < 1.0)

TypeError                                 Traceback (most recent call last)
<ipython-input-362-ccbf44ddb40a> in <module>()
      1 
----> 2 print(date(2015, 1, 1) < 1.0)

TypeError: unorderable types: datetime.date() < float()

If you want to stay in pandas and keep the NaN, one workaround is to cast everything to str with df.astype:

out = df_nan.astype(str).max(1)
print(out) 
0    2015-01-01
1    2013-01-01
2           nan
3    2016-06-01
dtype: object

though note that the result now contains strings ('nan' included), not dates.
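If you do want proper dates back at the end, one way (a sketch, combining the astype trick above with pd.to_datetime and its errors='coerce' option) is to parse the string result afterwards, letting the unparseable 'nan' become NaT:

```python
from datetime import date
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({'date_a': [date(2015, 1, 1), date(2012, 6, 1),
                                  np.nan, date(2016, 6, 1)],
                       'date_b': [date(2012, 7, 1), date(2013, 1, 1),
                                  date(2014, 3, 1), date(2013, 4, 1)]})

# max over strings works lexicographically; 'nan' sorts above ISO dates,
# so rows with a missing value come out as 'nan'
out = df_nan.astype(str).max(axis=1)

# errors='coerce' turns the leftover 'nan' strings into NaT
result = pd.to_datetime(out, errors='coerce')
print(result)
```

This relies on the ISO YYYY-MM-DD format sorting correctly as strings; it would break for other date formats.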

Or, as @juanpa.arrivillaga suggests, convert to datetime with pd.to_datetime (note that max skips NaT by default, so row 2 yields 2014-03-01 instead of staying missing):

out = df_nan.apply(pd.to_datetime, errors='coerce').max(1)
print(out)

0   2015-01-01
1   2013-01-01
2   2014-03-01
3   2016-06-01
dtype: datetime64[ns]

Another option:

>>> df_nan.where(df_nan.T.notnull().all()).max(axis=1)
Out[1]:
0    2015-01-01
1    2013-01-01
2          None
3    2016-06-01
dtype: object

What this does:

  • df_nan.T.notnull().all() builds a boolean mask that is False for rows containing np.nan
  • df_nan.where() keeps the values in the complete rows and replaces the rest with None
  • .max(axis=1) then takes the rowwise maximum

This works because the maximum of a row where all the values are None is None. It lets you keep track of rows with a missing value instead of producing a maximum for them.

Whether you want that behaviour is up to you; otherwise you need @juanpa.arrivillaga's solution, which converts NaN to NaT.
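The same mask idea can be combined with a proper datetime conversion so the result keeps a datetime64 dtype; a sketch of that variant:

```python
from datetime import date
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({'date_a': [date(2015, 1, 1), date(2012, 6, 1),
                                  np.nan, date(2016, 6, 1)],
                       'date_b': [date(2012, 7, 1), date(2013, 1, 1),
                                  date(2014, 3, 1), date(2013, 4, 1)]})

# rows that are complete (no missing value in either column)
complete = df_nan.notna().all(axis=1)

# take the rowwise max on proper datetimes, then blank out incomplete rows
result = df_nan.apply(pd.to_datetime).max(axis=1).where(complete)
print(result)
```

Here .where(complete) leaves NaT in the incomplete rows, so the output has datetime64[ns] dtype rather than object.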

