Pandas: subtracting two date columns and getting the result as an integer

I have two columns in a Pandas data frame that are dates.

I want to subtract one column from the other and get the difference as an integer number of days.

Take a look at the data:

    df_test.head(10)
    Out[20]:
      First_Date Second Date
    0 2016-02-09  2015-11-19
    1 2016-01-06  2015-11-30
    2        NaT  2015-12-04
    3 2016-01-06  2015-12-08
    4        NaT  2015-12-09
    5 2016-01-07  2015-12-11
    6        NaT  2015-12-12
    7        NaT  2015-12-14
    8 2016-01-06  2015-12-14
    9        NaT  2015-12-15

I successfully created a new column with a difference:

    df_test['Difference'] = df_test['First_Date'].sub(df_test['Second Date'], axis=0)
    df_test.head()
    Out[22]:
      First_Date Second Date  Difference
    0 2016-02-09  2015-11-19     82 days
    1 2016-01-06  2015-11-30     37 days
    2        NaT  2015-12-04         NaT
    3 2016-01-06  2015-12-08     29 days
    4        NaT  2015-12-09         NaT

However, I cannot get a numerical version of the result:

    df_test['Difference'] = df_test[['Difference']].apply(pd.to_numeric)
    df_test.head()
    Out[25]:
      First_Date Second Date    Difference
    0 2016-02-09  2015-11-19  7.084800e+15
    1 2016-01-06  2015-11-30  3.196800e+15
    2        NaT  2015-12-04           NaN
    3 2016-01-06  2015-12-08  2.505600e+15
    4        NaT  2015-12-09           NaN
3 answers

You can divide the timedelta column by np.timedelta64(1, 'D'). The output is float rather than int, because the column contains NaN values:

    df_test['Difference'] = df_test['Difference'] / np.timedelta64(1, 'D')
    print (df_test)
      First_Date Second Date  Difference
    0 2016-02-09  2015-11-19        82.0
    1 2016-01-06  2015-11-30        37.0
    2        NaT  2015-12-04         NaN
    3 2016-01-06  2015-12-08        29.0
    4        NaT  2015-12-09         NaN
    5 2016-01-07  2015-12-11        27.0
    6        NaT  2015-12-12         NaN
    7        NaT  2015-12-14         NaN
    8 2016-01-06  2015-12-14        23.0
    9        NaT  2015-12-15         NaN

See "Frequency conversion" in the pandas time deltas documentation.
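If an integer column is wanted despite the missing values, one option is to cast the float result to pandas' nullable "Int64" dtype. The following is only a sketch: it assumes a pandas version that supports "Int64", assumes the differences are whole days (dates without a time component), and rebuilds the first five rows of the question's data as a stand-in for df_test.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's frame (first five rows only).
df_test = pd.DataFrame({
    "First_Date": pd.to_datetime(
        ["2016-02-09", "2016-01-06", None, "2016-01-06", None]),
    "Second Date": pd.to_datetime(
        ["2015-11-19", "2015-11-30", "2015-12-04", "2015-12-08", "2015-12-09"]),
})

delta = df_test["First_Date"] - df_test["Second Date"]

# Dividing by one day gives float64, with NaN where a date is missing.
days_float = delta / np.timedelta64(1, "D")

# The nullable integer dtype keeps whole numbers and marks gaps as <NA>.
days_int = days_float.astype("Int64")

print(days_int)
```

The cast only succeeds when every non-missing value is a whole number, so differences that include a time of day would need rounding or truncating first.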


You can use the datetime module to help here. As a side note, plain subtraction of the two date columns should work as follows:

    import datetime as dt
    import numpy as np
    import pandas as pd

    # Assume we have df_test:
    In [222]: df_test
    Out[222]:
       first_date second_date
    0  2016-01-31  2015-11-19
    1  2016-02-29  2015-11-20
    2  2016-03-31  2015-11-21
    3  2016-04-30  2015-11-22
    4  2016-05-31  2015-11-23
    5  2016-06-30  2015-11-24
    6         NaT  2015-11-25
    7         NaT  2015-11-26
    8  2016-01-31  2015-11-27
    9         NaT  2015-11-28
    10        NaT  2015-11-29
    11        NaT  2015-11-30
    12 2016-04-30  2015-12-01
    13        NaT  2015-12-02
    14        NaT  2015-12-03
    15 2016-04-30  2015-12-04
    16        NaT  2015-12-05
    17        NaT  2015-12-06

    In [223]: df_test['Difference'] = df_test['first_date'] - df_test['second_date']

    In [224]: df_test
    Out[224]:
       first_date second_date  Difference
    0  2016-01-31  2015-11-19     73 days
    1  2016-02-29  2015-11-20    101 days
    2  2016-03-31  2015-11-21    131 days
    3  2016-04-30  2015-11-22    160 days
    4  2016-05-31  2015-11-23    190 days
    5  2016-06-30  2015-11-24    219 days
    6         NaT  2015-11-25         NaT
    7         NaT  2015-11-26         NaT
    8  2016-01-31  2015-11-27     65 days
    9         NaT  2015-11-28         NaT
    10        NaT  2015-11-29         NaT
    11        NaT  2015-11-30         NaT
    12 2016-04-30  2015-12-01    151 days
    13        NaT  2015-12-02         NaT
    14        NaT  2015-12-03         NaT
    15 2016-04-30  2015-12-04    148 days
    16        NaT  2015-12-05         NaT
    17        NaT  2015-12-06         NaT

Now convert the column to datetime.timedelta objects, then read the .days attribute of each valid (non-null) timedelta:

    In [226]: df_test['Diffference'] = df_test['Difference'].astype(dt.timedelta).map(lambda x: np.nan if pd.isnull(x) else x.days)

    In [227]: df_test
    Out[227]:
       first_date second_date  Difference  Diffference
    0  2016-01-31  2015-11-19     73 days           73
    1  2016-02-29  2015-11-20    101 days          101
    2  2016-03-31  2015-11-21    131 days          131
    3  2016-04-30  2015-11-22    160 days          160
    4  2016-05-31  2015-11-23    190 days          190
    5  2016-06-30  2015-11-24    219 days          219
    6         NaT  2015-11-25         NaT          NaN
    7         NaT  2015-11-26         NaT          NaN
    8  2016-01-31  2015-11-27     65 days           65
    9         NaT  2015-11-28         NaT          NaN
    10        NaT  2015-11-29         NaT          NaN
    11        NaT  2015-11-30         NaT          NaN
    12 2016-04-30  2015-12-01    151 days          151
    13        NaT  2015-12-02         NaT          NaN
    14        NaT  2015-12-03         NaT          NaN
    15 2016-04-30  2015-12-04    148 days          148
    16        NaT  2015-12-05         NaT          NaN
    17        NaT  2015-12-06         NaT          NaN

Hope this helps.


What about:

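    df_test['Difference'] = (df_test['First_Date'] - df_test['Second Date']).dt.days

This returns the difference in whole days. The result is an int column only when neither date column contains NaT; with missing dates, as in the question's data, it comes back as float with NaN.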
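To see how missing dates affect the dtype of .dt.days, here is a small sketch. It assumes the column names from the question and a hand-built three-row frame standing in for df_test; dropping incomplete rows first is just one possible way to end up with a plain integer column.

```python
import pandas as pd

# Stand-in for the question's frame, with one missing First_Date.
df_test = pd.DataFrame({
    "First_Date": pd.to_datetime(["2016-02-09", None, "2016-01-06"]),
    "Second Date": pd.to_datetime(["2015-11-19", "2015-12-04", "2015-12-08"]),
})

# With NaT present, .dt.days falls back to a float dtype with NaN.
days = (df_test["First_Date"] - df_test["Second Date"]).dt.days
print(days.dtype)

# Keeping only rows where both dates exist yields an integer dtype.
complete = df_test.dropna(subset=["First_Date", "Second Date"])
days_complete = (complete["First_Date"] - complete["Second Date"]).dt.days
print(days_complete.dtype)
print(days_complete.tolist())
```

Dropping rows changes the length of the result, so this route fits best when the rows with missing dates are not needed downstream.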
