Day deltas for dates more than 292 years apart

I am trying to compute deltas in days across a wide range of pandas dates. However, for time deltas greater than 292 years, I get negative values. For example,

    import pandas as pd

    dates = pd.Series(pd.date_range('1700-01-01', periods=4500, freq='m'))
    days_delta = (dates - dates.min()).astype('timedelta64[D]')

However, using DatetimeIndex, I can do this, and it works the way I want,

    import pandas as pd
    import numpy as np

    dates = pd.date_range('1700-01-01', periods=4500, freq='m')
    days_fun = np.vectorize(lambda x: x.days)
    days_delta = days_fun(dates.date - dates.date.min())

The question is, how do I get the correct days_delta for Series objects?

python numpy pandas datetime
2 answers

See the pandas documentation, in particular the section on Timedelta limitations:

pandas represents Timedeltas in nanosecond resolution using 64-bit integers. As such, the 64-bit integer limits determine the Timedelta limits.

Incidentally, this is the same limitation that the documentation mentions for pandas timestamps:

Since pandas represents timestamps in nanosecond resolution, the time span that can be represented using a 64-bit integer is limited to approximately 584 years.
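The 292-year and 584-year figures follow directly from the size of a signed 64-bit integer counting nanoseconds. A rough sketch of the arithmetic, assuming a mean Gregorian year of 365.2425 days for the conversion (the constant names are only illustrative):

```python
# Nanoseconds in one day.
NS_PER_DAY = 86_400 * 10**9
# Mean Gregorian year; an assumption used only for a rough conversion to years.
DAYS_PER_YEAR = 365.2425

# Largest value a signed 64-bit integer can hold, read as nanoseconds.
max_ns = 2**63 - 1

# Largest whole number of days a single nanosecond-resolution delta can span.
max_days = max_ns // NS_PER_DAY
max_years = max_days / DAYS_PER_YEAR

# Negative and positive halves together give the full representable span.
full_span_years = 2 * max_years

print(max_days)                   # 106751
print(round(max_years, 1))        # 292.3
print(round(full_span_years, 1))  # 584.5
```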

This suggests that the workaround the documentation recommends for the timestamp limits may also apply to timedeltas. For timestamps, the documentation says:

If you have data that is outside of the Timestamp bounds, see Timestamp limitations, then you can use a PeriodIndex and/or Series of Periods to do computations.
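As a minimal sketch of what a Period-based calculation could look like for day deltas (an illustration, not code from the documentation): daily periods carry an integer ordinal, so the difference of two ordinals is a plain integer number of days and never passes through a nanosecond Timedelta. The three dates are the ones used elsewhere in this thread.

```python
import pandas as pd

# Daily periods are stored as integer day ordinals, so differences between
# them are ordinary integers rather than nanosecond-resolution Timedeltas.
periods = pd.PeriodIndex(["2016-06-06", "1700-01-01", "2200-01-01"], freq="D")

start = periods.min()

# Days elapsed since the earliest date, computed from the ordinals.
days_delta = [p.ordinal - start.ordinal for p in periods]

print(days_delta)  # [115573, 0, 182621]
```

Both non-zero deltas exceed the roughly 106751 days that a single Timedelta can hold, yet they come out positive here.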


Workaround

If your dates are contiguous, with gaps small enough to be computed, as in your example, you can sort the series and then use cumsum to get around the problem:

    import pandas as pd

    dates = pd.Series(pd.date_range('1700-01-01', periods=4500, freq='m'))
    dates = dates.sort_values()
    dateshift = dates.shift(1)
    # Sum the small step-to-step deltas instead of subtracting the minimum directly
    (dates - dateshift).fillna(pd.Timedelta(0)).dt.days.cumsum().describe()

    count      4500.000000
    mean      68466.072444
    std       39543.094524
    min           0.000000
    25%       34233.250000
    50%       68465.500000
    75%      102699.500000
    max      136935.000000
    dtype: float64

Note that neither min nor max is negative.

Failaround

If the gaps between dates are too large, this workaround fails, as here:

    dates = pd.Series(pd.to_datetime(['2016-06-06', '1700-01-01', '2200-01-01']))
    dates = dates.sort_values()
    dateshift = dates.shift(1)
    (dates - dateshift).fillna(pd.Timedelta(0)).dt.days.cumsum()

    # Output as reported with the pandas version used at the time:
    1         0
    0    -97931
    2    -30883

This happens because we compute the step between consecutive dates and then sum the steps. Sorting guarantees the smallest possible steps, but in this case one step is still too large to represent: the step from 1700 to 2016 spans about 316 years, which exceeds the limit.
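One way to tell in advance whether the cumsum workaround can succeed is to measure the largest gap between consecutive sorted dates using plain `datetime.date` arithmetic, which is not subject to the nanosecond limit. This is only a sketch; `LIMIT_DAYS` and `max_gap_days` are hypothetical names introduced for illustration.

```python
from datetime import date

# Largest whole number of days a nanosecond-resolution int64 delta can hold.
LIMIT_DAYS = (2**63 - 1) // (86_400 * 10**9)  # 106751


def max_gap_days(dates):
    """Return the largest gap, in days, between consecutive sorted dates."""
    ordered = sorted(dates)
    return max((later - earlier).days for earlier, later in zip(ordered, ordered[1:]))


failing = [date(2016, 6, 6), date(1700, 1, 1), date(2200, 1, 1)]
gap = max_gap_days(failing)

# The 1700 -> 2016 step alone exceeds the limit, so summing steps cannot help.
print(gap, gap > LIMIT_DAYS)  # 115573 True
```

For the monthly series in the question the largest gap is 31 days, far below the limit, which is why the workaround succeeds there.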

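Resetting the index

As you can see in the Failaround example, the series is no longer in index order after sorting. To fix this, call .reset_index(drop=True, inplace=True) on the series; drop=True is needed so that the old index is discarded rather than turned into a column.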
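A small sketch of the index reset, using the same three dates as plain ISO strings (which sort chronologically as text). It uses the assignment form rather than `inplace=True`, which should give the same result for a Series:

```python
import pandas as pd

# Sorting reorders the values but keeps each value's original index label.
dates = pd.Series(["2016-06-06", "1700-01-01", "2200-01-01"]).sort_values()
index_after_sort = list(dates.index)

# drop=True discards the old labels and renumbers from 0.
dates = dates.reset_index(drop=True)
index_after_reset = list(dates.index)

print(index_after_sort)   # [1, 0, 2]
print(index_after_reset)  # [0, 1, 2]
```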
