Numpy and pandas timedelta error

In Python, I have an array of dates generated (or read from a CSV file) using pandas, and I want to add one year to each date. I can get it to work with pandas but not with NumPy. What am I doing wrong? Or is this a bug in pandas or NumPy?

Thanks!

    import numpy as np
    import pandas as pd
    from pandas.tseries.offsets import DateOffset

    # Generate range of dates using pandas.
    dates = pd.date_range('1980-01-01', '2015-01-01')

    # Add one year using pandas.
    dates2 = dates + DateOffset(years=1)

    # Convert result to numpy. THIS WORKS!
    dates2_np = dates2.values

    # Convert original dates to numpy array.
    dates_np = dates.values

    # Add one year using numpy. THIS FAILS!
    dates3 = dates_np + np.timedelta64(1, 'Y')
    # TypeError: Cannot get a common metadata divisor for NumPy datetime
    # metadata [ns] and [Y] because they have incompatible nonlinear base
    # time units
2 answers

Adding np.timedelta64(1, 'Y') to an array of dtype datetime64[ns] does not work because a year does not correspond to a fixed number of nanoseconds. Sometimes a year is 365 days, sometimes 366 days, and sometimes there is even a leap second. (Note that leap seconds, such as the one that occurred at 2015-06-30 23:59:60, cannot be represented as NumPy datetime64s.)
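A minimal sketch of that restriction, assuming a small made-up array in place of the question's `dates_np`: the addition is rejected at nanosecond resolution, but accepted once the dates are truncated to month resolution, where a year is exactly 12 months. The names `dates_ns`, `dates_m`, `add_failed` and `shifted` are illustrative only.

```python
import numpy as np

# Nanosecond-resolution dates, like the ones pandas returns from .values.
dates_ns = np.array(['1980-01-01', '1980-06-15'], dtype='datetime64[ns]')

try:
    dates_ns + np.timedelta64(1, 'Y')
    add_failed = False
except TypeError:
    # A year has no fixed length in nanoseconds, so NumPy refuses the addition.
    add_failed = True

# At month resolution a year is exactly 12 months, so the addition is allowed.
# Note that the day of the month is discarded by this truncation.
dates_m = dates_ns.astype('datetime64[M]')
shifted = dates_m + np.timedelta64(1, 'Y')
```

The month-resolution result loses the day of the month, which is why the day has to be handled separately when whole dates are needed.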

The easiest way to add a year to a NumPy datetime64[ns] array is to break it down into its constituent parts, such as years, months, and days, perform calculations on integer arrays, and then rebuild the datetime64 array:

    def year(dates):
        "Return an array of the years given an array of datetime64s"
        return dates.astype('M8[Y]').astype('i8') + 1970

    def month(dates):
        "Return an array of the months given an array of datetime64s"
        return dates.astype('M8[M]').astype('i8') % 12 + 1

    def day(dates):
        "Return an array of the days of the month given an array of datetime64s"
        return (dates - dates.astype('M8[M]')) / np.timedelta64(1, 'D') + 1

    def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
                  seconds=None, milliseconds=None, microseconds=None,
                  nanoseconds=None):
        years = np.asarray(years) - 1970
        months = np.asarray(months) - 1
        days = np.asarray(days) - 1
        types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
                 '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
        vals = (years, months, days, weeks, hours, minutes, seconds,
                milliseconds, microseconds, nanoseconds)
        return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
                   if v is not None)

    # break the datetime64 array into constituent parts
    years, months, days = [f(dates_np) for f in (year, month, day)]

    # recompose the datetime64 array after adding 1 to the years
    dates3 = combine64(years+1, months, days)

gives

    In [185]: dates3
    Out[185]:
    array(['1981-01-01', '1981-01-02', '1981-01-03', ..., '2015-12-30',
           '2015-12-31', '2016-01-01'], dtype='datetime64[D]')

Although this looks like a lot of code, it is much faster than adding a DateOffset of 1 year:

    In [206]: %timeit dates + DateOffset(years=1)
    1 loops, best of 3: 285 ms per loop

    In [207]: %%timeit
       .....: years, months, days = [f(dates_np) for f in (year, month, day)]
       .....: combine64(years+1, months, days)
       .....:
    100 loops, best of 3: 2.65 ms per loop
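The same decompose-and-rebuild idea can also be sketched more compactly by shifting at month resolution and then re-adding the offset within the month. This is an illustrative variant, not the code timed above, and `add_years` is a hypothetical helper name. One caveat to be aware of: a leap day rolls over into March here, whereas pandas' DateOffset(years=1) clamps it to February 28.

```python
import numpy as np

def add_years(dates, n):
    """Add n calendar years to a datetime64 array (hypothetical helper)."""
    # Truncate each date to the first of its month.
    months = dates.astype('datetime64[M]')
    # Remainder within the month (day of month and time of day).
    offset = dates - months
    # A year is exactly 12 months, so this addition is well defined.
    return months + np.timedelta64(12 * n, 'M') + offset

sample = np.array(['1980-01-01', '1980-02-29', '2015-01-01'],
                  dtype='datetime64[ns]')
result = add_years(sample, 1)
# 1980-02-29 becomes 1981-03-01, because 1981-02 has no 29th day.
```

If February 28 is the desired result for leap days, those entries need separate handling, or the pandas offset is the simpler choice.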

Of course, pd.tseries.offsets offers a whole set of offsets that do not have a simple counterpart when working with NumPy datetime64.
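As one illustration of such an offset, here is a small sketch using `pd.offsets.MonthEnd`, which moves each date to a month end. The sample dates and the names `idx` and `month_ends` are made up for the example.

```python
import pandas as pd

idx = pd.DatetimeIndex(['1980-01-01', '1980-01-15', '1980-02-10'])

# Roll each date forward to the end of its month.
month_ends = idx + pd.offsets.MonthEnd(1)
# -> 1980-01-31, 1980-01-31, 1980-02-29 (1980 is a leap year)
```

Doing this with plain datetime64 arithmetic would require working out the length of each month by hand.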


Here is what the NumPy documentation says:

There are two Timedelta units ('Y', years and 'M', months) which are treated specially, because how much time they represent changes depending on when they are used. While a timedelta day unit is equivalent to 24 hours, there is no way to convert a month unit into days, because different months have different numbers of days.

Days and weeks seem to work:

    dates4 = dates_np + np.timedelta64(1, 'D')
    dates5 = dates_np + np.timedelta64(1, 'W')
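The distinction the documentation draws can be checked directly on scalar timedeltas. In this short sketch the variable names are illustrative: years convert to months and weeks convert to days, but years do not convert to days.

```python
import numpy as np

one_year = np.timedelta64(1, 'Y')
one_week = np.timedelta64(1, 'W')

# Linear conversions are allowed.
year_in_months = np.timedelta64(one_year, 'M')   # 12 months
week_in_days = np.timedelta64(one_week, 'D')     # 7 days

# A year has no fixed number of days, so this conversion is rejected.
try:
    np.timedelta64(one_year, 'D')
    year_to_days_ok = True
except TypeError:
    year_to_days_ok = False
```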