I have dataframes with various kinds of DatetimeIndex (the data may be weekly, monthly, or annual). I want to create columns that hold lagged values of other columns. The data is imported from a spreadsheet; I do not generate the datetime index inside Python.
I am struggling to find a "pythonic" way to do this. I believe that if I use pandas' datetime functionality, the lagging will be more robust in the case of odd or exceptional data.
I made a toy example that works, but the same approach fails on my real data.
A toy example that works correctly (it creates a new column holding the previous month's value of "foo"):
    import numpy as np
    import pandas as pd

    rng = pd.date_range('2012-01-01', '2013-1-01', freq="M")
    toy2 = pd.DataFrame(pd.Series(np.random.randint(0, 50, len(rng)), index=rng, name="foo"))

                foo
    2012-01-31    4
    2012-02-29    2
    2012-03-31   27
    2012-04-30    7
    2012-05-31   44
    2012-06-30   22
    2012-07-31   16
    2012-08-31   18
    2012-09-30   35
    2012-10-31   35
    2012-11-30   16
    2012-12-31   32

    toy2['lag_foo'] = toy2['foo'].shift(1, 'm')

                foo  lag_foo
    2012-01-31    4      NaN
    2012-02-29    2      4.0
    2012-03-31   27      2.0
    2012-04-30    7     27.0
    2012-05-31   44      7.0
    2012-06-30   22     44.0
    2012-07-31   16     22.0
    2012-08-31   18     16.0
    2012-09-30   35     18.0
    2012-10-31   35     35.0
    2012-11-30   16     35.0
    2012-12-31   32     16.0
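To make clear what the `freq` argument does here, this is a small sketch with fixed values instead of random ones. It assumes a reasonably recent pandas and uses the `pd.offsets.MonthEnd()` offset object rather than the string alias; the names `month_end`, `shifted` and `lag_pos` are only for illustration.

```python
import pandas as pd

month_end = pd.offsets.MonthEnd()  # the month-end offset that the "M" alias stands for
idx = pd.date_range("2012-01-31", periods=4, freq=month_end)
foo = pd.Series([4, 2, 27, 7], index=idx, name="foo")

# With freq: the values stay put and every index label moves one month end forward.
shifted = foo.shift(1, freq=month_end)

df = foo.to_frame()
# Column assignment aligns on index labels, so each row picks up the value
# that was originally labelled with the previous month end.
df["lag_foo"] = shifted
# Without freq: the values move down one row, regardless of what the dates are.
df["lag_pos"] = foo.shift(1)

print(df)
```

The two columns agree here only because the index is regular, gap-free and sorted ascending. On an index sorted newest-first, a positional `shift(1)` would pull in the following month instead, which is one reason to prefer the label-based version.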
But when I run the same thing on my real data, it fails:
ValueError: cannot reindex from a duplicate axis
    print type(toy)
    print toy.columns
    print toy['IPE m2'][0:5]

    <class 'pandas.core.frame.DataFrame'>
    Index([u'IPE m2'], dtype='object')
    Date
    2016-04-30    43.29
    2016-03-31    40.44
    2016-02-29    34.17
    2016-01-31    32.47
    2015-12-31    39.35
    Name: IPE m2, dtype: float64
Exception Trace:
    ValueError                                Traceback (most recent call last)
    <ipython-input-170-9cb57a2ed681> in <module>()
    ----> 1 toy['prev_1m']= toy['IPE m2'].shift(1,'m')

    C:\Users\mds\Anaconda2\lib\site-packages\pandas\core\frame.pyc in __setitem__(self, key, value)
       2355         else:
       2356             # set column
    -> 2357             self._set_item(key, value)
       2358
       2359     def _setitem_slice(self, key, value):

    C:\Users\mds\Anaconda2\lib\site-packages\pandas\core\frame.pyc in _set_item(self, key, value)
       2421
       2422         self._ensure_valid_index(value)
    -> 2423         value = self._sanitize_column(key, value)
       2424         NDFrame._set_item(self, key, value)
       2425

    C:\Users\mds\Anaconda2\lib\site-packages\pandas\core\frame.pyc in _sanitize_column(self, key, value)
       2555
       2556         if isinstance(value, Series):
    -> 2557             value = reindexer(value)
       2558
       2559         elif isinstance(value, DataFrame):

    C:\Users\mds\Anaconda2\lib\site-packages\pandas\core\frame.pyc in reindexer(value)
       2547                     # duplicate axis
       2548                     if not value.index.is_unique:
    -> 2549                         raise e
       2550
       2551                     # other

    ValueError: cannot reindex from a duplicate axis
It seems I am missing some subtlety of pandas datetime indexes. I'm also not sure this is the best way to do it. The only thing I could suspect is that the failing toy.index has None as its frequency, while the working toy2.index has its frequency set to "M":
    toy.index
    DatetimeIndex(['2016-04-30', '2016-03-31', '2016-02-29', '2016-01-31',
                   '2015-12-31', '2015-11-30', '2015-10-31', '2015-09-30',
                   '2015-08-31', '2015-07-31',
                   ...
                   'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT',
                   'NaT', 'NaT'],
                  dtype='datetime64[ns]', name=u'Date', length=142, freq=None)

    toy2.index
    DatetimeIndex(['2012-01-31', '2012-02-29', '2012-03-31', '2012-04-30',
                   '2012-05-31', '2012-06-30', '2012-07-31', '2012-08-31',
                   '2012-09-30', '2012-10-31', '2012-11-30', '2012-12-31'],
                  dtype='datetime64[ns]', freq='M')
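Looking at that output again, the repeated 'NaT' labels look like a more likely culprit than freq=None, since the error complains about a duplicate axis. Below is a minimal sketch of how the index could be checked and cleaned. The frame is a made-up miniature of my data, and it assumes a reasonably recent pandas (the month-end offset is passed as an object rather than as a string alias).

```python
import pandas as pd

# Miniature stand-in for the spreadsheet import: newest first, with empty
# trailing rows that end up as NaT labels in the index.
idx = pd.to_datetime(["2016-04-30", "2016-03-31", "2016-02-29", None, None])
toy = pd.DataFrame({"IPE m2": [43.29, 40.44, 34.17, None, None]}, index=idx)

print(toy.index.is_unique)     # False when labels repeat (here: the NaT rows)
print(toy.index.isna().sum())  # number of NaT labels

# Keep only rows with a real date; .copy() gives an independent frame.
clean = toy[toy.index.notna()].copy()

# An explicit freq works even though clean.index.freq is None.
clean["prev_1m"] = clean["IPE m2"].shift(1, freq=pd.offsets.MonthEnd())
print(clean)
```

Filtering on the index itself, rather than on the column values, only removes rows whose date is missing and leaves rows with a valid date but a missing value in place.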
=============================
I dropped the rows containing NaT:
    toy = toy.dropna()
    toy['prev_1m'] = toy['IPE m2'].shift(1, 'm')
and I get the results that I wanted. However, I also get a warning:
    C:\Users\mds\Anaconda2\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: http:
====
Assigning this way avoids the warning:
    toy.loc[:, 'prev_1m2'] = toy['IPE m2'].shift(1, 'm')
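Since my data can be weekly, monthly or annual, another option might be to wrap the steps in a small function that returns a new frame instead of modifying one in place, which should sidestep the copy warning altogether. This is only a sketch: `add_lag` and its parameters are hypothetical names of mine, not part of pandas, and it assumes the non-NaT dates are unique.

```python
import pandas as pd

def add_lag(df, column, offset, new_name):
    """Return a new frame with `column` lagged by one `offset` step.

    Hypothetical helper: rows whose index label is NaT are dropped first,
    because duplicated NaT labels break the label-based alignment.
    """
    valid = df[df.index.notna()]
    lagged = valid[column].shift(1, freq=offset)
    # assign() builds a new DataFrame and aligns `lagged` on the index labels.
    return valid.assign(**{new_name: lagged})

# Made-up miniature of the imported data.
idx = pd.to_datetime(["2016-04-30", "2016-03-31", "2016-02-29", None])
toy = pd.DataFrame({"IPE m2": [43.29, 40.44, 34.17, float("nan")]}, index=idx)

result = add_lag(toy, "IPE m2", pd.offsets.MonthEnd(), "prev_1m")
print(result)
```

For weekly or annual data the same call should work with a different offset, for example `pd.offsets.Week()` or `pd.offsets.YearEnd()`, as long as the dates actually fall on that offset's grid.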