I have two DataFrames (each with a DatetimeIndex) and want to update the first (older) frame with data from the second (newer) frame.
A new frame may contain more recent data for rows already contained in the old frame. In this case, the data in the old frame should be overwritten with data from the new frame. In addition, a new frame may contain more columns / rows than the first. In this case, the old frame should be enlarged with the data in the new frame.
The pandas docs state that
"The .loc/.ix/[] operations can perform enlargement when setting a non-existent key for that axis"
and
"A DataFrame can be enlarged on either axis via .loc"
However, this does not seem to work here and raises a KeyError. Example:
    In [195]: df1
    Out[195]:
                         A  B  C
    2015-07-09 12:00:00  1  1  1
    2015-07-09 13:00:00  1  1  1
    2015-07-09 14:00:00  1  1  1
    2015-07-09 15:00:00  1  1  1

    In [196]: df2
    Out[196]:
                         A  B  C  D
    2015-07-09 14:00:00  2  2  2  2
    2015-07-09 15:00:00  2  2  2  2
    2015-07-09 16:00:00  2  2  2  2
    2015-07-09 17:00:00  2  2  2  2

    In [197]: df1.loc[df2.index] = df2
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    <ipython-input-197-74e630e87cf8> in <module>()
    ----> 1 df1.loc[df2.index] = df2

    /.../pandas/core/indexing.pyc in __setitem__(self, key, value)
        112
        113     def __setitem__(self, key, value):
    --> 114         indexer = self._get_setitem_indexer(key)
        115         self._setitem_with_indexer(indexer, value)
        116

    /.../pandas/core/indexing.pyc in _get_setitem_indexer(self, key)
        107
        108         try:
    --> 109             return self._convert_to_indexer(key, is_setter=True)
        110         except TypeError:
        111             raise IndexingError(key)

    /.../pandas/core/indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
       1110                 mask = check == -1
       1111                 if mask.any():
    -> 1112                     raise KeyError('%s not in index' % objarr[mask])
       1113
       1114             return _values_from_object(indexer)

    KeyError: "['2015-07-09T18:00:00.000000000+0200' '2015-07-09T19:00:00.000000000+0200'] not in index"
What is the best way (in terms of performance, since my real data is much larger) to achieve the desired updated and extended DataFrame? This is the result that I would like to see:
                         A  B  C    D
    2015-07-09 12:00:00  1  1  1  NaN
    2015-07-09 13:00:00  1  1  1  NaN
    2015-07-09 14:00:00  2  2  2    2
    2015-07-09 15:00:00  2  2  2    2
    2015-07-09 16:00:00  2  2  2    2
    2015-07-09 17:00:00  2  2  2    2
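For reproducibility, here is a minimal sketch that rebuilds the two example frames and shows two candidate ways to get the desired result: `DataFrame.combine_first`, and reindexing to the union of both axes before assigning with `.loc`. The timezone-naive timestamps and the variable names `via_combine` and `via_reindex` are assumptions made for illustration. Neither variant has been benchmarked on large data, so this is not a claim about which one is faster.

```python
import pandas as pd

# Rebuild the example frames (timezone-naive timestamps are an assumption).
idx1 = pd.to_datetime(["2015-07-09 12:00", "2015-07-09 13:00",
                       "2015-07-09 14:00", "2015-07-09 15:00"])
idx2 = pd.to_datetime(["2015-07-09 14:00", "2015-07-09 15:00",
                       "2015-07-09 16:00", "2015-07-09 17:00"])
df1 = pd.DataFrame(1, index=idx1, columns=["A", "B", "C"])
df2 = pd.DataFrame(2, index=idx2, columns=["A", "B", "C", "D"])

# Candidate 1: take values from df2 where present, fall back to df1 otherwise.
# Note: a NaN in df2 would NOT overwrite an existing value from df1.
via_combine = df2.combine_first(df1)

# Candidate 2: enlarge df1 to the union of both axes first, then overwrite
# the rows/columns covered by df2 (here a NaN in df2 would overwrite df1).
via_reindex = df1.reindex(index=df1.index.union(df2.index),
                          columns=df1.columns.union(df2.columns))
via_reindex.loc[df2.index, df2.columns] = df2

print(via_combine)
```

The two candidates differ only in how missing values in the newer frame are treated, so the choice depends on whether a NaN in the new data should count as an update.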
python pandas
bmu