Setting values with pandas.DataFrame

Given this DataFrame:

    import pandas

    dates = pandas.date_range('2016-01-01', periods=5, freq='H')
    s = pandas.Series([0, 1, 2, 3, 4], index=dates)
    df = pandas.DataFrame([(1, 2, s, 8)], columns=['a', 'b', 'foo', 'bar'])
    df.set_index(['a', 'b'], inplace=True)
    df


I would like to replace that Series with a new one that is simply the old one resampled to daily frequency (i.e. x.resample('D').sum().dropna()).
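For reference, a minimal sketch of what that daily resampling does on its own, using the hourly Series from the setup above (all five hourly values fall on the same day, so they collapse to a single row):

```python
import pandas as pd

# Hourly Series, as in the question's setup
dates = pd.date_range('2016-01-01', periods=5, freq='H')
x = pd.Series([0, 1, 2, 3, 4], index=dates)

# Resample to daily frequency, sum the values, drop empty days
daily = x.resample('D').sum().dropna()
print(daily)  # single row: 2016-01-01 -> 10
```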

When I try:

 df['foo'][0] = df['foo'][0].resample('D').sum().dropna() 

This seems to work.


However, I get a warning:

    SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame
    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

The question is: how do I do this properly, without the warning?

Notes

Things I tried that do not work (with or without resampling on the right-hand side, the assignment raises an exception):

    df.iloc[0].loc['foo'] = df.iloc[0].loc['foo']
    df.loc[(1, 2), 'foo'] = df.loc[(1, 2), 'foo']
    df.loc[df.index[0], 'foo'] = df.loc[df.index[0], 'foo']

A bit more information about the data (in case it matters):

  • The real DataFrame has more columns in the multi-index. Not all of them are necessarily integers, but they are generally numeric or categorical. The index is unique (i.e. there is only one row with a given index value).
  • The real DataFrame has, of course, many more rows in it (thousands).
  • The DataFrame does not have to have only two columns, and there can be more than one column containing Series. Columns usually contain series, categorical data, and numeric data. Any single column is always of the same type (numeric, categorical, or Series).
  • The Series contained in each cell usually has a variable length (i.e. two Series/cells in the DataFrame do not, except by pure coincidence, have the same length, and will probably never have the same index anyway, since the dates vary between series).

Using Python 3.5.1 and Pandas 0.18.1.

python pandas
3 answers

This should work:

 df.iat[0, df.columns.get_loc('foo')] = df['foo'][0].resample('D').sum().dropna() 

Pandas complains about the chained indexing, but when you don't chain, it has trouble assigning a whole Series to a single cell. With iat you can force an assignment like this. I don't think it is preferable, but it seems to be a working solution.
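Put together end to end, a sketch reusing the question's setup (df['foo'].iloc[0] is used instead of df['foo'][0] to keep the read side unambiguous):

```python
import pandas as pd

# Rebuild the question's DataFrame: one row, with a Series stored in 'foo'
dates = pd.date_range('2016-01-01', periods=5, freq='H')
s = pd.Series([0, 1, 2, 3, 4], index=dates)
df = pd.DataFrame([(1, 2, s, 8)], columns=['a', 'b', 'foo', 'bar'])
df.set_index(['a', 'b'], inplace=True)

# One direct (non-chained) assignment into the cell: no SettingWithCopyWarning
new_series = df['foo'].iloc[0].resample('D').sum().dropna()
df.iat[0, df.columns.get_loc('foo')] = new_series

print(df['foo'].iloc[0])  # the cell now holds the daily-resampled Series
```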


Hierarchical data in pandas

It seems you should consider restructuring your data to take advantage of pandas features such as a MultiIndex and a DatetimeIndex. This lets you work with the index in the usual way while still being able to select columns by the hierarchical data (a, b and bar).

Restructured data

    import pandas as pd

    # Define the index
    dates = pd.date_range('2016-01-01', periods=5, freq='H')
    # Define the Series
    s = pd.Series([0, 1, 2, 3, 4], index=dates)
    # Place the Series in a hierarchical DataFrame
    heirIndex = pd.MultiIndex.from_arrays([[1], [2], [8]], names=['a', 'b', 'bar'])
    df = pd.DataFrame(s.values, index=dates, columns=heirIndex)
    print(df)

    a                    1
    b                    2
    bar                  8
    2016-01-01 00:00:00  0
    2016-01-01 01:00:00  1
    2016-01-01 02:00:00  2
    2016-01-01 03:00:00  3
    2016-01-01 04:00:00  4
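With the hierarchy in the columns, label-based selection also becomes easy; for example, selecting on the bar level (a sketch, rebuilding the same single-column DataFrame):

```python
import pandas as pd

dates = pd.date_range('2016-01-01', periods=5, freq='H')
cols = pd.MultiIndex.from_arrays([[1], [2], [8]], names=['a', 'b', 'bar'])
df = pd.DataFrame([0, 1, 2, 3, 4], index=dates, columns=cols)

# Select all columns where the 'bar' level equals 8
sub = df.xs(8, axis=1, level='bar')
print(sub)  # same data, with the 'bar' level dropped from the columns
```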

Resampling

With the data in this format, resampling becomes very simple.

    # Simple direct resampling
    df_resampled = df.resample('D').sum().dropna()
    print(df_resampled)

    a            1
    b            2
    bar          8
    2016-01-01  10

Update (from data description)

If the data consists of variable-length Series, each with a different index, along with non-numeric categories, that is fine too. Here is an example:

    import pandas as pd

    # Define the first Series
    dates = pd.date_range('2016-01-01', periods=5, freq='H')
    s = pd.Series([0, 1, 2, 3, 4], index=dates)
    # Define the second Series
    dates2 = pd.date_range('2016-01-14', periods=6, freq='H')
    s2 = pd.Series([-200, 10, 24, 30, 40, 100], index=dates2)
    # Define the DataFrames
    df1 = pd.DataFrame(s.values, index=dates,
                       columns=pd.MultiIndex.from_arrays([[1], [2], [8], ['cat1']],
                                                         names=['a', 'b', 'bar', 'c']))
    df2 = pd.DataFrame(s2.values, index=dates2,
                       columns=pd.MultiIndex.from_arrays([[2], [5], [5], ['cat3']],
                                                         names=['a', 'b', 'bar', 'c']))
    df = pd.concat([df1, df2])
    print(df)

    a                       1      2
    b                       2      5
    bar                     8      5
    c                    cat1   cat3
    2016-01-01 00:00:00   0.0    NaN
    2016-01-01 01:00:00   1.0    NaN
    2016-01-01 02:00:00   2.0    NaN
    2016-01-01 03:00:00   3.0    NaN
    2016-01-01 04:00:00   4.0    NaN
    2016-01-14 00:00:00   NaN -200.0
    2016-01-14 01:00:00   NaN   10.0
    2016-01-14 02:00:00   NaN   24.0
    2016-01-14 03:00:00   NaN   30.0
    2016-01-14 04:00:00   NaN   40.0
    2016-01-14 05:00:00   NaN  100.0

The only wrinkle comes after resampling: you will want to pass how='all' when dropping NA rows, like this:

    # Simple direct resampling
    df_resampled = df.resample('D').sum().dropna(how='all')
    print(df_resampled)

    a              1      2
    b              2      5
    bar            8      5
    c           cat1   cat3
    2016-01-01  10.0    NaN
    2016-01-14   NaN    4.0

Just set df.is_copy = False before assigning the new value. Note that this only suppresses the SettingWithCopyWarning rather than changing what the assignment does, and the is_copy attribute was deprecated in later pandas versions.

