Rolling mean in pandas on a specific column

Question

I have a data frame that is imported from CSV.

               stock  pop
 Date
 2016-01-04  325.316   82
 2016-01-11  320.036   83
 2016-01-18  299.169   79
 2016-01-25  296.579   84
 2016-02-01  295.334   82
 2016-02-08  309.777   81
 2016-02-15  317.397   75
 2016-02-22  328.005   80
 2016-02-29  315.504   81
 2016-03-07  328.802   81
 2016-03-14  339.559   86
 2016-03-21  352.160   82
 2016-03-28  348.773   84
 2016-04-04  346.482   83
 2016-04-11  346.980   80
 2016-04-18  357.140   75
 2016-04-25  357.439   77
 2016-05-02  356.443   78
 2016-05-09  365.158   78
 2016-05-16  352.160   72
 2016-05-23  344.540   74
 2016-05-30  354.998   81
 2016-06-06  347.428   77
 2016-06-13  341.053   78
 2016-06-20  363.515   80
 2016-06-27  349.669   80
 2016-07-04  371.583   82
 2016-07-11  358.335   81
 2016-07-18  362.021   79
 2016-07-25  368.844   77
 ...             ...  ...

I want to add a new MA column that holds the rolling mean of the pop column. I tried the following:

 df['MA']=data.rolling(5,on='pop').mean()

I get an error:

 ValueError: Wrong number of items passed 2, placement implies 1

So I tried it without assigning the result to a column, to see whether it works at all. I used:

  data.rolling(5,on='pop').mean()

I got this output:

                stock  pop
 Date
 2016-01-04       NaN   82
 2016-01-11       NaN   83
 2016-01-18       NaN   79
 2016-01-25       NaN   84
 2016-02-01  307.2868   82
 2016-02-08  304.1790   81
 2016-02-15  303.6512   75
 2016-02-22  309.4184   80
 2016-02-29  313.2034   81
 2016-03-07  319.8970   81
 2016-03-14  325.8534   86
 2016-03-21  332.8060   82
 2016-03-28  336.9596   84
 2016-04-04  343.1552   83
 2016-04-11  346.7908   80
 2016-04-18  350.3070   75
 2016-04-25  351.3628   77
 2016-05-02  352.8968   78
 2016-05-09  356.6320   78
 2016-05-16  357.6680   72
 2016-05-23  355.1480   74
 2016-05-30  354.6598   81
 2016-06-06  352.8568   77
 2016-06-13  348.0358   78
 2016-06-20  350.3068   80
 2016-06-27  351.3326   80
 2016-07-04  354.6496   82
 2016-07-11  356.8310   81
 2016-07-18  361.0246   79
 2016-07-25  362.0904   77
 ...              ...  ...

The rolling mean was applied to the stock column, not to the pop column. What am I doing wrong?

+8

python pandas

Anti21 Apr 16 '17 at 13:15

3 answers

This solution worked for me.

 data['MA'] = data.rolling(5).mean()['pop']

I think the problem may be that on='pop' only changes which column the rolling window is computed over, replacing the index in that role. It does not select the column whose values get averaged.

From the docstring: "For a DataFrame, column on which to calculate the rolling window, rather than the index"
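A minimal sketch of why the original assignment fails, assuming a small frame built from the first rows of the question (the variable names `rolled` and `pop_ma` are only illustrative). Rolling over the whole DataFrame gives back a DataFrame with one column per input column, which cannot be stored in a single column. Selecting ['pop'] from that result gives a Series, which can:

```python
import pandas as pd

# First six rows from the question, rebuilt by hand for illustration.
data = pd.DataFrame(
    {
        "stock": [325.316, 320.036, 299.169, 296.579, 295.334, 309.777],
        "pop": [82, 83, 79, 84, 82, 81],
    },
    index=pd.to_datetime(
        ["2016-01-04", "2016-01-11", "2016-01-18",
         "2016-01-25", "2016-02-01", "2016-02-08"]
    ),
)
data.index.name = "Date"

# Rolling over the DataFrame returns a DataFrame: one rolling mean per column.
rolled = data.rolling(5).mean()
print(type(rolled).__name__, list(rolled.columns))

# Picking one column out of that result yields a Series,
# which fits into a single new column.
pop_ma = rolled["pop"]
data["MA"] = pop_ma
print(data)
```

Note that pop has to be accessed with brackets here, because `data.pop` refers to the DataFrame.pop method rather than to the column.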

+3

ac2001 Apr 16 '17 at 13:25

Edit: pd.rolling_mean is deprecated in pandas and will be removed in a future version. Instead, using Series.rolling, you can do:

 df['MA'] = df['pop'].rolling(window=5,center=False).mean()

For the DataFrame df:

          Date    stock  pop
 0  2016-01-04  325.316   82
 1  2016-01-11  320.036   83
 2  2016-01-18  299.169   79
 3  2016-01-25  296.579   84
 4  2016-02-01  295.334   82
 5  2016-02-08  309.777   81
 6  2016-02-15  317.397   75
 7  2016-02-22  328.005   80
 8  2016-02-29  315.504   81
 9  2016-03-07  328.802   81

To obtain:

          Date    stock  pop    MA
 0  2016-01-04  325.316   82   NaN
 1  2016-01-11  320.036   83   NaN
 2  2016-01-18  299.169   79   NaN
 3  2016-01-25  296.579   84   NaN
 4  2016-02-01  295.334   82  82.0
 5  2016-02-08  309.777   81  81.8
 6  2016-02-15  317.397   75  80.2
 7  2016-02-22  328.005   80  80.4
 8  2016-02-29  315.504   81  79.8
 9  2016-03-07  328.802   81  79.6

Documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html
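For anyone who wants to reproduce this without the CSV, here is a self-contained sketch that rebuilds the ten rows shown above by hand (typing the data in is my assumption, since the original is loaded from a file) and applies the same Series.rolling call:

```python
import pandas as pd

# The ten rows from this answer, typed in instead of read from CSV.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2016-01-04", "2016-01-11", "2016-01-18", "2016-01-25",
             "2016-02-01", "2016-02-08", "2016-02-15", "2016-02-22",
             "2016-02-29", "2016-03-07"]
        ),
        "stock": [325.316, 320.036, 299.169, 296.579, 295.334,
                  309.777, 317.397, 328.005, 315.504, 328.802],
        "pop": [82, 83, 79, 84, 82, 81, 75, 80, 81, 81],
    }
)

# Rolling mean over the pop Series only. The first four rows are NaN
# because a full window of five values is not yet available.
df["MA"] = df["pop"].rolling(window=5, center=False).mean()
print(df)
```

The first non-NaN value is the mean of 82, 83, 79, 84 and 82, which is 82.0, matching the table above.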

Old: although it is deprecated, you can use:

 df['MA']=pd.rolling_mean(df['pop'], window=5)

To obtain:

          Date    stock  pop    MA
 0  2016-01-04  325.316   82   NaN
 1  2016-01-11  320.036   83   NaN
 2  2016-01-18  299.169   79   NaN
 3  2016-01-25  296.579   84   NaN
 4  2016-02-01  295.334   82  82.0
 5  2016-02-08  309.777   81  81.8
 6  2016-02-15  317.397   75  80.2
 7  2016-02-22  328.005   80  80.4
 8  2016-02-29  315.504   81  79.8
 9  2016-03-07  328.802   81  79.6

Documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.rolling_mean.html

+2

Chuck Apr 16 '17 at 13:27

Andrew L · Accepted Answer · 2017-04-16T14:02:46+0000

To assign a column, you can create a rolling object based on your Series:

 df['new_col'] = data['column'].rolling(5).mean()

The answer posted by ac2001 is not the most efficient way to do this. It computes a rolling mean for every column in the DataFrame, then assigns the "ma" column from the "pop" column of that result. The first of the following two methods is much more efficient:

 %timeit df['ma'] = data['pop'].rolling(5).mean()
 %timeit df['ma_2'] = data.rolling(5).mean()['pop']

 1000 loops, best of 3: 497 µs per loop
 100 loops, best of 3: 2.6 ms per loop

I would not recommend the second method unless you also need to store the computed rolling means of all the other columns.
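Outside IPython, a rough version of this comparison can be sketched with the standard timeit module. The random data, the frame size of 1000 rows, and the helper names `series_first` and `frame_first` are assumptions for illustration. Absolute timings will differ by machine and pandas version, so no particular numbers are implied:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (assumed shape and values).
rng = np.random.default_rng(0)
data = pd.DataFrame(
    {
        "stock": rng.normal(330.0, 20.0, 1000),
        "pop": rng.integers(70, 90, 1000),
    }
)


def series_first():
    # Select the column first, then roll over that one Series.
    return data["pop"].rolling(5).mean()


def frame_first():
    # Roll over every column, then keep only one of the results.
    return data.rolling(5).mean()["pop"]


# Both approaches should produce the same column of values.
pd.testing.assert_series_equal(series_first(), frame_first())

t_series = timeit.timeit(series_first, number=200)
t_frame = timeit.timeit(frame_first, number=200)
print(f"Series first:    {t_series:.4f} s for 200 runs")
print(f"DataFrame first: {t_frame:.4f} s for 200 runs")
```

The gap should grow with the number of columns, since the DataFrame-first version does work for columns that are then thrown away.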
