How to get monthly average in pandas using groupby

Question

How to get monthly average in pandas using groupby

I have the following DataFrame:

data=pd.read_csv('anual.csv', parse_dates='Fecha', index_col=0)
data

DatetimeIndex: 290 entries, 2011-01-01 00:00:00 to 2011-12-31 00:00:00
Data columns (total 12 columns):
HR             290  non-null values
PreciAcu       290  non-null values
RadSolar       290  non-null values
T              290  non-null values
Presion        290  non-null values
Tmax           290  non-null values
HRmax          290  non-null values
Presionmax     290  non-null values
RadSolarmax    290  non-null values
Tmin           290  non-null values
HRmin          290  non-null values
Presionmin     290  non-null values
dtypes: float64(4), int64(8)

Where:

data['HR']

Fecha
2011-01-01    37
2011-02-01    70
2011-03-01    62
2011-04-01    69
2011-05-01    72
2011-06-01    71
2011-07-01    71
2011-08-01    70
2011-09-01    40
...
2011-12-17    92
2011-12-18    78
2011-12-19    79
2011-12-20    76
2011-12-21    78
2011-12-22    80
2011-12-23    72
2011-12-24    70

In addition, some months are not always full. My goal is to calculate the average of each month from daily data. This is achieved as follows:

monthly=data.resample('M', how='mean')

                HR   PreciAcu    RadSolar         T        Presion     Tmax  
Fecha                                                                         
2011-01-31  68.586207   3.744828  163.379310  17.496552        0  25.875862   
2011-02-28  68.666667   1.966667  208.000000  18.854167        0  28.879167   
2011-03-31  69.136364   3.495455  218.090909  20.986364        0  30.359091   
2011-04-30  68.956522   1.913043  221.130435  22.165217        0  31.708696   
2011-05-31  72.700000   0.550000  201.100000  18.900000        0  27.460000   
2011-06-30  70.821429   6.050000  214.000000  23.032143        0  30.621429   
2011-07-31  78.034483   5.810345  188.206897  21.503448        0  27.951724   
2011-08-31  71.750000   1.028571  214.750000  22.439286        0  30.657143   
2011-09-30  72.481481   0.185185  196.962963  21.714815        0  29.596296    
2011-10-31  68.083333   1.770833  224.958333  18.683333        0  27.075000   
2011-11-30  71.750000   0.812500  169.625000  18.925000        0  26.237500   
2011-12-31  71.833333   0.160000  159.533333  17.260000        0  25.403333

The first error that I found is in the precipitation column, since all observations are 0 in January, and an average of 3.74 is obtained during this particular month.

When averages are in Excel and compared with the results above, there are significant differences. For example, the average HR for Febrero is

   mean HR using pandas=68.66

   mean HR using excel=67

Another detail I found:

   data['PreciAcu']['2011-01'].count()

   29 and should be 31

Am I doing something wrong? How can I fix this error?

Application csv file:

[link] https://www.dropbox.com/s/p5hl137bqm82j41/anual.csv

+2

python pandas

user1345283 09 . '14 20:11

1

DSM · Answer 1 · 2014-01-09T21:29:34+0000

, DD/MM/YYYY. dayfirst=True:

>>> df = pd.read_csv('anual.csv', parse_dates='Fecha', dayfirst=True, index_col=0, sep="\s+")
>>> df['PreciAcu']['2011-01'].count()
31
>>> df.resample("M", how='mean')
                   HR   PreciAcu    RadSolar          T  Presion       Tmax  \
Fecha                                                                         
2011-01-31  68.774194   0.000000  162.354839  16.535484        0  25.393548   
2011-02-28  67.000000   0.000000  193.481481  15.418519        0  25.696296   
2011-03-31  59.083333   0.850000  254.541667  21.295833        0  32.325000   
2011-04-30  61.200000   1.312000  260.640000  24.676000        0  34.760000   
2011-05-31        NaN        NaN         NaN        NaN      NaN        NaN   
2011-06-30  68.428571   8.576190  236.619048  25.009524        0  32.028571   
2011-07-31  81.518519  11.488889  185.407407  22.429630        0  27.681481   
2011-08-31  76.451613   0.677419  219.645161  23.677419        0  30.719355   
2011-09-30  77.533333   2.883333  196.100000  21.573333        0  28.723333   
2011-10-31  73.120000   1.260000  196.280000  19.552000        0  27.636000   
2011-11-30  71.277778 -79.333333  148.555556  18.250000        0  26.511111   
2011-12-31  73.741935   0.067742  134.677419  15.687097        0  24.019355   

                HRmax  Presionmax       Tmin  
Fecha                                         
2011-01-31  92.709677           0  10.909677  
2011-02-28  92.111111           0   8.325926  
2011-03-31  89.291667           0  13.037500  
2011-04-30  89.400000           0  17.328000  
2011-05-31        NaN         NaN        NaN  
2011-06-30  92.095238           0  19.761905  
2011-07-31  97.185185           0  18.774074  
2011-08-31  96.903226           0  18.670968  
2011-09-30  97.200000           0  16.373333  
2011-10-31  97.000000           0  13.412000  
2011-11-30  94.555556           0  11.877778  
2011-12-31  94.161290           0  10.070968  

[12 rows x 9 columns]

( , - - dayfirst=True , . . , date_parser .)

How to get monthly average in pandas using groupby

More articles: