Plotting with GroupBy in Pandas / Python

Although it is straightforward to plot groupby objects in pandas, I wonder what the most pythonic (pandastic?) way is to grab the unique groups from a groupby object. For example, I work with atmospheric data and am trying to plot diurnal trends over several days or more. Below is a DataFrame containing several days of data, with the timestamp as the index:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10909 entries, 2013-08-04 12:01:00 to 2013-08-13 17:43:00
Data columns (total 17 columns):
Date     10909  non-null values
Flags    10909  non-null values
Time     10909  non-null values
convt    10909  non-null values
hino     10909  non-null values
hinox    10909  non-null values
intt     10909  non-null values
no       10909  non-null values
nox      10909  non-null values
ozonf    10909  non-null values
pmtt     10909  non-null values
pmtv     10909  non-null values
pres     10909  non-null values
rctt     10909  non-null values
smplf    10909  non-null values
stamp    10909  non-null values
no2      10909  non-null values
dtypes: datetime64[ns](1), float64(11), int64(2), object(3)

To average the data (and take other statistics) at every minute of the day across several days, I group the data by time of day: data = no.groupby('Time')
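For readers who want to try the grouping step, here is a minimal sketch on synthetic data. It assumes, as the DataFrame above suggests, that 'Time' holds "HH:MM" strings; the values in the hypothetical `no` frame are made up. Grouping on the time-of-day label collects one value per day into each of the 1440 minute groups, so the mean is a diurnal average.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame `no`: one reading per minute
# over three days, with a 'Time' column holding "HH:MM" strings.
idx = pd.date_range("2013-08-04 00:00", periods=3 * 1440, freq="min")
no = pd.DataFrame({"no": np.arange(len(idx), dtype=float)}, index=idx)
no["Time"] = no.index.strftime("%H:%M")

# Group every row by its time of day; each group holds one value per day.
data = no.groupby("Time")

# Mean across days for each minute of the day.
diurnal_mean = data["no"].mean()
print(len(diurnal_mean))          # 1440
print(diurnal_mean.loc["00:00"])  # mean of 0, 1440 and 2880 -> 1440.0
```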

I can then easily plot the mean NO concentration as well as the quartiles:

import numpy as np
from matplotlib.pyplot import figure, title, ylabel

ax = figure(figsize=(12,8)).add_subplot(111)
title('Diurnal Profile for NO, NO2, and NOx: East St. Louis Air Quality Study')
ylabel('Concentration [ppb]')
data.no.mean().plot(ax=ax, style='b', label='Mean')
data.no.apply(lambda x: np.percentile(x, 25)).plot(ax=ax, style='r', label='25%')
data.no.apply(lambda x: np.percentile(x, 75)).plot(ax=ax, style='r', label='75%')
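As an aside, the two `apply(lambda ...)` calls can be replaced by a single grouped quantile call. The sketch below uses a small hypothetical sample (`df`, `g` and `q` are illustrative names) and assumes a pandas version whose `SeriesGroupBy.quantile` accepts a list of quantiles. It checks that the result matches the `np.percentile` approach above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two times of day, each observed on four days.
df = pd.DataFrame({
    "Time": ["00:00"] * 4 + ["00:01"] * 4,
    "no": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})
g = df.groupby("Time")["no"]

# One call returns a Series indexed by (Time, quantile); unstack() moves the
# quantile level into columns 0.25 and 0.75.
q = g.quantile([0.25, 0.75]).unstack()

# The per-group np.percentile approach from the question, for comparison.
q25 = g.apply(lambda x: np.percentile(x, 25))
print(q)
```

Both use linear interpolation by default, so the 25% column agrees with `q25`.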

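The catch is that more interesting plotting tools, such as fill_between(), require the x-axis values explicitly, per the documentation: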

fill_between(x, y1, y2=0, where=None, interpolate=False, hold=None, **kwargs)

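I cannot figure out the cleanest way to get these x values. I have tried:

  • extracting the groups from the groupby object
  • taking the unique Time values from the original DataFrame

I can make these work, but I suspect there is a better, more Pythonic way. Any ideas/hints?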
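One pandas-native answer to the "unique groups" part of the question is that every aggregation of a GroupBy object is indexed by the group keys. The index of, say, the mean is therefore exactly the set of unique groups, already sorted. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical sample with repeated "HH:MM" labels across days.
df = pd.DataFrame({"Time": ["00:01", "00:00", "00:01", "00:00"],
                   "no": [1.0, 2.0, 3.0, 4.0]})
g = df.groupby("Time")

# The unique group keys come back as the index of any aggregation, sorted.
x = g["no"].mean().index
print(list(x))            # ['00:00', '00:01']

# The GroupBy object also exposes the keys directly.
print(sorted(g.groups))   # ['00:00', '00:01']
print(g.ngroups)          # 2
```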

UPDATE: the statistics can be collected into a new DataFrame with unstack(), for example:

no_new = no.groupby('Time')['no'].describe().unstack()
no_new.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1440 entries, 00:00 to 23:59
Data columns (total 8 columns):
count    1440  non-null values
mean     1440  non-null values
std      1440  non-null values
min      1440  non-null values
25%      1440  non-null values
50%      1440  non-null values
75%      1440  non-null values
max      1440  non-null values
dtypes: float64(8)
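In the pandas version used here, describe() on a grouped column evidently returned a stacked Series, which is why unstack() was needed. In recent pandas, `SeriesGroupBy.describe()` already returns one row per group and one column per statistic. A sketch on hypothetical data, assuming a recent pandas:

```python
import pandas as pd

df = pd.DataFrame({"Time": ["00:00", "00:00", "00:01", "00:01"],
                   "no": [1.0, 3.0, 2.0, 6.0]})

# Recent pandas: already a DataFrame, so no unstack() step is needed.
stats = df.groupby("Time")["no"].describe()
print(list(stats.columns))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(stats.loc["00:01", "mean"])  # 4.0
```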

Although I should be able to pass no_new.index to fill_between() as x, I get a TypeError.

Current plotting code:

ax = figure(figsize=(12,8)).add_subplot(111)
ax.plot(no_new['mean'])
ax.fill_between(no_new.index, no_new['mean'], no_new['75%'], alpha=.5, facecolor='green')

The resulting TypeError:

TypeError                                 Traceback (most recent call last)
<ipython-input-6-47493de920f1> in <module>()
      2 ax = figure(figsize=(12,8)).add_subplot(111)
      3 ax.plot(no_new['mean'])
----> 4 ax.fill_between(no_new.index, no_new['mean'], no_new['75%'], alpha=.5,     facecolor='green')
      5 #title('Diurnal Profile for NO, NO2, and NOx: East St. Louis Air Quality Study')
      6 #ylabel('Concentration [ppb]')

C:\Users\David\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\axes.pyc in fill_between(self, x, y1, y2, where, interpolate, **kwargs)
   6986 
   6987         # Convert the arrays so we can work with them
-> 6988         x = ma.masked_invalid(self.convert_xunits(x))
   6989         y1 = ma.masked_invalid(self.convert_yunits(y1))
   6990         y2 = ma.masked_invalid(self.convert_yunits(y2))

C:\Users\David\AppData\Local\Enthought\Canopy\User\lib\site-packages\numpy\ma\core.pyc in masked_invalid(a, copy)
   2237         cls = type(a)
   2238     else:
-> 2239         condition = ~(np.isfinite(a))
   2240         cls = MaskedArray
   2241     result = a.view(cls)

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
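The traceback shows fill_between() passing x through `numpy.ma.masked_invalid`, which calls `np.isfinite`. Presumably no_new.index holds the "HH:MM" strings, i.e. an object array, and `np.isfinite` has no implementation for that dtype. A minimal reproduction follows; `hhmm_to_minutes` is a hypothetical helper that turns the labels into numbers `np.isfinite` accepts.

```python
import numpy as np

# An index built from "HH:MM" strings behaves like an object array of str.
x_labels = np.array(["00:00", "00:01", "00:02"], dtype=object)

try:
    np.isfinite(x_labels)  # the call masked_invalid makes internally
except TypeError as exc:
    print("TypeError:", exc)

# Hypothetical helper: "HH:MM" -> minutes since midnight.
def hhmm_to_minutes(label):
    hours, minutes = label.split(":")
    return int(hours) * 60 + int(minutes)

x_numeric = np.array([hhmm_to_minutes(s) for s in x_labels], dtype=float)
print(np.isfinite(x_numeric).all())  # True
```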



Why not compute the groupby statistics you need (mean/25%/75%) first and then pass their index as x to plt.fill_between()? This works for me (matplotlib 1.3.1):

import matplotlib.pyplot as plt

gdf = df.groupby('Time')[col].describe().unstack()   # col: the column to summarize, e.g. 'no'
plt.fill_between(gdf.index, gdf['25%'], gdf['75%'], alpha=.5)

gdf.info() gives:

<class 'pandas.core.frame.DataFrame'>
Index: 12 entries, 00:00:00 to 22:00:00
Data columns (total 8 columns):
count    12 non-null float64
mean     12 non-null float64
std      12 non-null float64
min      12 non-null float64
25%      12 non-null float64
50%      12 non-null float64
75%      12 non-null float64
max      12 non-null float64
dtypes: float64(8)
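Putting the pieces together, here is a self-contained sketch that sidesteps the TypeError by plotting against numeric minutes since midnight rather than the string labels. The data are synthetic, and it assumes a recent pandas in which describe() on a grouped column already returns a DataFrame, so unstack() is omitted.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: three days of one-minute NO readings.
rng = np.random.default_rng(0)
idx = pd.date_range("2013-08-04", periods=3 * 1440, freq="min")
df = pd.DataFrame({"no": rng.gamma(2.0, 5.0, size=len(idx))}, index=idx)
df["Time"] = df.index.strftime("%H:%M")

# One row per minute of the day, one column per statistic.
gdf = df.groupby("Time")["no"].describe()

# Numeric x values: minutes since midnight, parsed from the "HH:MM" group keys.
x = np.array([int(t[:2]) * 60 + int(t[3:]) for t in gdf.index], dtype=float)

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(x, gdf["mean"], color="b", label="Mean")
band = ax.fill_between(x, gdf["25%"], gdf["75%"], alpha=0.5,
                       facecolor="green", label="25%-75%")
ax.set_xlabel("Minutes since midnight")
ax.set_ylabel("Concentration [ppb]")
ax.legend()
```

To label the axis with clock times instead of minutes, one option is `ax.set_xticks(x[::120])` followed by `ax.set_xticklabels(gdf.index[::120])`.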

Update: to fix the TypeError: ufunc 'isfinite' not supported, first convert the Time column from "HH:MM" strings to datetime.time objects:

df['Time'] = df.Time.map(lambda x: pd.datetools.parse(x).time())
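`pd.datetools` has since been removed from pandas, so this line fails on current releases. A sketch of the same conversion with `pd.to_datetime` and an explicit format, on hypothetical sample data:

```python
import datetime
import pandas as pd

df = pd.DataFrame({"Time": ["00:00", "13:45", "23:59"]})

# Parse "HH:MM" strings with an explicit format, then keep only the time of day.
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M").dt.time

print(df["Time"].iloc[1])                           # 13:45:00
print(df["Time"].iloc[1] == datetime.time(13, 45))  # True
```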