Although constructing groupby objects in pandas is straightforward, I wonder what the most Pythonic (pandastic?) way is to grab the unique groups from a groupby object. For example: I work with atmospheric data and am trying to plot diurnal trends over a period of several days or more. Below is a DataFrame containing several days of data, with the timestamp as the index:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10909 entries, 2013-08-04 12:01:00 to 2013-08-13 17:43:00
Data columns (total 17 columns):
Date 10909 non-null values
Flags 10909 non-null values
Time 10909 non-null values
convt 10909 non-null values
hino 10909 non-null values
hinox 10909 non-null values
intt 10909 non-null values
no 10909 non-null values
nox 10909 non-null values
ozonf 10909 non-null values
pmtt 10909 non-null values
pmtv 10909 non-null values
pres 10909 non-null values
rctt 10909 non-null values
smplf 10909 non-null values
stamp 10909 non-null values
no2 10909 non-null values
dtypes: datetime64[ns](1), float64(11), int64(2), object(3)
To average the data (and compute other statistics) at each minute of the day across several days, I group the data by time of day:
data = no.groupby('Time')
Then I can easily plot the mean NO concentration, as well as the quartiles:
from pylab import figure, title, ylabel, percentile  # names used unqualified below
ax = figure(figsize=(12,8)).add_subplot(111)
title('Diurnal Profile for NO, NO2, and NOx: East St. Louis Air Quality Study')
ylabel('Concentration [ppb]')
data.no.mean().plot(ax=ax, style='b', label='Mean')
data.no.apply(lambda x: percentile(x, 25)).plot(ax=ax, style='r', label='25%')
data.no.apply(lambda x: percentile(x, 75)).plot(ax=ax, style='r', label='75%')
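For reference, the same mean and quartile curves could probably be computed without the lambdas. The following is only a minimal sketch on synthetic data: the random values, the `profile` name, and the assumption that `Time` holds 'HH:MM' strings are mine, not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: three days of one-minute readings.
idx = pd.date_range("2013-08-04", periods=3 * 1440, freq="min")
rng = np.random.default_rng(0)
no = pd.DataFrame({"no": rng.gamma(2.0, 5.0, size=len(idx))}, index=idx)

# Assumption: 'Time' is the time of day as an 'HH:MM' string.
no["Time"] = no.index.strftime("%H:%M")

# One groupby, three statistics per minute of the day.
grouped = no.groupby("Time")["no"]
profile = pd.DataFrame({
    "mean": grouped.mean(),
    "25%": grouped.quantile(0.25),
    "75%": grouped.quantile(0.75),
})
```

Each column of `profile` is indexed by the group keys, so `profile.plot()` would draw all three curves on one set of axes.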
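However, I would like to shade the area between the 25th and 75th percentiles, and for that I need fill_between(), which requires x to be specified: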
fill_between(x, y1, y2=0, where=None, interpolate=False, hold=None, **kwargs)
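So, my question: how do I get the x values (the unique group keys) out of the groupby object? I am fairly new to Python. Any thoughts/suggestions?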
Edit:
Following the suggestion to use unstack(), I tried:
no_new = no.groupby('Time')['no'].describe().unstack()
no_new.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1440 entries, 00:00 to 23:59
Data columns (total 8 columns):
count 1440 non-null values
mean 1440 non-null values
std 1440 non-null values
min 1440 non-null values
25% 1440 non-null values
50% 1440 non-null values
75% 1440 non-null values
max 1440 non-null values
dtypes: float64(8)
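As a side note for anyone reproducing this on a recent pandas release: as far as I know, describe() on a grouped column already returns a wide DataFrame with one column per statistic, so the extra unstack() step would not be needed there (and would reshape the result into a Series instead). A small sketch on made-up data, with the `stats` name being my own:

```python
import numpy as np
import pandas as pd

# Made-up data: two days of one-minute readings with an 'HH:MM' key column.
idx = pd.date_range("2013-08-04", periods=2 * 1440, freq="min")
no = pd.DataFrame({"no": np.arange(len(idx), dtype=float)}, index=idx)
no["Time"] = no.index.strftime("%H:%M")

# On recent pandas this is already one row per minute, one column per statistic.
stats = no.groupby("Time")["no"].describe()
```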
However, when I pass no_new.index to fill_between(), I get a TypeError.
Plot code that raises the TypeError:
ax = figure(figsize=(12,8)).add_subplot(111)
ax.plot(no_new['mean'])
ax.fill_between(no_new.index, no_new['mean'], no_new['75%'], alpha=.5, facecolor='green')
The full TypeError traceback:
TypeError Traceback (most recent call last)
<ipython-input-6-47493de920f1> in <module>()
2 ax = figure(figsize=(12,8)).add_subplot(111)
3 ax.plot(no_new['mean'])
5
6
C:\Users\David\AppData\Local\Enthought\Canopy\User\lib\site-packages\matplotlib\axes.pyc in fill_between(self, x, y1, y2, where, interpolate, **kwargs)
6986
6987
-> 6988 x = ma.masked_invalid(self.convert_xunits(x))
6989 y1 = ma.masked_invalid(self.convert_yunits(y1))
6990 y2 = ma.masked_invalid(self.convert_yunits(y2))
C:\Users\David\AppData\Local\Enthought\Canopy\User\lib\site-packages\numpy\ma\core.pyc in masked_invalid(a, copy)
2237 cls = type(a)
2238 else:
-> 2239 condition = ~(np.isfinite(a))
2240 cls = MaskedArray
2241 result = a.view(cls)
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
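One possible reading of this traceback: if the index of no_new holds 'HH:MM' strings (object dtype), np.isfinite cannot be applied to it. A possible workaround, sketched below on a made-up stand-in for no_new, is to convert the labels to minutes since midnight before calling fill_between(). The string index, the sine-wave values, and the `x` name are my assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for no_new: 'HH:MM' string index, summary columns.
labels = pd.date_range("2013-08-04", periods=1440, freq="min").strftime("%H:%M")
base = 10 + 5 * np.sin(np.linspace(0, 2 * np.pi, 1440))
no_new = pd.DataFrame(
    {"25%": base - 2, "mean": base, "75%": base + 2}, index=labels
)

# Convert each 'HH:MM' label to minutes since midnight (a numeric x axis).
x = np.array([int(t[:2]) * 60 + int(t[3:]) for t in no_new.index])

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(x, no_new["mean"].to_numpy(), "b", label="Mean")
# Shade the interquartile range between the 25% and 75% curves.
ax.fill_between(x, no_new["25%"].to_numpy(), no_new["75%"].to_numpy(),
                alpha=0.5, facecolor="green")
ax.set_xlabel("Minutes since midnight")
ax.set_ylabel("Concentration [ppb]")
ax.legend()
```

With a numeric x, tick labels can be mapped back to clock times afterwards if needed.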