Calculate and print samples using Pandas

Question

I have multidimensional data in a pandas DataFrame, with a single variable indicating the class. For example, here is my attempt, which draws a rather poor heatmap out of a scatter plot:

    import random

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    nrows = 1000
    df = pd.DataFrame(
        [[random.random(), random.random()] + [random.randint(0, 1)] for _ in range(nrows)],
        columns=list("ABC"),
    )

    # Label every value with the left edge of the bin it falls into
    bins = np.linspace(0, 1, 20)
    df["Abin"] = [bins[i - 1] for i in np.digitize(df.A, bins)]
    df["Bbin"] = [bins[i - 1] for i in np.digitize(df.B, bins)]

    # Per 2D bin: number of class-1 rows ("sum") and number of all rows ("count")
    # (.loc is used here because .ix no longer exists in current pandas)
    g = df.loc[:, ["Abin", "Bbin", "C"]].groupby(["Abin", "Bbin"])
    data = g.agg(["sum", "count"])
    data.reset_index(inplace=True)
    data["classratio"] = data[("C", "sum")] / data[("C", "count")]

    # The colormap is passed by name; matplotlib.cm.get_cmap is gone in recent matplotlib
    plt.scatter(data.Abin, data.Bbin, c=data.classratio, cmap="RdYlGn_r", marker="s")

I would like to plot the class density against the binned features. For now I use np.digitize for the binning and some complicated manual Python code for the density calculation in order to build the heatmap.

Surely this can be done more compactly with pandas (pivot?). Do you know a neat way to bin the two features (for example, 10 bins over the interval 0...1) and then plot a class density heatmap, where the color indicates the ratio of 1s to the total number of rows in each 2D bin?

+2

python matplotlib pandas

Gerenuk Jul 07 '14 at 14:35

1 answer

CT Zhu · Accepted Answer · 2014-07-07T21:12:29+0000

Yes, this can be done very concisely using pandas' built-in cut function:

    In [65]:
    nrows = 1000
    df = pd.DataFrame(
        [[random.random(), random.random()] + [random.randint(0, 1)] for _ in range(nrows)],
        columns=list("ABC"),
    )

    In [66]:
    # This does the trick: count the rows that fall into each 2D bin.
    pd.crosstab(np.array(pd.cut(df.A, 20)), np.array(pd.cut(df.B, 20))).values

    Out[66]:
    array([[2, 2, 2, 2, 7, 2, 3, 5, 1, 4, 2, 2, 1, 3, 2, 1, 7, 2, 4, 2],
           [1, 2, 4, 2, 0, 3, 3, 3, 1, 1, 2, 1, 4, 3, 2, 1, 1, 2, 2, 1],
           [0, 4, 1, 3, 1, 3, 2, 5, 2, 3, 1, 1, 1, 4, 2, 3, 6, 5, 2, 2],
           [5, 2, 3, 2, 2, 1, 3, 2, 4, 0, 3, 2, 0, 4, 3, 2, 1, 3, 1, 3],
           [2, 2, 4, 1, 3, 2, 2, 4, 1, 4, 3, 5, 5, 2, 3, 3, 0, 2, 4, 0],
           [2, 3, 3, 5, 2, 0, 5, 3, 2, 3, 1, 2, 5, 4, 4, 3, 4, 3, 6, 4],
           [3, 2, 2, 4, 3, 3, 2, 0, 0, 4, 3, 2, 2, 5, 4, 0, 1, 2, 2, 3],
           [0, 0, 4, 4, 3, 2, 4, 6, 4, 2, 0, 5, 2, 2, 1, 3, 4, 4, 3, 2],
           [3, 2, 2, 3, 4, 2, 1, 3, 1, 3, 4, 2, 4, 3, 2, 3, 2, 3, 4, 4],
           [0, 1, 1, 4, 1, 4, 3, 0, 1, 1, 1, 2, 6, 4, 3, 5, 3, 3, 1, 4],
           [2, 2, 4, 1, 3, 4, 1, 2, 1, 3, 3, 3, 1, 2, 1, 5, 2, 1, 4, 3],
           [0, 0, 0, 4, 2, 0, 2, 3, 2, 2, 2, 4, 4, 2, 3, 2, 1, 2, 1, 0],
           [3, 3, 0, 3, 1, 5, 1, 1, 2, 5, 6, 5, 0, 0, 3, 2, 1, 5, 7, 2],
           [3, 3, 2, 1, 2, 2, 2, 2, 4, 0, 1, 3, 3, 1, 5, 6, 1, 3, 2, 2],
           [3, 0, 3, 4, 3, 2, 1, 4, 2, 3, 4, 0, 5, 3, 2, 2, 4, 3, 0, 2],
           [0, 3, 2, 2, 1, 5, 1, 4, 3, 1, 2, 2, 3, 5, 1, 2, 2, 2, 1, 2],
           [1, 3, 2, 1, 1, 4, 4, 3, 2, 2, 5, 5, 1, 0, 1, 0, 4, 3, 3, 2],
           [2, 2, 2, 1, 1, 3, 1, 6, 5, 2, 5, 2, 3, 4, 2, 2, 1, 1, 4, 0],
           [3, 3, 4, 7, 0, 2, 6, 4, 1, 3, 4, 4, 1, 4, 1, 1, 2, 1, 3, 2],
           [3, 6, 3, 4, 1, 3, 1, 3, 3, 1, 6, 2, 2, 2, 1, 1, 4, 4, 0, 4]])

    In [67]:
    abins = np.linspace(df.A.min(), df.A.max(), 21)
    bbins = np.linspace(df.B.min(), df.B.max(), 21)

    # Class-1 rows per bin divided by all rows per bin.
    # crosstab counts rows by default, so no aggfunc is passed
    # (current pandas rejects aggfunc without values); .loc replaces the removed .ix.
    Z = pd.crosstab(
        np.array(pd.cut(df.loc[df.C == 1, "A"], abins)),
        np.array(pd.cut(df.loc[df.C == 1, "B"], bbins)),
    ).div(
        pd.crosstab(np.array(pd.cut(df.A, abins)), np.array(pd.cut(df.B, bbins)))
    ).values

    # crosstab puts A on the rows, while contourf expects rows to run along y,
    # so transpose to get A on the x axis. Empty bins (0/0) are masked.
    Z = np.ma.masked_invalid(Z.T)

    x = np.linspace(df.A.min(), df.A.max(), 20)
    y = np.linspace(df.B.min(), df.B.max(), 20)
    X, Y = np.meshgrid(x, y)
    plt.contourf(X, Y, Z, vmin=0, vmax=1)
    plt.colorbar()

    plt.pcolormesh(X, Y, Z, vmin=0, vmax=1)
    plt.colorbar()
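A possible shortcut for the ratio itself: since C only takes the values 0 and 1, the mean of C inside a bin is already the share of class-1 rows in that bin, so one groupby on two pd.cut results may be enough. This is only a sketch under that assumption. The 10 bins on [0, 1] come from the question; the seeded NumPy generator and the names edges, a_bin, b_bin and ratio are illustrative, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Synthetic data shaped like the question's: two features in [0, 1) and a 0/1 class.
# The seeded generator is an assumption, used only to make the sketch repeatable.
rng = np.random.default_rng(0)
nrows = 1000
df = pd.DataFrame({
    "A": rng.random(nrows),
    "B": rng.random(nrows),
    "C": rng.integers(0, 2, nrows),
})

# Assumption: 10 equal-width bins on [0, 1], as suggested in the question.
edges = np.linspace(0, 1, 11)
a_bin = pd.cut(df["A"], edges, include_lowest=True)
b_bin = pd.cut(df["B"], edges, include_lowest=True)

# C is 0/1, so its mean within a bin equals (class-1 rows) / (all rows).
# observed=False keeps empty bins in the result; they show up as NaN.
ratio = df.groupby([a_bin, b_bin], observed=False)["C"].mean().unstack()

print(ratio.shape)
```

Here ratio is a DataFrame with the A bins as rows and the B bins as columns, so it would need a transpose before being handed to a matplotlib call that expects rows to run along the y axis.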

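If pandas is not needed for the binning step, the same class ratio can be sketched with np.histogram2d, which also hands back the bin edges in the form pcolormesh can use to draw each cell at its real position. Note that this swaps in NumPy for the crosstab approach and is not the method used in the answer. The seeded data, the 10 bins on [0, 1] and the Agg backend are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data shaped like the question's (seed chosen arbitrarily).
rng = np.random.default_rng(0)
nrows = 1000
a = rng.random(nrows)
b = rng.random(nrows)
c = rng.integers(0, 2, nrows)

# Assumption: 10 equal-width bins on [0, 1] for both features.
edges = np.linspace(0, 1, 11)

# All rows per 2D bin, and class-1 rows per 2D bin, on identical edges.
total, _, _ = np.histogram2d(a, b, bins=[edges, edges])
ones, _, _ = np.histogram2d(a[c == 1], b[c == 1], bins=[edges, edges])

# Share of class-1 rows; empty bins give 0/0 = NaN and are masked out.
with np.errstate(invalid="ignore", divide="ignore"):
    ratio = ones / total
ratio = np.ma.masked_invalid(ratio)

# histogram2d puts the first variable on axis 0, while pcolormesh expects
# rows to run along y, hence the transpose. Passing the 11 edges for a
# 10 x 10 array makes each cell span exactly one bin.
fig, ax = plt.subplots()
mesh = ax.pcolormesh(edges, edges, ratio.T, vmin=0, vmax=1, cmap="RdYlGn_r")
fig.colorbar(mesh, ax=ax, label="share of class 1")
ax.set_xlabel("A")
ax.set_ylabel("B")
```

Calling plt.show() or fig.savefig(...) afterwards would display or store the figure.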
