I have multidimensional data in a pandas DataFrame, with a single variable indicating the class. For example, here is my attempt, which produces a poor-looking heat map drawn as a scatter plot:
    import random

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    nrows = 1000
    # Two random features A, B in [0, 1) and a random binary class C
    df = pd.DataFrame(
        [[random.random(), random.random(), random.randint(0, 1)] for _ in range(nrows)],
        columns=list("ABC"),
    )

    # Map every value to the left edge of the bin it falls into
    bins = np.linspace(0, 1, 20)
    df["Abin"] = bins[np.digitize(df.A, bins) - 1]
    df["Bbin"] = bins[np.digitize(df.B, bins) - 1]

    # df.ix has been removed from pandas; plain column selection does the same here
    g = df[["Abin", "Bbin", "C"]].groupby(["Abin", "Bbin"])
    data = g.agg(["sum", "count"])
    data.reset_index(inplace=True)
    # Fraction of rows with C == 1 in each (Abin, Bbin) cell
    data["classratio"] = data[("C", "sum")] / data[("C", "count")]

    plt.scatter(data.Abin, data.Bbin, c=data.classratio, cmap="RdYlGn_r", marker="s")
I would like to plot the class density as a function of the binned features. Right now I use np.digitize for the binning and some convoluted manual Python code to compute the density for the heat map.
Surely this can be done more compactly with pandas (pivot?). Do you know a neat way to bin the two features (for example, 10 bins over the interval 0 ... 1) and then plot a class density heat map, where the color indicates the ratio of 1s to the total number of rows in each 2D bin?
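To make the desired result concrete, here is a minimal sketch of one possible approach, not necessarily the most compact one. It assumes both features lie in [0, 1] and that the class column holds only 0 and 1, so the mean per cell equals the ratio of 1s. The helper names `class_ratio_grid` and `plot_class_ratio` are made up for this illustration; `pd.cut`, `groupby(...).mean().unstack()` and `pcolormesh` are the only library pieces it relies on.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def class_ratio_grid(df, x="A", y="B", label="C", n_bins=10):
    """Return (ratio, edges); ratio.loc[i, j] is the mean of `label`
    over the rows falling into y-bin i and x-bin j (NaN if the cell is empty)."""
    edges = np.linspace(0, 1, n_bins + 1)
    # labels=False yields integer bin indices; include_lowest keeps exact 0.0
    xbin = pd.cut(df[x], edges, labels=False, include_lowest=True)
    ybin = pd.cut(df[y], edges, labels=False, include_lowest=True)
    ratio = (
        df.groupby([ybin, xbin])[label]
        .mean()      # mean of a 0/1 column = fraction of ones
        .unstack()   # rows: y-bins, columns: x-bins
        .reindex(index=range(n_bins), columns=range(n_bins))
    )
    return ratio, edges


def plot_class_ratio(ratio, edges, ax=None):
    """Draw the grid as a heat map; empty cells are masked out."""
    if ax is None:
        _, ax = plt.subplots()
    values = np.ma.masked_invalid(ratio.to_numpy())
    mesh = ax.pcolormesh(edges, edges, values, cmap="RdYlGn_r", vmin=0, vmax=1)
    ax.figure.colorbar(mesh, ax=ax, label="fraction of class 1")
    ax.set_xlabel("A")
    ax.set_ylabel("B")
    return ax


# Random demo data shaped like the frame in the question
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.random(1000),
    "B": rng.random(1000),
    "C": rng.integers(0, 2, 1000),
})
ratio, edges = class_ratio_grid(df)
```

Calling `plot_class_ratio(ratio, edges)` followed by `plt.show()` would then draw the map. Unlike the scatter plot, `pcolormesh` fills each bin completely, so the cells tile the plane without gaps regardless of figure size.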
python matplotlib pandas
Gerenuk