How to annotate a bubble chart / scatter plot with a column from a pandas DataFrame?

I am trying to label a scatter / bubble chart that I create with matplotlib, using entries from a column in a pandas DataFrame. I have seen many examples and related questions (see, for example, here and here), so I tried to annotate the plot accordingly. This is what I am doing:

    import matplotlib.pyplot as plt
    import pandas as pd

    # example data frame
    x = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
    y = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
    s = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
    users = ['mark', 'mark', 'mark', 'rachel', 'rachel', 'rachel',
             'jeff', 'jeff', 'jeff', 'lauren', 'lauren', 'lauren']
    df = pd.DataFrame(dict(x=x, y=y, users=users))

    # my attempt to plot things
    plt.scatter(x, y, s=s, alpha=0.5)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.annotate(df.users, xy=(x, y))
    plt.show()

I am using a pandas DataFrame and somehow I get a KeyError, so I assume a dict() object is expected? Is there any other way to label the points using entries from a pandas DataFrame?

+7
matplotlib pandas scatter-plot
Jan 05 '17 at 9:18
2 answers

You can use DataFrame.plot.scatter and then annotate each point in a loop, selecting the coordinates by position with .iat:

    ax = df.plot.scatter(x='x', y='y', alpha=0.5)
    for i, txt in enumerate(df.users):
        ax.annotate(txt, (df.x.iat[i], df.y.iat[i]))
    plt.show()

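For reference, here is a self-contained sketch of the same idea. It is not part of the original answer: it swaps the positional .iat lookup for DataFrame.itertuples, uses a shortened copy of the question's data, and selects the Agg backend only so that it runs without a display. The point it illustrates is that annotate places one text at one point, so it is called once per row.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# shortened copy of the example data from the question
df = pd.DataFrame({
    "x": [5, 10, 20, 30],
    "y": [100, 100, 200, 200],
    "users": ["mark", "mark", "mark", "rachel"],
})

ax = df.plot.scatter(x="x", y="y", alpha=0.5)

# annotate() takes a single text and a single (x, y) point,
# so it is called once for every row of the frame
for row in df.itertuples(index=False):
    ax.annotate(row.users, (row.x, row.y))

# the annotations end up in ax.texts, which makes them easy to inspect
labels = [t.get_text() for t in ax.texts]
```

In an interactive session, remove the backend line and finish with plt.show() as in the answer above.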

+6
Jan 05 '17 at 9:24

Jezreal's answer is fine, but I will post this to show what I meant by df.iterrows in the other thread.

I'm afraid you have to put the scatter (or plot) command inside the loop as well if you want each point to have its own size.

    df = pd.DataFrame(dict(x=x, y=y, s=s, users=users))

    fig, ax = plt.subplots(facecolor='w')

    for key, row in df.iterrows():
        ax.scatter(row['x'], row['y'], s=row['s']*5, alpha=.5)
        ax.annotate(row['users'], xy=(row['x'], row['y']))

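As a possible alternative (a sketch that is not from the answer above), Axes.scatter also accepts an array-like for s with one marker area per point, so the points can be drawn in a single call and only the annotations need a loop. The data is the example from the question; the factor of 5 is taken from the answer; the Agg backend line is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

x = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
y = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
s = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
users = ['mark', 'mark', 'mark', 'rachel', 'rachel', 'rachel',
         'jeff', 'jeff', 'jeff', 'lauren', 'lauren', 'lauren']
df = pd.DataFrame(dict(x=x, y=y, s=s, users=users))

fig, ax = plt.subplots(facecolor='w')

# a single scatter call; s holds one marker area per point
points = ax.scatter(df['x'], df['y'], s=df['s'] * 5, alpha=0.5)

# the labels still need one annotate() call per point
for x_i, y_i, user in zip(df['x'], df['y'], df['users']):
    ax.annotate(user, xy=(x_i, y_i))
```

One visible difference: calling scatter once per row, as in the answer, takes a new color from the color cycle for every point, whereas a single call draws all points in one color unless c is passed as well.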

+1
Jan 05 '17 at 11:35


