Scatter plots with string arrays in matplotlib

It looks like it should be easy, but I can't figure it out. I have a pandas DataFrame and would like to make a 3D scatter plot of 3 of its columns. Columns X and Y are not numeric; they are strings, but I don't see why this should be a problem.

    import numpy as np
    import matplotlib.pyplot as pl

    X = myDataFrame.columnX.values  # string
    Y = myDataFrame.columnY.values  # string
    Z = myDataFrame.columnZ.values  # float

    fig = pl.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(X, Y, np.log10(Z), s=20, c='b')
    pl.show()

Is there no easy way to do this? Thanks.

3 answers

You can use np.unique(..., return_inverse=True) to get a representative int for each string. For instance,

    In [117]: uniques, X = np.unique(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'], return_inverse=True)

    In [118]: X
    Out[118]: array([2, 1, 0, 2, 1, 0])

Note that X has dtype int32, since np.unique can handle at most 2**31 unique strings (the exact integer dtype can depend on the platform and NumPy version).
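To make the mapping concrete, here is a minimal sketch (the names labels, codes and restored are illustrative, not from the answer) showing that indexing uniques with the integer codes recovers the original strings. This is why uniques can later be reused as tick labels:

```python
import numpy as np

labels = np.array(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'])
uniques, codes = np.unique(labels, return_inverse=True)

# uniques is sorted; codes[i] is the position of labels[i] within uniques
print(uniques)
print(codes)

# Indexing uniques with codes reconstructs the original array
restored = uniques[codes]
print(restored)
```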


    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import mpl_toolkits.mplot3d.axes3d as axes3d

    # Build a small example DataFrame with string columns X, Y and float column Z
    N = 12
    arr = np.arange(N*2).reshape(N, 2)
    words = np.array(['foo', 'bar', 'baz', 'quux', 'corge'])
    df = pd.DataFrame(words[arr % 5], columns=list('XY'))
    df['Z'] = np.linspace(1, 1000, N)

    Z = np.log10(df['Z'])
    # Replace each string by the index of its unique value
    Xuniques, X = np.unique(df['X'], return_inverse=True)
    Yuniques, Y = np.unique(df['Y'], return_inverse=True)

    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1, projection='3d')
    ax.scatter(X, Y, Z, s=20, c='b')
    # Label the integer tick positions with the original strings
    ax.set(xticks=range(len(Xuniques)), xticklabels=Xuniques,
           yticks=range(len(Yuniques)), yticklabels=Yuniques)
    plt.show()



Try converting the strings to numbers for plotting, and then use the strings again for the axis labels.

Hash usage

You can use the hash function for the conversion:

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

    xlab = myDataFrame.columnX.values
    ylab = myDataFrame.columnY.values
    X = [hash(l) for l in xlab]
    Y = [hash(l) for l in ylab]
    Z = myDataFrame.columnZ.values  # float

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(X, Y, np.log10(Z), s=20, c='b')
    # Put a tick at each hashed position and label it with the original string
    ax.set_xticks(X)
    ax.set_xticklabels(xlab)
    ax.set_yticks(Y)
    ax.set_yticklabels(ylab)
    plt.show()

As M4rtini pointed out in the comments, it is unclear what the distance / scaling of the string coordinates should be; the hash function may give unexpected intervals.
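A quick way to see the problem is to look at the gaps between the hash values themselves. This is only an illustrative sketch with made-up labels. In Python 3, string hashes are also randomized per process by default (unless PYTHONHASHSEED is set), so the numbers differ from run to run:

```python
labels = ['foo', 'bar', 'baz']

# hash() returns arbitrary integers, so neighbouring labels can be
# separated by very large, uneven distances on the axis
hashes = sorted(hash(s) for s in labels)
gaps = [b - a for a, b in zip(hashes, hashes[1:])]

print(hashes)
print(gaps)
```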

Non-degenerate uniform spacing

If you want the points to be evenly spaced, you need a different mapping. For example, you can use

    X = list(range(len(xlab)))

although this gives each point a unique x position, even when the label is the same, and the x and y positions will be correlated if you use the same approach for Y.
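As a small sketch of that caveat (xlab here is a made-up list, not the asker's data), repeated labels end up at different positions:

```python
xlab = ['foo', 'bar', 'foo', 'baz']

# One position per point, regardless of its label
X = list(range(len(xlab)))

# Collect the positions assigned to each label
positions = {}
for label, x in zip(xlab, X):
    positions.setdefault(label, []).append(x)

print(positions)  # 'foo' is drawn at two different x positions
```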

Degenerate uniform spacing

A third option is to first get the unique members of xlab (using, for example, set), and then map each label to its position within that unique set; e.g.

    xmap = {sn: i for i, sn in enumerate(set(xlab))}
    X = [xmap[l] for l in xlab]
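One caveat with set is that its iteration order is arbitrary, so the label-to-position assignment can change between runs. A hedged variation, sketched with made-up data, sorts the unique labels first so that the mapping is reproducible and the sorted list can double as the tick labels:

```python
xlab = ['foo', 'baz', 'bar', 'foo', 'baz', 'bar']

# Sort the unique labels so each one gets a stable integer position
unique_labels = sorted(set(xlab))
xmap = {label: i for i, label in enumerate(unique_labels)}
X = [xmap[label] for label in xlab]

print(unique_labels)
print(X)

# With an existing Axes object ax (assumed, not created here):
# ax.set_xticks(range(len(unique_labels)))
# ax.set_xticklabels(unique_labels)
```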

Scatter does this automatically now (at least from matplotlib 2.1.0):

 plt.scatter(['A', 'B', 'B', 'C'], [0, 1, 2, 1]) 
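As a rough sketch of what this does (assuming a matplotlib version with categorical axis support, and using the non-interactive Agg backend so it runs without a display), the strings are reused as tick labels on the x axis, one tick per distinct category:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window is opened
import matplotlib.pyplot as plt

x = ['A', 'B', 'B', 'C']
y = [0, 1, 2, 1]

fig, ax = plt.subplots()
ax.scatter(x, y)

# Draw once so the tick labels are populated, then read them back
fig.canvas.draw()
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_labels)

plt.close(fig)
```

Note that this sketch only covers the 2D case shown in the answer.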
