Visualizing scatter plots with overlapping dots in matplotlib

I have to represent about 30,000 points on a scatter plot in matplotlib. These points belong to two different classes, so I want to portray them in different colors.

I have managed this, but there is a problem. The points overlap in many regions, and whichever class I draw last is rendered on top of the other, hiding it. In addition, a scatter plot cannot show how many points lie in each region. I also tried making a 2D histogram with histogram2d and imshow, but it is hard to show the points of both classes clearly that way.

Can you suggest a way to show both the distribution of the classes and the concentration of points?

EDIT: to be more clear, this is a link to my data file in the format "x, y, class"

2 answers

One approach is to plot the data as a scatter plot with a low alpha, so you can see the individual points as well as a rough measure of density. (The downside is that this approach can only show a limited range of overlap: the maximum density it can represent is about 1 / alpha.)
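To get a feel for the 1 / alpha rule of thumb, here is a minimal sketch. It assumes that overlapping markers are blended with standard "over" alpha compositing, so that n stacked markers of opacity alpha reach a combined opacity of 1 - (1 - alpha)^n. The helper name `stacked_opacity` is made up for this illustration.

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n markers of opacity `alpha` drawn on top of each other.

    Assumes standard "over" compositing: each marker lets a fraction
    (1 - alpha) of what is underneath show through.
    """
    return 1.0 - (1.0 - alpha) ** n


alpha = 0.03
for n in (1, 10, 33, 100, 200):
    print(f"{n:4d} overlapping points -> opacity {stacked_opacity(alpha, n):.2f}")
```

Under that assumption the opacity is already around 63% once about 1 / alpha points overlap, and it flattens out quickly after that, so much denser regions look almost the same.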

Because only a limited range of overlap can be expressed, there is a trade-off between keeping individual points visible and conveying how many points overlap (marker size, figure size, etc. also play a part). Here is an example:

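    import numpy as np
    import matplotlib.pyplot as plt

    N = 10000
    mean = [0, 0]
    # Note: this matrix is not symmetric, so NumPy may emit a warning.
    cov = [[2, 2], [0, 2]]
    x, y = np.random.multivariate_normal(mean, cov, N).T

    # A low alpha lets overlapping markers build up into darker regions.
    plt.scatter(x, y, s=70, alpha=0.03)
    plt.ylim((-5, 5))
    plt.xlim((-5, 5))
    plt.show()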
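For the two-class case in the question, one possible extension is to draw both classes in a single scatter call with the points in random order, so that neither class is systematically painted on top of the other. This is only a sketch: the data below is a synthetic stand-in for the "x, y, class" file, and the colors, marker size and alpha are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba_array

rng = np.random.default_rng(0)

# Hypothetical stand-in for the "x, y, class" data file from the question.
n_per_class = 15000
xy_a = rng.normal(loc=[-1.0, 0.0], scale=1.5, size=(n_per_class, 2))
xy_b = rng.normal(loc=[1.0, 0.0], scale=1.5, size=(n_per_class, 2))
xy = np.vstack([xy_a, xy_b])
labels = np.repeat([0, 1], n_per_class)  # 0 = class A, 1 = class B

# Shuffle the draw order so the two classes are interleaved.
order = rng.permutation(len(xy))

# One RGBA row per point, looked up from a two-entry palette.
palette = to_rgba_array(["tab:blue", "tab:red"])
colors = palette[labels[order]]

fig, ax = plt.subplots()
ax.scatter(xy[order, 0], xy[order, 1], c=colors, s=10, alpha=0.1, linewidths=0)
plt.show()
```

Regions where the classes mix then show up as a blend of the two colors instead of being covered by whichever class was plotted last.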

(I assume that you were referring to 30e3 points, not 30e6. For 30e6, I think some type of averaged density plot will be required.)
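As a rough sketch of what such an averaged density plot could look like with two classes, the example below bins each class with `np.histogram2d` on a shared grid and puts the two count arrays into separate color channels of one image. The data is again a synthetic stand-in, and the bin count, plot range and channel assignment are arbitrary assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical stand-in for the two classes in the question's data file.
xy_a = rng.normal(loc=[-1.0, 0.0], scale=1.5, size=(15000, 2))
xy_b = rng.normal(loc=[1.0, 0.0], scale=1.5, size=(15000, 2))

bins = 80
plot_range = [[-6, 6], [-6, 6]]

# Bin both classes on the same grid so the counts are directly comparable.
counts_a, xedges, yedges = np.histogram2d(
    xy_a[:, 0], xy_a[:, 1], bins=bins, range=plot_range)
counts_b, _, _ = np.histogram2d(
    xy_b[:, 0], xy_b[:, 1], bins=bins, range=plot_range)

# Shared normalisation: the fullest bin of either class maps to full intensity.
scale = max(counts_a.max(), counts_b.max())

# histogram2d indexes its result as [x, y]; imshow expects [row=y, col=x].
rgb = np.zeros((bins, bins, 3))
rgb[..., 0] = counts_b.T / scale  # red channel: class B
rgb[..., 2] = counts_a.T / scale  # blue channel: class A

fig, ax = plt.subplots()
ax.imshow(rgb, origin="lower", aspect="auto",
          extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.show()
```

Empty bins stay black, bins dominated by one class appear blue or red, and bins where both classes are dense appear purple, with brightness indicating the number of points.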


You can also color the points by first computing a kernel density estimate of the scatter distribution and then using the density values to set the color of each point. To modify the code from the previous example:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import gaussian_kde as kde
    from matplotlib.colors import Normalize
    from matplotlib import cm

    N = 10000
    mean = [0, 0]
    # Note: this matrix is not symmetric, so NumPy may emit a warning.
    cov = [[2, 2], [0, 2]]
    samples = np.random.multivariate_normal(mean, cov, N).T  # shape (2, N)
    densObj = kde(samples)

    def makeColours(vals):
        # Map each density value to an RGBA colour.
        norm = Normalize(vmin=vals.min(), vmax=vals.max())
        # Can put any colormap you like here.
        return cm.ScalarMappable(norm=norm, cmap='jet').to_rgba(vals)

    colours = makeColours(densObj.evaluate(samples))

    plt.scatter(samples[0], samples[1], color=colours)
    plt.show()

Scatter plot with density information

I came across this trick some time ago when I noticed the following in the documentation of the scatter function:

 c : color or sequence of color, optional, default : 'b' 

c can be a single color format string, or a sequence of color specifications of length N, or a sequence of N numbers to be mapped to colors using the cmap and norm specified via kwargs (see below). Note that c should not be a single numeric RGB or RGBA sequence, because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however, including the case of a single row to specify the same color for all points.
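The "sequence of N numbers" case in that description suggests a shorter variant: pass the density values themselves as c and let scatter apply the colormap. The sketch below does that and, as an extra step that is not part of the answer above, sorts the points by density so the densest ones are drawn last. The sample size, the symmetric covariance matrix and the colormap are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic data for illustration; shape (2, N) as gaussian_kde expects.
samples = rng.multivariate_normal([0, 0], [[2, 1], [1, 2]], size=2000).T

# Estimate the density at every sample point.
density = gaussian_kde(samples)(samples)

# Sort by density so high-density points are plotted on top.
order = np.argsort(density)
x, y, density = samples[0, order], samples[1, order], density[order]

fig, ax = plt.subplots()
# c is a sequence of N numbers, so scatter maps it through cmap itself.
points = ax.scatter(x, y, c=density, cmap="viridis", s=10)
fig.colorbar(points, ax=ax, label="estimated density")
plt.show()
```

Because the numbers are passed directly, the colorbar can be attached to the scatter result without building the colors by hand. Note that evaluating the kernel density estimate at every point gets slow as N grows.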
