Pandas scatter plot with three points has inconsistent colors when seaborn is loaded

When using pandas together with seaborn, a scatter plot that has exactly three points behaves strangely: the points do not all have the same color. The problem disappears when seaborn is not loaded, when there are more than three points, or when plotting directly with matplotlib's scatter method. See the following example:

    from pandas import DataFrame     # 0.16.0
    import matplotlib.pyplot as plt  # 1.4.3
    import seaborn as sns            # 0.5.1
    import numpy as np               # 1.9.2

    # Three points: the points are drawn in different colors
    df = DataFrame({'x': np.random.uniform(0, 1, 3), 'y': np.random.uniform(0, 1, 3)})
    df.plot(kind='scatter', x='x', y='y')
    plt.show()

    # Four points: all points share one color
    df = DataFrame({'x': np.random.uniform(0, 1, 4), 'y': np.random.uniform(0, 1, 4)})
    df.plot(kind='scatter', x='x', y='y')
    plt.show()

2 answers

I found the bug. Technically it is in pandas, not in seaborn as I originally thought, although it involves code from pandas, seaborn and matplotlib.

The following code appears in pandas.tools.plotting.ScatterPlot._make_plot to select the colors that will be used in the scatter plot:

    if c is None:
        c_values = self.plt.rcParams['patch.facecolor']
    elif c_is_column:
        c_values = self.data[c].values
    else:
        c_values = c

In your case, c is None, which is the default value, and therefore c_values is set to plt.rcParams['patch.facecolor'].

Now, as part of its setup, seaborn changes plt.rcParams['patch.facecolor'] to (0.5725490196078431, 0.7764705882352941, 1.0), which is an RGB tuple. If seaborn is not used, the value is the matplotlib default, which is 'b' (a string indicating the color blue).

c_values is then used later to actually draw the points inside ax.scatter:

    scatter = ax.scatter(data[x].values, data[y].values, c=c_values,
                         label=label, cmap=cmap, **self.kwds)

The problem arises because the keyword argument c accepts several different kinds of values. It can be:

  • a string (for example, 'b' in the plain matplotlib case);
  • a sequence of color specifications (for example, a sequence of RGB values);
  • a sequence of values to be mapped onto the current colormap.

The matplotlib documentation for scatter states the following; the second sentence is the relevant one:

c can be a single color format string, or a sequence of color specifications of length N, or a sequence of N numbers to be mapped to colors using the cmap and norm specified via kwargs (see below). Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however.
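The last sentence of that passage suggests one way to sidestep the ambiguity: wrap a flat RGB tuple so that it becomes a 2-D sequence with a single row before it reaches scatter. The helper below is a minimal sketch of that idea; as_unambiguous_color is a hypothetical name used only for illustration and is not part of pandas, seaborn or matplotlib.

```python
from numbers import Real


def as_unambiguous_color(color):
    """Return `color` in a form that cannot be mistaken for data values.

    Hypothetical helper for illustration only; not a pandas, seaborn
    or matplotlib function.
    """
    # Color strings such as 'b' or '#92c6ff' are already unambiguous.
    if isinstance(color, str):
        return color
    values = list(color)
    # A flat sequence of 3 or 4 numbers is the ambiguous case: wrap it so
    # that it becomes a 2-D sequence with a single RGB(A) row.
    if len(values) in (3, 4) and all(isinstance(v, Real) for v in values):
        return [values]
    # Anything else (e.g. an existing 2-D sequence) is returned unchanged.
    return color


# The value seaborn sets for patch.facecolor, as quoted in this answer.
seaborn_facecolor = (0.5725490196078431, 0.7764705882352941, 1.0)
print(as_unambiguous_color(seaborn_facecolor))
print(as_unambiguous_color('b'))
```

Assuming the wrapped value is then passed as c to scatter, it should be read as one color for all points regardless of how many points there are.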

What basically happens is that matplotlib takes the value of c_values (a tuple of three numbers) and, because its length matches the number of points, treats the three numbers as data values and maps them onto the current colormap (which pandas sets to Greys by default). Thus you get three scatter points with different shades of grey. When you have more than three scatter points, matplotlib concludes that the tuple must be an RGB color, because its length does not match the length of the data arrays (3 != 4), and therefore uses it as a single constant RGB color.

This has been filed as a bug report on the pandas GitHub issue tracker.
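The decision described above can be sketched in a few lines of plain Python. This is a deliberately simplified model of the behavior reported in this answer, not matplotlib's actual implementation; the function name interpret_c and the returned labels are made up for illustration.

```python
def interpret_c(c, n_points):
    """Simplified model of how scatter's `c` argument is read.

    Illustrative only: the name and the returned labels are hypothetical,
    and the real matplotlib logic handles more cases than this.
    """
    # A string such as 'b' is always a single color.
    if isinstance(c, str):
        return "single color string"
    values = list(c)
    if values and all(isinstance(v, (int, float)) for v in values):
        # A flat numeric sequence whose length matches the data is taken
        # as values to be mapped through the colormap. This check wins
        # even when the sequence also looks like an RGB(A) tuple.
        if len(values) == n_points:
            return "values mapped through cmap"
        # Otherwise a flat sequence of 3 or 4 numbers can only be a color.
        if len(values) in (3, 4):
            return "single RGB(A) color"
        raise ValueError("numeric c must match the data length or be RGB(A)")
    # Anything else is treated as a sequence of color specifications.
    return "sequence of color specifications"


seaborn_facecolor = (0.5725490196078431, 0.7764705882352941, 1.0)
print(interpret_c(seaborn_facecolor, 3))  # three points
print(interpret_c(seaborn_facecolor, 4))  # four points
print(interpret_c('b', 3))                # matplotlib default
```

With three points the tuple falls into the colormapped branch, while with four points the same tuple is read as one RGB color, which mirrors the difference between the two plots in the question.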


You might want to try the following:

 import seaborn.apionly as sns 

And see this question for more details.


Source: https://habr.com/ru/post/1216072/

