Plotting Time Series Using Seaborn FacetGrid

I have a DataFrame (data) with a simple integer index and 5 columns. The columns are Date, Country, AgeGroup, Gender, Stat. (Names changed to protect the innocent.) I would like to create a FacetGrid where Country defines the row, AgeGroup defines the column, and Gender defines the hue. For each of those facets, I would like to draw a time series chart. That is, I should get an array of graphs, each of which has 2 time series on it (1 male, 1 female). I can get very close with:

 g = sns.FacetGrid(data, row='Country', col='AgeGroup', hue='Gender')
 g.map(plt.plot, 'Stat')

However, this just gives me the sample number on the x axis, not the date. Is there a quick fix in this context?

In general, I understand that the approach with FacetGrid is to make the grid and then map a plotting function onto it. If I wanted to roll my own plotting function, what conventions does it need to follow? In particular, how can I write my own function to pass to FacetGrid.map that accepts multiple columns of data from my dataset?

python matplotlib pandas seaborn
1 answer

First, I will answer your more general question. The rules for functions that you can pass to FacetGrid.map are:

  • They must take array-like inputs as positional arguments, with the first argument corresponding to the x axis and the second argument corresponding to the y axis (though more on the second condition shortly).
  • They must also accept two keyword arguments: color and label. If you want to use a hue variable, then these should be passed on to the underlying plotting function, although you can just catch **kwargs and do nothing with them if they are not relevant to the specific plot you are making.
  • When invoked, they must draw a plot on the "currently active" matplotlib axes.

There may be cases where your function draws a plot that looks correct without taking both x and y as positional inputs. I think that is basically what is happening here with the way you use plt.plot. In that case it can be easier to just call, for example, g.set_axis_labels("Date", "Stat") after calling map, which will rename your axes properly. You can also do g.set(xticklabels=dates) to get more meaningful ticks.

There is also a more general function, FacetGrid.map_dataframe. The rules are similar here, but the function you pass must accept a dataframe in a parameter named data, and instead of taking array-like positional inputs it takes strings that correspond to variables in that dataframe. On each iteration through the facets, the function will be called with the input dataframe masked to just the values for that combination of row, col and hue levels.
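A small sketch of what a map_dataframe-compatible function might look like follows. The name line_from_frame and the synthetic data are assumptions for illustration; the column names are borrowed from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def line_from_frame(x, y, data=None, color=None, label=None, **kwargs):
    """Hypothetical function for map_dataframe: x and y are column names."""
    ax = plt.gca()  # draw on the currently active axes
    # data is already restricted to the current row/col/hue combination
    ax.plot(data[x], data[y], color=color, label=label, **kwargs)


# Synthetic stand-in for the question's DataFrame (values are invented)
dates = pd.date_range("2020-01-01", periods=4, freq="D")
records = [
    (d, country, gender, float(len(country) + i))
    for country in ("A", "BB")
    for gender in ("F", "M")
    for i, d in enumerate(dates)
]
data = pd.DataFrame(records, columns=["Date", "Country", "Gender", "Stat"])

g = sns.FacetGrid(data, col="Country", hue="Gender")
# Strings are passed through as-is; the subset arrives via the data keyword
g.map_dataframe(line_from_frame, "Date", "Stat")
g.add_legend()
```

Because the function receives the whole subset, it can read as many columns as it likes, which is one way to handle the "multiple columns" part of the question.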

So, in your specific case, you need to write a function, which we can call plot_by_date, that should look something like this:

 def plot_by_date(x, y, color=None, label=None): ... 

(I would be more helpful on the body, but I do not actually know much about dealing with dates in matplotlib.) The end result is that when you call this function, it should plot on the currently active axes. Then do

 g.map(plot_by_date, "Date", "Stat") 

And it should work, I think.
