Min / Max shading seasonality using Seaborn

Question


I am trying to create a time series chart with three lines from the following data. Week goes on the x axis and Overload on the y axis, and each cluster is a separate line.

I have several observations for each (Cluster, Week) pair (5 per pair at the moment; eventually there will be 1000). I would like each point on a line to be the mean Overload for that (Cluster, Week) pair, and the shaded range to span the minimum and maximum values.
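The statistics described above can be computed in pandas before any plotting happens. A minimal sketch, assuming the long format from the question (columns Cluster, Week, Overload); the small frame below is made-up illustration data, not the Gist:

```python
import pandas as pd

# Hypothetical long-format data: several observations per (Cluster, Week) pair
df = pd.DataFrame({
    "Cluster": [1, 1, 1, 1, 2, 2, 2, 2],
    "Week": [1, 1, 2, 2, 1, 1, 2, 2],
    "Overload": [3.0, 5.0, 4.0, 8.0, 1.0, 2.0, 6.0, 2.0],
})

# Index: Week; columns: (statistic, Cluster) pairs
stats = (
    df.groupby(["Cluster", "Week"])["Overload"]
      .agg(["min", "mean", "max"])
      .unstack("Cluster")
)

print(stats["mean"])  # the lines: mean Overload per Week, one column per Cluster
print(stats["min"])   # lower edge of each band
print(stats["max"])   # upper edge of each band
```

With this layout, stats["mean"] holds the lines and stats["min"] / stats["max"] hold the band edges, each with one column per cluster.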

Currently I use the following code, but no lines are drawn, because I don't know which unit to specify with my current data frame:

ax14 = sns.tsplot(data = long_total_cluster_capacity_overload_df, value = "Overload", time = "Week", condition = "Cluster")
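tsplot accepts either a long-form frame plus a unit column, or a 2-D array shaped (unit, time). A sketch of producing both from long-format data, assuming the question's column names; the Unit column name and the toy values are hypothetical:

```python
import pandas as pd

# Hypothetical long-format data with the question's columns
df = pd.DataFrame({
    "Cluster": [1, 1, 1, 1, 2, 2, 2, 2],
    "Week": [1, 1, 2, 2, 1, 1, 2, 2],
    "Overload": [3.0, 5.0, 4.0, 8.0, 1.0, 2.0, 6.0, 2.0],
})

# Number the observations 0, 1, 2, ... inside each (Cluster, Week) pair
df["Unit"] = df.groupby(["Cluster", "Week"]).cumcount()

# One (unit x week) array per cluster: rows are units, columns are weeks
arrays = {
    cluster: sub.pivot(index="Unit", columns="Week", values="Overload").to_numpy()
    for cluster, sub in df.groupby("Cluster")
}

print(df)
print(arrays[1])
```

The Unit column is what the long-form call needs; the per-cluster arrays are the shape the array form expects.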

GIST data

I have a feeling that I need to reshape my data, but I have no idea how to do this. I am looking for a final result that looks like this:


python pandas seaborn

Silviu tofan Jun 11 '16 at 19:26


3 answers

In the end I used a plain old pandas plot with subplots, which I find more readable.

    import pandas as pd
    import seaborn as sns

    df = pd.read_csv('TSplot.csv', sep='\t', index_col=0)

    # Compute the min, mean and max (could also be other statistics) per (Cluster, Week),
    # with one column per cluster
    grouped = df.groupby(["Cluster", "Week"]).agg({'Overload': ['min', 'mean', 'max']}).unstack("Cluster")

    # Plot the means as subplots, since that is more readable
    axes = grouped.loc[:, ('Overload', 'mean')].plot(subplots=True)

    # Get the color palette in use so each band matches its line
    palette = sns.color_palette()

    # Shade mean..max and min..mean for each cluster (clusters are assumed to be numbered 1, 2, 3, ...)
    for index, ax in enumerate(axes):
        cluster = index + 1
        ax.fill_between(grouped.index,
                        grouped.loc[:, ('Overload', 'mean', cluster)],
                        grouped.loc[:, ('Overload', 'max', cluster)],
                        alpha=.2, color=palette[index])
        ax.fill_between(grouped.index,
                        grouped.loc[:, ('Overload', 'min', cluster)],
                        grouped.loc[:, ('Overload', 'mean', cluster)],
                        alpha=.2, color=palette[index])


Romain Jun 12 '16 at 16:58


I really thought I could do this with seaborn.tsplot, but it doesn't come out quite right. Here is the result I get from seaborn:

    import pandas as pd
    import seaborn as sns

    cluster_overload = pd.read_csv("TSplot.csv", delim_whitespace=True)
    # Number the repeated observations within each (Cluster, Week) pair so tsplot has a unit
    cluster_overload['Unit'] = cluster_overload.groupby(['Cluster', 'Week']).cumcount()
    ax = sns.tsplot(time='Week', value="Overload", condition="Cluster", ci=100,
                    unit="Unit", data=cluster_overload)

Outputs:

I am really confused about why the unit parameter is necessary, since as I understand it all the data is aggregated by (time, condition). The seaborn documentation defines unit as:

Field in the data DataFrame identifying the sampling unit (e.g. subject, neuron, etc.). The error representation will collapse over units at each time/condition observation. This has no role when data is an array.

I am not sure what “collapse” means here, especially since, by my reading of that definition, unit should not be a required variable.

In any case, the conclusion is that if you want exactly what you described, the result is not as pretty. I'm not sure how to shade these regions manually, so the code below draws the min and max as faint lines instead; please share if you figure it out.
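As far as I can tell from the seaborn 0.7-era source (an assumption; the documentation does not spell it out), tsplot pivots each condition's rows into a table with one row per unit and one column per time point, then computes the line and the error band down that unit axis. That would explain why unit is required: with several observations per (time, condition) and no unit, there is no way to place them in that table. The pandas-only sketch below reproduces this on made-up data; it does not call seaborn.

```python
import pandas as pd

# Hypothetical data: two observations for each (Cluster, Week) pair
df = pd.DataFrame({
    "Cluster": [1, 1, 1, 1],
    "Week": [1, 1, 2, 2],
    "Overload": [3.0, 5.0, 4.0, 8.0],
})

# With no real unit (every row gets the same unit), two rows land in the same cell
failed = False
try:
    df.assign(Unit=0).pivot(index="Unit", columns="Week", values="Overload")
except ValueError as err:
    failed = True
    print("pivot failed:", err)

# A per-(Cluster, Week) counter gives every observation its own row
df["Unit"] = df.groupby(["Cluster", "Week"]).cumcount()
table = df.pivot(index="Unit", columns="Week", values="Overload")
print(table)

# "Collapsing over units": summarise down the unit axis at each week
print(table.min().tolist(), table.mean().tolist(), table.max().tolist())
```

Under this reading, the unit values themselves carry no meaning; they only need to be distinct within each (time, condition) cell, which is why a cumcount works.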

    import pandas as pd

    cluster_overload = pd.read_csv("TSplot.csv", delim_whitespace=True)
    # min / mean / max per (Cluster, Week); index: Week, columns: (statistic, Cluster)
    stats = (cluster_overload.groupby(['Cluster', 'Week'])['Overload']
             .agg(['min', 'mean', 'max'])
             .unstack('Cluster'))
    colors = ['b', 'g', 'r']
    # Thick lines for the means, faint lines for the max and min
    ax = stats['mean'].plot(color=colors, alpha=0.8, linewidth=3)
    stats['max'].plot(ax=ax, color=colors, legend=False, alpha=0.3)
    stats['min'].plot(ax=ax, color=colors, legend=False, alpha=0.3)
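One way to shade the regions manually (plain matplotlib, not a seaborn feature) is Axes.fill_between between the per-week min and max for each cluster, reusing the color of that cluster's mean line. A sketch, assuming a stats frame laid out with Week as the index and (statistic, Cluster) columns; the toy data and the headless Agg backend are assumptions so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format data
df = pd.DataFrame({
    "Cluster": [1, 1, 1, 1, 2, 2, 2, 2],
    "Week": [1, 1, 2, 2, 1, 1, 2, 2],
    "Overload": [3.0, 5.0, 4.0, 8.0, 1.0, 2.0, 6.0, 2.0],
})

# Index: Week; columns: (statistic, Cluster)
stats = df.groupby(["Cluster", "Week"])["Overload"].agg(["min", "mean", "max"]).unstack("Cluster")

fig, ax = plt.subplots()
for cluster in stats["mean"].columns:
    # Draw the mean line, then shade min..max in the same color
    line, = ax.plot(stats.index, stats[("mean", cluster)], label=f"Cluster {cluster}")
    ax.fill_between(stats.index, stats[("min", cluster)], stats[("max", cluster)],
                    color=line.get_color(), alpha=0.2)

ax.set_xlabel("Week")
ax.set_ylabel("Overload")
ax.legend()
```

Taking the color from the returned line keeps each band matched to its line without hard-coding a color list; call fig.savefig(...) or plt.show() to see the result.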

Outputs:


michael_j_ward Jun 12 '16 at 15:01


michael_j_ward · Accepted Answer · 2016-06-12T16:51:37+0000

Based on this excellent answer, I was able to write a monkey patch that neatly does what you are looking for.

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import seaborn.timeseries

    def _plot_range_band(*args, central_data=None, ci=None, data=None, **kwargs):
        # Replace the bootstrapped interval with the min / max across units at each time point
        upper = data.max(axis=0)
        lower = data.min(axis=0)
        ci = np.asarray((lower, upper))
        kwargs.update({"central_data": central_data, "ci": ci, "data": data})
        seaborn.timeseries._plot_ci_band(*args, **kwargs)

    # Monkey patch: err_style="range_band" now draws the min / max band
    seaborn.timeseries._plot_range_band = _plot_range_band

    cluster_overload = pd.read_csv("TSplot.csv", delim_whitespace=True)
    cluster_overload['Unit'] = cluster_overload.groupby(['Cluster', 'Week']).cumcount()
    # The patch ignores the bootstrap interval, so n_boot=0 avoids wasted resampling
    ax = sns.tsplot(time='Week', value="Overload", condition="Cluster", unit="Unit",
                    data=cluster_overload, err_style="range_band", n_boot=0)
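The patched _plot_range_band swaps the bootstrapped interval for the per-time-point extremes across units. Because data arrives as a (unit x time) array (the patch reduces along axis=0), data.min(axis=0) and data.max(axis=0) give one lower and one upper value per week. A numpy sketch of just that computation on a made-up array; it does not call seaborn's private _plot_ci_band:

```python
import numpy as np

# Hypothetical (unit x time) array for one cluster: 3 units (rows), 4 weeks (columns)
data = np.array([
    [3.0, 4.0, 2.0, 7.0],
    [5.0, 8.0, 1.0, 6.0],
    [4.0, 6.0, 3.0, 9.0],
])

central = data.mean(axis=0)                     # the line: mean over units at each week
lower, upper = data.min(axis=0), data.max(axis=0)
band = np.asarray((lower, upper))               # (2, n_weeks), same layout the patch passes as ci

print(central)
print(band)
```

Because the band edges are the actual extremes of each column, the shaded area always contains the mean line and touches the highest and lowest observation at every week.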

Output:

Note that the shaded areas line up with the true highs and lows in the line chart!

If you find out why the unit variable is required, let me know.

If you don't want all the clusters on the same chart, you can draw one row per cluster with a FacetGrid:

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import seaborn.timeseries

    def _plot_range_band(*args, central_data=None, ci=None, data=None, **kwargs):
        # Same monkey patch as above: the band is the min / max across units
        upper = data.max(axis=0)
        lower = data.min(axis=0)
        ci = np.asarray((lower, upper))
        kwargs.update({"central_data": central_data, "ci": ci, "data": data})
        seaborn.timeseries._plot_ci_band(*args, **kwargs)

    seaborn.timeseries._plot_range_band = _plot_range_band

    cluster_overload = pd.read_csv("TSplot.csv", delim_whitespace=True)
    cluster_overload['subindex'] = cluster_overload.groupby(['Cluster', 'Week']).cumcount()

    def customPlot(*args, **kwargs):
        # FacetGrid passes each cluster's rows as data; pivot them into a (unit x week) array
        df = kwargs.pop('data')
        pivoted = df.pivot(index='subindex', columns='Week', values='Overload')
        sns.tsplot(pivoted.values, err_style="range_band", n_boot=0, color=kwargs['color'])

    # One row per cluster, each with its own y scale
    g = sns.FacetGrid(cluster_overload, row="Cluster", sharey=False, hue='Cluster', aspect=3)
    g = g.map_dataframe(customPlot, 'Week', 'Overload', 'subindex')

This gives the following (you can play with the aspect parameter if the proportions look off).
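If you would rather not depend on a private seaborn function, the same layout (one row per cluster, independent y axes, a min..max band around the mean) can be sketched with plain matplotlib. This is a swapped-in alternative, not the answer's method; the toy data, the Agg backend and the figure size are assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format data with three clusters
df = pd.DataFrame({
    "Cluster": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Week": [1, 1, 2, 2] * 3,
    "Overload": [3.0, 5.0, 4.0, 8.0, 1.0, 2.0, 6.0, 2.0, 10.0, 12.0, 11.0, 15.0],
})

clusters = sorted(df["Cluster"].unique())
# One row per cluster with independent y axes (like sharey=False); figsize is arbitrary
fig, axes = plt.subplots(nrows=len(clusters), sharex=True, sharey=False,
                         figsize=(9, 2.5 * len(clusters)))

for i, (ax, cluster) in enumerate(zip(axes, clusters)):
    # Per-week statistics for this cluster only
    weekly = df[df["Cluster"] == cluster].groupby("Week")["Overload"].agg(["min", "mean", "max"])
    color = f"C{i}"  # a different default-cycle color per row, similar to hue="Cluster"
    ax.plot(weekly.index, weekly["mean"], color=color)
    ax.fill_between(weekly.index, weekly["min"], weekly["max"], color=color, alpha=0.2)
    ax.set_ylabel(f"Cluster {cluster}")

axes[-1].set_xlabel("Week")
```

This keeps the real Week values on the x axis and avoids relying on seaborn internals that may change between versions.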

