Min / Max shading seasonality using Seaborn

I am trying to create a three-line time series chart from the following data (a long dataframe), as a Week x Overload graph where each Cluster is a different line.

I have several observations for each (Cluster, Week) pair (5 each at the moment; eventually there will be 1000). I would like the points on the line to be the average Overload for that (Cluster, Week) pair, and the band to span the minimum and maximum values.

I currently use the following bit of code to plot it, but I do not get any lines, because I do not know which unit to specify for the current dataframe:

    ax14 = sns.tsplot(data=long_total_cluster_capacity_overload_df,
                      value="Overload", time="Week", condition="Cluster")

GIST data

I have a feeling that I still need to reshape my dataframe, but I have no idea how to do this. I am looking for a final result that looks like this: [image]

3 answers

Based on this incredible answer, I was able to create a monkey patch that does what you are looking for quite nicely.

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import seaborn.timeseries


    def _plot_range_band(*args, central_data=None, ci=None, data=None, **kwargs):
        # Replace the bootstrapped interval with the true min/max across units.
        upper = data.max(axis=0)
        lower = data.min(axis=0)
        ci = np.asarray((lower, upper))
        kwargs.update({"central_data": central_data, "ci": ci, "data": data})
        seaborn.timeseries._plot_ci_band(*args, **kwargs)


    # Expose the function in the module so err_style="range_band" can find it.
    seaborn.timeseries._plot_range_band = _plot_range_band

    cluster_overload = pd.read_csv("TSplot.csv", delim_whitespace=True)
    # Number the repeated observations within each (Cluster, Week) pair.
    cluster_overload['Unit'] = cluster_overload.groupby(['Cluster', 'Week']).cumcount()

    ax = sns.tsplot(time='Week', value="Overload", condition="Cluster",
                    unit="Unit", data=cluster_overload,
                    err_style="range_band", n_boot=0)

Output plot: [image]

Note that the shaded areas line up with the true highs and lows in the line chart!

If you find out why the unit variable is required, let me know.


If you do not want them all on the same chart, you can do the following:

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import seaborn.timeseries


    def _plot_range_band(*args, central_data=None, ci=None, data=None, **kwargs):
        # Replace the bootstrapped interval with the true min/max across units.
        upper = data.max(axis=0)
        lower = data.min(axis=0)
        ci = np.asarray((lower, upper))
        kwargs.update({"central_data": central_data, "ci": ci, "data": data})
        seaborn.timeseries._plot_ci_band(*args, **kwargs)


    seaborn.timeseries._plot_range_band = _plot_range_band

    cluster_overload = pd.read_csv("TSplot.csv", delim_whitespace=True)
    cluster_overload['subindex'] = cluster_overload.groupby(['Cluster', 'Week']).cumcount()


    def customPlot(*args, **kwargs):
        # FacetGrid.map_dataframe passes the per-facet subset as 'data'.
        df = kwargs.pop('data')
        # Rows are repeated observations, columns are weeks.
        pivoted = df.pivot(index='subindex', columns='Week', values='Overload')
        ax = sns.tsplot(pivoted.values, err_style="range_band", n_boot=0,
                        color=kwargs['color'])


    g = sns.FacetGrid(cluster_overload, row="Cluster", sharey=False,
                      hue='Cluster', aspect=3)
    g = g.map_dataframe(customPlot, 'Week', 'Overload', 'subindex')

This gives the following (you can play with the aspect ratio if you think the proportions are off): [image]


In the end, I used a good old plot with subplots, which seems more readable to me.

    import pandas as pd
    import seaborn as sns

    df = pd.read_csv('TSplot.csv', sep='\t', index_col=0)

    # Compute the min, mean and max (could also be other values)
    grouped = (df.groupby(["Cluster", "Week"])
                 .agg({'Overload': ['min', 'mean', 'max']})
                 .unstack("Cluster"))

    # Plot with subplots since it is more readable
    axes = grouped.loc[:, ('Overload', 'mean')].plot(subplots=True)

    # Get the color palette used
    palette = sns.color_palette()

    # One axis per cluster; cluster labels are assumed to be 1, 2, 3, ...
    for index, ax in enumerate(axes):
        ax.fill_between(grouped.index,
                        grouped.loc[:, ('Overload', 'mean', index + 1)],
                        grouped.loc[:, ('Overload', 'max', index + 1)],
                        alpha=.2, color=palette[index])
        ax.fill_between(grouped.index,
                        grouped.loc[:, ('Overload', 'min', index + 1)],
                        grouped.loc[:, ('Overload', 'mean', index + 1)],
                        alpha=.2, color=palette[index])


I really thought I could do this with seaborn.tsplot, but it does not look quite right. Here is the result that I get from seaborn:

    import pandas as pd
    import seaborn as sns

    cluster_overload = pd.read_csv("TSplot.csv", delim_whitespace=True)
    # Number the repeated observations within each (Cluster, Week) pair.
    cluster_overload['Unit'] = cluster_overload.groupby(['Cluster', 'Week']).cumcount()

    ax = sns.tsplot(time='Week', value="Overload", condition="Cluster",
                    ci=100, unit="Unit", data=cluster_overload)

Output: [image]

I am really confused about why the unit parameter is necessary, since my understanding is that all the data is aggregated by (time, condition). The seaborn documentation defines unit as:

Field in the data DataFrame identifying the sampling unit (e.g. subject, neuron, etc.). The error representation will collapse over units at each time/condition observation. This has no role when data is an array.

I am not sure what "collapse over" means here, especially since, by my reading of that definition, unit would not need to be a required variable.
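A possible reading of "collapse over units", sketched with pandas only: each unit is one repeated observation, the long dataframe is pivoted into a units x weeks matrix, and the statistics are then reduced along the unit axis for every week. The data below is hypothetical, and the claim that tsplot needs unit in order to build such a matrix is an assumption about its internals, not something the documentation states.

```python
import pandas as pd

# Hypothetical long-format data: 2 clusters, 3 weeks, 2 observations per pair.
df = pd.DataFrame({
    "Cluster": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "Week": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "Overload": [10.0, 14.0, 20.0, 26.0, 15.0, 17.0,
                 5.0, 9.0, 8.0, 8.0, 12.0, 6.0],
})

# Number the repeated observations within each (Cluster, Week) pair: 0, 1, ...
df["Unit"] = df.groupby(["Cluster", "Week"]).cumcount()

# For one cluster, pivot to a matrix with one row per unit, one column per week.
cluster_1 = df[df["Cluster"] == 1]
wide = cluster_1.pivot(index="Unit", columns="Week", values="Overload")

# "Collapsing over units": reduce along the unit axis at each week.
center = wide.mean(axis=0)
lower = wide.min(axis=0)
upper = wide.max(axis=0)

print(wide)
print(pd.DataFrame({"min": lower, "mean": center, "max": upper}))
```

Without a unit column the pivot has no row labels to distinguish the repeated observations of a (Cluster, Week) pair, which may be why the cumcount workaround is needed.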

In any case, here is the output if you want exactly what you described; it is not as pretty. I am not sure how to manually shade in these regions, but please share if you find out.

    import pandas as pd

    cluster_overload = pd.read_csv("TSplot.csv", delim_whitespace=True)
    grouped = cluster_overload.groupby(['Cluster', 'Week'], as_index=False)
    # Rows become (statistic, Week) pairs after the transpose.
    stats = grouped.agg(['min', 'mean', 'max']).unstack().T
    stats.index = stats.index.droplevel(0)

    colors = ['b', 'g', 'r']
    # Thick line for the mean, faint lines for the max and min.
    ax = stats.loc['mean'].plot(color=colors, alpha=0.8, linewidth=3)
    stats.loc['max'].plot(ax=ax, color=colors, legend=False, alpha=0.3)
    stats.loc['min'].plot(ax=ax, color=colors, legend=False, alpha=0.3)

Output: [image]
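One way the regions could be shaded manually is with matplotlib's fill_between, drawing the band between the per-week minimum and maximum in the same color as the mean line. This is only a sketch on hypothetical data (the column names match the question; the values do not come from TSplot.csv), and it uses the non-interactive Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format data: 2 clusters, 3 weeks, 2 observations per pair.
df = pd.DataFrame({
    "Cluster": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "Week": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "Overload": [10.0, 14.0, 20.0, 26.0, 15.0, 17.0,
                 5.0, 9.0, 8.0, 8.0, 12.0, 6.0],
})

# One row per (Cluster, Week) pair with its min, mean and max Overload.
stats = df.groupby(["Cluster", "Week"])["Overload"].agg(["min", "mean", "max"])

fig, ax = plt.subplots()
for cluster, cluster_stats in stats.groupby(level="Cluster"):
    # Drop the Cluster level so the index is just Week.
    cluster_stats = cluster_stats.droplevel("Cluster")
    # Mean line for this cluster.
    (line,) = ax.plot(cluster_stats.index, cluster_stats["mean"],
                      label=f"Cluster {cluster}")
    # Shaded min/max band in the same color as the line.
    ax.fill_between(cluster_stats.index, cluster_stats["min"],
                    cluster_stats["max"], color=line.get_color(), alpha=0.2)

ax.set_xlabel("Week")
ax.set_ylabel("Overload")
ax.legend()
```

Calling fig.savefig(...) or plt.show() afterwards renders the figure; all clusters share one axis here, so the bands can overlap when the clusters have similar values.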
