How to get the numerical fit results when plotting a regression in seaborn?

If I use the seaborn library in Python to plot the result of a linear regression, is there a way to find out the numerical results of the regression? For example, I might want to know the fitted coefficients or the R² of the fit.

I could re-run the same fit using the underlying statsmodels interface, but that would presumably be unnecessary duplicate effort, and in any case I would want to be able to compare the resulting coefficients to be sure that the numerical results are the same as what I see in the plot.

+22
2 answers

There is no way to do this.

In my opinion, asking a visualization library to give you statistical modeling results is backwards. statsmodels, a modeling library, lets you fit a model and then draw a plot that corresponds exactly to the model you fit. If you want that exact correspondence, this order of operations makes more sense to me.

You might say "but the plots in statsmodels don't have nearly as many aesthetic options as seaborn." But I think that makes sense: statsmodels is a modeling library that sometimes uses visualization in the service of modeling. seaborn is a visualization library that sometimes uses modeling in the service of visualization. It is good to specialize, and bad to try to do everything.

Fortunately, both seaborn and statsmodels use tidy data. That means you really need very little duplication of effort to get both plots and models through the appropriate tools.
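
As a rough illustration of that workflow, here is a minimal sketch that fits the model with the statsmodels formula interface and reads off the coefficients and R². The data and the column names `x` and `y` are made up for the example and do not come from the question.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up tidy data: one row per observation, one column per variable.
x = np.arange(10, dtype=float)
noise = np.tile([0.1, -0.1], 5)  # small deterministic "noise"
df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0 + noise})

# Fit ordinary least squares with the statsmodels formula interface.
fit = smf.ols("y ~ x", data=df).fit()

# Numerical results of the regression.
slope = fit.params["x"]
intercept = fit.params["Intercept"]
r_squared = fit.rsquared

print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r_squared:.4f}")
```

The same `df` can then be handed to seaborn, for example `sns.regplot(x="x", y="y", data=df)`, which with its default settings should draw the same least-squares line.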

+9

Looking through the currently available documentation, the closest I could find to a way of doing this is to use the scipy.stats.pearsonr function.

 r2 = stats.pearsonr("pct", "rdiff", df) 

When I tried to make it work directly on a Pandas DataFrame, an error was raised because the call violated the function's basic input requirements:

 TypeError: pearsonr() takes exactly 2 arguments (3 given) 
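
The error comes from passing column names plus a DataFrame, whereas pearsonr takes the two data arrays themselves. A minimal sketch of a call that does run, assuming a DataFrame with columns `pct` and `rdiff` (the values below are made up); `scipy.stats.linregress` is brought in here as an addition, since it also reports the slope and intercept:

```python
import pandas as pd
from scipy import stats

# Made-up data; the column names "pct" and "rdiff" follow the call above.
df = pd.DataFrame({
    "pct": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rdiff": [2.1, 3.9, 6.2, 7.8, 10.1],
})

# pearsonr expects the two data arrays themselves.
r, p_value = stats.pearsonr(df["pct"], df["rdiff"])
r2 = r ** 2  # for a one-predictor linear fit, R^2 is the square of r

# linregress additionally reports the fitted slope and intercept.
res = stats.linregress(df["pct"], df["rdiff"])
print(r2, res.slope, res.intercept)
```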

I managed to find another Pandas/Seaborn user who evidently solved it: https://github.com/scipy/scipy/blob/v0.14.0/scipy/stats/stats.py#L2392

 sns.regplot("rdiff", "pct", df, corr_func=stats.pearsonr); 

But, unfortunately, I could not get this to work either. Apparently the author created a custom "corr_func", or Seaborn has an undocumented way of passing such arguments. What is available is a more manual method:

 # x and y should have same length.
 x = np.asarray(x)
 y = np.asarray(y)
 n = len(x)
 mx = x.mean()
 my = y.mean()
 xm, ym = x-mx, y-my
 r_num = np.add.reduce(xm * ym)
 r_den = np.sqrt(ss(xm) * ss(ym))
 r = r_num / r_den
 # Presumably, if abs(r) > 1, then it is only some small artifact of floating
 # point arithmetic.
 r = max(min(r, 1.0), -1.0)
 df = n-2
 if abs(r) == 1.0:
     prob = 0.0
 else:
     t_squared = r*r * (df / ((1.0 - r) * (1.0 + r)))
     prob = betai(0.5*df, 0.5, df / (df + t_squared))
 return r, prob
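
The excerpt above relies on the helpers `ss` and `betai`, which are not defined in it. Below is a self-contained sketch of the same steps. The function name `pearson_manual` is hypothetical, the sum of squares is written out with NumPy, and `scipy.special.betainc` is assumed to be an adequate stand-in for `betai`.

```python
import numpy as np
from scipy import special


def pearson_manual(x, y):
    """Pearson r and two-sided p-value, following the excerpt's steps."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xm, ym = x - x.mean(), y - y.mean()
    r = np.sum(xm * ym) / np.sqrt(np.sum(xm**2) * np.sum(ym**2))
    r = max(min(r, 1.0), -1.0)  # clamp floating-point overshoot
    dof = n - 2
    if abs(r) == 1.0:
        prob = 0.0
    else:
        t_squared = r * r * (dof / ((1.0 - r) * (1.0 + r)))
        # Assumption: betainc stands in for the excerpt's betai helper.
        prob = special.betainc(0.5 * dof, 0.5, dof / (dof + t_squared))
    return r, prob


# Made-up data for illustration.
r, prob = pearson_manual([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(r ** 2)  # R^2 of the corresponding one-predictor linear fit
```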

I hope this helps move the original request toward an interim solution. A utility for adding regression fit statistics to the Seaborn package is much needed, as a replacement for what one can easily get from MS-Excel or a stock Matplotlib line plot.

+1