Does the scipy linregress function return the wrong standard error?

I have a strange situation with scipy.stats.linregress: it seems to return the wrong standard error.

    from scipy import stats
    x = [5.05, 6.75, 3.21, 2.66]
    y = [1.65, 26.5, -5.93, 7.96]
    gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
    >>> gradient
    5.3935773611970186
    >>> intercept
    -16.281127993087829
    >>> r_value
    0.72443514211849758
    >>> r_value**2
    0.52480627513624778
    >>> std_err
    3.6290901222878866

While Excel returns the following:

    slope: 5.394
    intercept: -16.281
    rsq: 0.525
    steyX: 11.696

STEYX is Excel's standard error function; it returns 11.696, against 3.63 from scipy. Does anyone know what is going on here? Is there an alternative way to get the standard error of the regression in Python without switching to Rpy?

5 answers

You can try the statsmodels package:

    In [36]: import numpy as np
    In [37]: import statsmodels.api as sm
    In [38]: x = [5.05, 6.75, 3.21, 2.66]
    In [39]: y = [1.65, 26.5, -5.93, 7.96]
    In [40]: X = sm.add_constant(x, prepend=False)  # append the intercept column after x
    In [41]: model = sm.OLS(y, X)
    In [42]: fit = model.fit()
    In [43]: fit.params  # [slope, intercept], because the constant column comes last
    Out[43]: array([ 5.39357736, -16.28112799])
    In [44]: fit.rsquared
    Out[44]: 0.52480627513624789
    In [45]: np.sqrt(fit.mse_resid)  # residual standard error, comparable to STEYX
    Out[45]: 11.696414461570097

I have just been informed by the SciPy user group that std_err here is the standard error of the gradient (slope) of the fitted line, not the standard error of the predicted y, as in Excel. Nevertheless, users of this function should be careful, because this was not always the behavior of the library: it used to return the same value as Excel, and the change seems to have occurred in the past few months.

Anyway, we're looking for the STEYX equivalent in Python.


Yes, that's true: the standard error of the gradient is what linregress returns. The standard error of the estimate of Y (SEE) is related to it, though, and you can back out the SEE from the standard error of the gradient (SEG) that linregress gives you, using

    SEG = SEE / sqrt( sum( (X - average X)**2 ) )

so SEE is SEG multiplied by the square root of the sum of squared deviations of X from its mean.

Stack Exchange does not process LaTeX, but the math is here, if you're interested, in the Sample Data Analysis section.
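As a rough sketch of that relation with the data from the question: the variable names below are my own, and I am assuming `result.stderr` is the slope standard error that linregress reports, as discussed above.

```python
import math
from scipy import stats

x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]

result = stats.linregress(x, y)

# Sum of squared deviations of x about its mean
x_mean = sum(x) / len(x)
sxx = sum((xi - x_mean) ** 2 for xi in x)

# SEG = SEE / sqrt(sxx), so SEE = SEG * sqrt(sxx)
see = result.stderr * math.sqrt(sxx)
print(see)  # approximately 11.696, the value Excel's STEYX reports
```

This only rearranges the formula; it does not change what linregress itself returns.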


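The calculation of "std err on y" in Excel is actually the standard deviation of the y values about the fitted line, i.e. of the residuals.

This is the same for std err on x. The number "2" in the last step is the number of degrees of freedom for the example you gave (n - 2 = 4 - 2, since the fit estimates two parameters).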

    >>> x = [5.05, 6.75, 3.21, 2.66]
    >>> y = [1.65, 26.5, -5.93, 7.96]
    >>> def power(a): return a*5.3936-16.2811
    ...
    >>> y_fit = list(map(power,x))
    >>> y_fit
    [10.956580000000002, 20.125700000000005, 1.032356, -1.934123999999997]
    >>> var = [y[i]-y_fit[i] for i in range(len(y))]
    >>> def pow2(a): return a**2
    ...
    >>> summa = list(map(pow2,var))
    >>> summa
    [86.61243129640003, 40.63170048999993, 48.47440107073599, 97.89368972737596]
    >>> total = 0
    >>> for i in summa: total += i
    ...
    >>> total
    273.6122225845119
    >>> import math
    >>> math.sqrt(total/2)
    11.696414463084658
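The degrees-of-freedom point can also be illustrated with NumPy. This is only a sketch under my own naming: it fits the line with `np.polyfit` rather than hard-coding the rounded coefficients, and it relies on the fact that least-squares residuals from a fit with an intercept average to zero, so `np.std` with `ddof=2` should agree with the explicit formula.

```python
import numpy as np

x = np.array([5.05, 6.75, 3.21, 2.66])
y = np.array([1.65, 26.5, -5.93, 7.96])

# Least-squares line; polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Two parameters were estimated, leaving n - 2 degrees of freedom
dof = len(x) - 2
steyx = np.sqrt(np.sum(residuals ** 2) / dof)

# Equivalent here because the residuals have (numerically) zero mean
steyx_std = np.std(residuals, ddof=2)

print(steyx, steyx_std)  # both approximately 11.696
```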

This will give you the STEYX equivalent using Python:

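    import numpy as np
    # Data from the question, as arrays so the arithmetic is element-wise
    x = np.array([5.05, 6.75, 3.21, 2.66])
    y = np.array([1.65, 26.5, -5.93, 7.96])
    fit = np.polyfit(x, y, deg=1)
    n = len(x)
    m = fit[0]  # slope
    c = fit[1]  # intercept
    y_pred = m * x + c
    # Residual sum of squares over n - 2 degrees of freedom
    STEYX = (((y - y_pred) ** 2).sum() / (n - 2)) ** 0.5
    print(STEYX)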