Numpy linalg.lstsq with large values

I use linalg.lstsq to build a regression line inside a function like this:

import numpy as np

def lsreg(x, y):
    if not isinstance(x, np.ndarray):
        x = np.array(x)
    if not isinstance(y, np.ndarray):
        y = np.array(y)
    A = np.array([x, np.ones(len(x))])
    ret = np.linalg.lstsq(A.T, y)
    return ret[0]

and calling it as follows:

x = np.array([10000001, 10000002, 10000003])
y = np.array([3.0, 4.0, 5.0])
regress = lsreg(x, y)
fit = regress[0]*x + regress[1]
print(fit)

and the output I get is:

[ 3.  4.  5.]

So far so good. Now, if I change x as follows:

x = np.array([100000001, 100000002, 100000003])
y = np.array([3.0, 4.0, 5.0])
regress = lsreg(x, y)
fit = regress[0]*x + regress[1]
print(fit)

I get

[ 3.99999997  4.00000001  4.00000005]

instead of values close to 3, 4, and 5.

Any clue on what's going on?


Your problem is related to numerical errors that arise when solving a poorly conditioned system of equations.
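The conditioning problem can be checked for both inputs from the question by looking at the singular values of the design matrix and the rank lstsq assigns to it. A minimal sketch, written for Python 3 and a recent NumPy; the helper names design_matrix and rank_report are hypothetical, and rcond=None is passed explicitly to select NumPy's current default cutoff (eps times the larger matrix dimension).

```python
import numpy as np

def design_matrix(x):
    """Columns [x, 1]: the matrix the question passes to lstsq as A.T."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x, np.ones_like(x)])

def rank_report(x, y):
    """Singular values of the design matrix and the rank lstsq assigns to it."""
    M = design_matrix(x)
    s = np.linalg.svd(M, compute_uv=False)  # singular values, largest first
    # rcond=None: singular values below rcond * s[0], with
    # rcond = eps * max(M.shape), are treated as zero
    _, _, rank, _ = np.linalg.lstsq(M, y, rcond=None)
    return s, rank

y = np.array([3.0, 4.0, 5.0])
eps = np.finfo(float).eps
for x0 in (10000000, 100000000):
    s, rank = rank_report(x0 + np.array([1, 2, 3]), y)
    print(x0, "singular values:", s,
          "ratio:", s[1] / s[0], "cutoff:", 3 * eps, "rank:", rank)
```

For x around 10^7 the ratio of the singular values stays above the cutoff and lstsq keeps rank 2, consistent with the first, correct fit in the question; around 10^8 it drops below the cutoff and the rank becomes 1.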

In [115]: np.linalg.lstsq(A.T, y)
Out[115]: 
(array([  3.99999993e-08,   3.99999985e-16]),
 array([], dtype=float64),
 1,
 array([  1.73205084e+08,   1.41421352e-08]))
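The two singular values in the last array above can be estimated by hand. Writing M for the question's A.T (the 3 x 2 matrix with columns x and 1; the name M is introduced only for this derivation), the product of the squared singular values equals the determinant of the 2 x 2 matrix M^T M:

```latex
M = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ x_3 & 1 \end{pmatrix},
\qquad
M^{\mathsf T} M = \begin{pmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & 3 \end{pmatrix}

\sigma_1^2 \, \sigma_2^2 = \det\left(M^{\mathsf T} M\right)
  = 3 \sum x_i^2 - \Bigl(\sum x_i\Bigr)^2
  = 3 \sum \left(x_i - \bar{x}\right)^2 = 3 \cdot 2 = 6

\sigma_1 \approx \lVert x \rVert \approx \sqrt{3} \cdot 10^{8} \approx 1.732 \times 10^{8},
\qquad
\sigma_2 = \frac{\sqrt{6}}{\sigma_1} \approx 1.414 \times 10^{-8}

\kappa(M) = \frac{\sigma_1}{\sigma_2} \approx 1.2 \times 10^{16}
  \quad \text{vs.} \quad \frac{1}{\varepsilon} \approx 4.5 \times 10^{15}
```

So σ₂ is smaller than the rounding error ε·σ₁ ≈ 3.8e-8 that floating-point arithmetic on M can introduce, which is why the smaller singular value cannot be resolved reliably.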

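As the output shows, np.linalg.lstsq reports a rank of "1" (the third element of the returned tuple) for the matrix A.T, even though, strictly speaking, the system is not degenerate: the 2 x 2 matrix A·A.T has rank 2. The second singular value (about 1.4e-08) is so small compared with the first (about 1.7e+08) that their ratio falls below machine precision, so lstsq treats it as 0. Such a system is called "ill-conditioned"; for more background, search for "condition number".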
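A common remedy, sketched here as one option rather than the answerer's own fix, is to shift x by its mean before building the design matrix, so that the x column is no longer nearly parallel to the column of ones. The function name lsreg_centered is hypothetical; it mirrors the question's lsreg but returns the slope, the intercept in the shifted coordinates, and the shift itself.

```python
import numpy as np

def lsreg_centered(x, y):
    """Fit y ~ m*(x - x_mean) + b_c by least squares (hypothetical helper).

    Shifting x by its mean keeps the two columns of the design
    matrix far from parallel, so the system is well conditioned.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = x.mean()
    xc = x - x_mean                     # [-1, 0, 1] for the question's data
    M = np.column_stack([xc, np.ones_like(xc)])
    coef, _, _, _ = np.linalg.lstsq(M, y, rcond=None)
    m, b_c = coef
    return m, b_c, x_mean

x = np.array([100000001, 100000002, 100000003])
y = np.array([3.0, 4.0, 5.0])
m, b_c, x_mean = lsreg_centered(x, y)
fit = m * (x - x_mean) + b_c            # evaluate in the same shifted coordinates
print(fit)
```

If an intercept in the original coordinates is needed, it is b_c - m*x_mean; evaluating the fit in the shifted form, as in the last lines, avoids cancellation between two terms of size about 10^8.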


Alternatively, you can use scipy.stats.linregress, which gives the expected result here:

import numpy as np
from scipy import stats

x = np.array([100000001, 100000002, 100000003])
y = np.array([3.0, 4.0, 5.0])

res = stats.linregress(x, y)
print(x*res[0] + res[1])

Output:

[ 3.  4.  5.]
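For a single predictor there is also the textbook closed form that works only with deviations from the means, so the badly scaled [x, 1] matrix is never formed. A minimal NumPy sketch (simple_ols is a hypothetical name, and this is not scipy's source); presumably linregress avoids the problem for a similar reason, since it is also computed from centered sums.

```python
import numpy as np

def simple_ols(x, y):
    """Slope and intercept of y ~ m*x + b from centered sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()                   # deviations from the mean: [-1, 0, 1] here
    dy = y - y.mean()
    m = np.dot(dx, dy) / np.dot(dx, dx)  # Sxy / Sxx
    b = y.mean() - m * x.mean()
    return m, b

x = np.array([100000001, 100000002, 100000003])
y = np.array([3.0, 4.0, 5.0])
m, b = simple_ols(x, y)
print(m * x + b)
```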