ValueError: A value in x_new is below the interpolation range

This is the scikit-learn error that I get when I do

    from sklearn.linear_model import LassoLarsCV

    my_estimator = LassoLarsCV(fit_intercept=False, normalize=False, positive=True, max_n_alphas=1e5)

Please note that if I reduce max_n_alphas from 1e5 to 1e4, I no longer get this error.

Anyone have an idea what is going on?

The error is raised when I call

    my_estimator.fit(x, y)

I have 40k data points in 40 dimensions.

A full stack trace looks like this:

  File "/usr/lib64/python2.7/site-packages/sklearn/linear_model/least_angle.py", line 1113, in fit axis=0)(all_alphas) File "/usr/lib64/python2.7/site-packages/scipy/interpolate/polyint.py", line 79, in __call__ y = self._evaluate(x) File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 498, in _evaluate out_of_bounds = self._check_bounds(x_new) File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 525, in _check_bounds raise ValueError("A value in x_new is below the interpolation " ValueError: A value in x_new is below the interpolation range. 
1 answer

There must be something special about your data. LassoLarsCV() seems to work correctly with this synthetic example of fairly well-behaved data:

    import numpy
    import sklearn.linear_model

    # create 40000 x 40 sample data from a linear model with a bit of noise
    npoints = 40000
    ndims = 40
    numpy.random.seed(1)
    X = numpy.random.random((npoints, ndims))
    w = numpy.random.random(ndims)
    y = X.dot(w) + numpy.random.random(npoints) * 0.1

    clf = sklearn.linear_model.LassoLarsCV(fit_intercept=False, normalize=False, max_n_alphas=1e6)
    clf.fit(X, y)

    # coefficients are almost exactly recovered, this prints 0.00377
    print max(abs(clf.coef_ - w))

    # alphas actually used are 41, i.e. ndims + 1
    print clf.alphas_.shape

This is with sklearn 0.16; I don't have the positive=True option available there.

I'm not sure why you want to use a very large max_n_alphas anyway. While I don't know why 1e4 works and 1e5 does not in your case, I suspect that the paths you get from max_n_alphas = ndims + 1 and max_n_alphas = 1e4 (or whatever) would be identical for well-behaved data. Also, the optimal alpha estimated by cross-validation in clf.alpha_ would be identical; see the sketch below. Check out the Lasso path using LARS example for what alpha is trying to do.
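To make that concrete, here is a rough sketch of the kind of comparison I have in mind (my own illustration on the synthetic data above, keeping the same sklearn 0.16/0.17-era arguments; not something run on your data): fit once with max_n_alphas = ndims + 1 and once with a very large value, then compare the cross-validated alpha_.

    import numpy
    import sklearn.linear_model

    # same well-behaved synthetic data as in the example above
    npoints, ndims = 40000, 40
    numpy.random.seed(1)
    X = numpy.random.random((npoints, ndims))
    w = numpy.random.random(ndims)
    y = X.dot(w) + numpy.random.random(npoints) * 0.1

    # identical settings except for max_n_alphas
    clf_small = sklearn.linear_model.LassoLarsCV(fit_intercept=False, normalize=False,
                                                 max_n_alphas=ndims + 1)
    clf_large = sklearn.linear_model.LassoLarsCV(fit_intercept=False, normalize=False,
                                                 max_n_alphas=1e5)
    clf_small.fit(X, y)
    clf_large.fit(X, y)

    # expected: the two cross-validated alphas agree (up to numerical noise)
    print(clf_small.alpha_)
    print(clf_large.alpha_)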

Also, from the LassoLars documentation:

alphas_ : array, shape (n_alphas + 1,)

Maximum of covariances (in absolute value) at each iteration. n_alphas is either max_iter, n_features, or the number of nodes in the path with correlation greater than alpha, whichever is smaller.

so it makes sense that we ended up with alphas_ of size ndims + 1 (i.e. n_features + 1) above.

P.S. Tested with sklearn 0.17.1 and positive=True as well, and also with a mix of positive and negative coefficients; same result: alphas_ has size ndims + 1 or smaller.
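For completeness, a sketch of roughly what that check looks like (illustrative only; it assumes sklearn 0.17+, where LassoLarsCV accepts positive=True, and the particular mixed-sign coefficients below are arbitrary):

    import numpy
    import sklearn.linear_model

    # mix of positive and negative true coefficients
    npoints, ndims = 40000, 40
    numpy.random.seed(2)
    X = numpy.random.random((npoints, ndims))
    w = numpy.random.random(ndims) - 0.5
    y = X.dot(w) + numpy.random.random(npoints) * 0.1

    clf = sklearn.linear_model.LassoLarsCV(fit_intercept=False, normalize=False,
                                           positive=True, max_n_alphas=1e5)
    clf.fit(X, y)

    # still at most ndims + 1 alphas on the path
    print(clf.alphas_.shape)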
