I work in particle physics analysis, and I'm hoping someone can give me some insight into Gaussian Process regression, which I am trying to use to extrapolate some data.
I have data with uncertainties that I feed into scikit-learn's GaussianProcess. I include the noise through the "nugget" argument (my implementation matches the standard example here, with "corr" set to squared exponential and the nugget values set to (dy / y) ** 2). The main problem is this: I have low absolute uncertainty (but high fractional uncertainty) at the edges of the distribution, and this leads to a predicted confidence interval that is much larger than I expect in that region (see the chart below).
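For reference, my setup is essentially the following, with illustrative made-up data standing in for my actual histogram (variable names are mine; the GaussianProcess arguments follow the standard noisy-targets example):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcess  # old, pre-0.18 API

# Stand-ins for my histogram: bin centers x, Poisson counts y, dy = sqrt(N)
x = np.atleast_2d(np.linspace(-3., 3., 25)).T
y = np.random.poisson(1000. * np.exp(-0.5 * x.ravel() ** 2)).astype(float)
y = np.clip(y, 1., None)          # dodge zero counts here; see caveat 1) below
dy = np.sqrt(y)

gp = GaussianProcess(corr='squared_exponential',
                     theta0=1e-1, thetaL=1e-3, thetaU=1.,
                     nugget=(dy / y) ** 2,     # fractional uncertainty squared
                     random_start=10)
gp.fit(x, y)

x_pred = np.atleast_2d(np.linspace(-3., 3., 200)).T
y_pred, mse = gp.predict(x_pred, eval_MSE=True)
sigma = np.sqrt(mse)              # this is the band that blows up at the edges
```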

The reason the uncertainties behave this way is that I'm dealing with particle physics data: a histogram of the number of particles observed at different values of some variable (x). These counts follow a Poisson distribution and therefore have uncertainty (standard deviation) sqrt(N). Thus, higher-count regions of the distribution have higher absolute but lower fractional uncertainty, and vice versa for lower-count regions, as the quick illustration below shows.
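A numerical illustration of that trade-off, with invented counts:

```python
import numpy as np

counts = np.array([10000., 100., 1.])    # high-, mid-, low-count bins
abs_unc = np.sqrt(counts)                # sqrt(N): 100, 10, 1
frac_unc = abs_unc / counts              # 1/sqrt(N): 0.01, 0.1, 1.0
```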
As I said, I understand that the "nugget" argument in this function should be set to (fractional uncertainty) ** 2 when working with a squared exponential kernel. So it makes sense that if the predicted uncertainty is driven by the fractional input uncertainty, it can be large at the edges. But I don't quite understand how this happens mathematically, and the predicted 1-sigma uncertainty is so much larger than the data uncertainty at the edges that it seems wrong to me.
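To make the question concrete, here is my attempt to work through the math as a toy calculation. The docs say the nugget is added to the diagonal of the correlation matrix; the variable names and the simplified predictive-variance formula (normalized targets, regression-term correction ignored) are my own, so this may be where my understanding goes wrong:

```python
import numpy as np

theta = 1.0

def corr(a, b):
    # squared-exponential correlation, r = exp(-theta * d**2)
    return np.exp(-theta * (a[:, None] - b[None, :]) ** 2)

x_train = np.array([0., 1., 2., 3., 4.])
nugget = np.array([0.01, 0.01, 0.01, 0.25, 1.0])  # big fractional noise at the edge

R = corr(x_train, x_train) + np.diag(nugget)      # nugget on the diagonal
R_inv = np.linalg.inv(R)

for x_star in (2.0, 4.0):                  # interior point vs. edge point
    r = corr(x_train, np.array([x_star])).ravel()
    var = 1.0 - r @ R_inv @ r              # stays near the prior variance (1.0)
    print(x_star, var)                     # when the local nugget is large
```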
Can anyone comment on what is happening here? Is this the expected behavior? If so, why? Any thoughts or links to further reading on this subject would be greatly appreciated!
I will leave you with two important caveats:
1) There are several data points with zero counts at the edges of the distribution. This causes a problem in the fractional uncertainty fed to the nugget, because (sqrt(0) / 0) ** 2 is not very happy. I worked around this by simply setting the nugget value for these points to 1.0, which is the value you would get for a count of 1 (the adjustment is sketched in code after this list). I believe this is a reasonable approximation that doesn't affect the question at hand; I don't think it fundamentally changes the problem.
2) The data I'm working with is actually a 2D histogram (i.e., one independent variable x, another y, and counts as the dependent variable z). The plot shown is a 1D slice through the 2D data and prediction (i.e., z vs. x, integrated over a small range of y). I don't think this really bears on the issue, but I thought I'd mention it.
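For caveat 1), the adjustment I made looks like this (a sketch with invented counts):

```python
import numpy as np

y = np.array([0., 0., 3., 50., 120., 40., 2., 0.])  # per-bin counts
dy = np.sqrt(y)

nugget = np.ones_like(y)                  # default to the N = 1 value, 1.0 ...
mask = y > 0
nugget[mask] = (dy[mask] / y[mask]) ** 2  # ... and use (dy/y)**2 = 1/N where N > 0
```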