Best way to write a Python function that integrates a Gaussian?

When trying to use scipy's quad method to integrate a Gaussian (say, a Gaussian function called gauss), I had problems passing the necessary parameters to gauss and setting up quad to integrate over the correct variable. Does anyone have a good example of how to use quad with a multidimensional function?

But this led me to a more general question about the best way to integrate a Gaussian at all. I did not find a canned Gaussian integration routine in scipy (to my surprise). My plan was to write a simple Gaussian function and pass it into quad (or maybe a fixed-width integrator). What would you do?

Edit: by fixed width I mean something like trapz, which uses a fixed dx to calculate the areas under the curve.
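Something like this rough sketch is what I have in mind (the dx and the limits here are arbitrary choices):

```python
import numpy as np

# Sample a standard normal density on a fixed-dx grid and sum trapezoids.
dx = 0.01
x = np.arange(-10, 10, dx)
y = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
area = np.trapz(y, dx=dx)
print(area)  # close to 1 over a wide enough range
```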

What I ended up with is a make_gauss function that returns a lambda, which can then be passed into quad. That way I can build a normal function with the mean and variance I need before integrating.

 def make_gauss(N, sigma, mu):
     return (lambda x: N / (sigma * (2 * numpy.pi) ** .5)
                     * numpy.e ** (-(x - mu) ** 2 / (2 * sigma ** 2)))

 quad(make_gauss(N=10, sigma=2, mu=0), -inf, inf)

When I tried to pass a generic Gaussian function (which needs to be called with x, N, mu and sigma) and fill in the values via quad, for example

 quad(gen_gauss, -inf, inf, (10,2,0)) 

the parameters 10, 2 and 0 did NOT necessarily correspond to N = 10, sigma = 2, mu = 0, which is what prompted the more verbose definition above.
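(A minimal sketch of how the args tuple maps onto the integrand's parameters - quad passes them positionally after x, in declaration order; this gen_gauss is a stand-in for the one described above:)

```python
import math
from scipy.integrate import quad

def gen_gauss(x, N, sigma, mu):
    # the args tuple below fills N, sigma, mu in this declaration order
    return N / (sigma * math.sqrt(2 * math.pi)) * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

result, abserr = quad(gen_gauss, -math.inf, math.inf, args=(10, 2, 0))
print(result)  # ~10, the normalization factor
```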

erf(z) in scipy.special would require me to work out exactly what t is in the first place, but it's nice to know it's there.

+6
python scipy gaussian integral
5 answers

Well, you seem pretty confused about a few things. Let's start at the beginning: you mention a "multidimensional function", but then go on to discuss the usual one-variable Gaussian curve. That is not a multidimensional function: when you integrate it, you only integrate over one variable (x). The distinction is important, because there is a monster called the "multivariate Gaussian distribution" which is a true multidimensional function and, if integrated, requires integrating over two or more variables (and calls for expensive Monte Carlo techniques). But you seem to be talking about a regular one-variable Gaussian, which is much easier to work with, integrate, and all that.

The one-variable Gaussian distribution has two parameters, sigma and mu, and is a function of a single variable we'll denote x. You are also carrying around a normalization parameter n (which is useful in a couple of applications). Normalization parameters are usually not included in calculations, since you can just tack them back on at the end (remember, integration is a linear operator: int(n*f(x), x) = n*int(f(x), x)). But we can carry it around if you like; the notation I like for the distribution is then

N(x | mu, sigma, n) := (n/(sigma*sqrt(2*pi))) * exp((-(x-mu)^2)/(2*sigma^2))

(read that as "the normal distribution of x given sigma, mu, and n is given by...") So far so good; this matches the function you gave. Notice that the only true variable here is x: the other three names are parameters, fixed for any particular Gaussian.

Now for a mathematical fact: it is provable that all Gaussian curves have the same shape, they're just shifted and scaled around a bit. So we can work with N(x|0,1,1), called the "standard normal distribution", and just translate our results back to the general Gaussian curve. So if you have the integral of N(x|0,1,1), you can trivially calculate the integral of any Gaussian. This integral comes up so often that it has a special name: the error function, erf. Because of some old conventions, it's not exactly erf; there are a couple of additive and multiplicative factors being carried around as well.
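A quick numerical illustration of that shift-and-scale fact, using scipy.stats.norm (loc is mu, scale is sigma):

```python
from scipy.stats import norm

# The CDF of any Gaussian equals the standard normal CDF of (z - mu) / sigma.
mu, sigma, z = 3.0, 1.5, 4.2
assert abs(norm.cdf(z, loc=mu, scale=sigma) - norm.cdf((z - mu) / sigma)) < 1e-12
```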

If Phi(z) = integral(N(x|0,1,1), -inf, z); i.e. Phi(z) is the integral of the standard normal distribution from minus infinity up to z, then it's true by the definition of the error function that

Phi(z) = 0.5 + 0.5 * erf(z / sqrt(2)) .
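You can spot-check that identity against scipy's own standard normal CDF (a quick sanity check, not a proof):

```python
import math
from scipy.special import erf
from scipy.stats import norm

# Phi(z) = 0.5 + 0.5 * erf(z / sqrt(2)) for the standard normal.
for z in (-2.0, -0.5, 0.0, 1.0, 3.0):
    phi = 0.5 + 0.5 * erf(z / math.sqrt(2))
    assert abs(phi - norm.cdf(z)) < 1e-12
```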

Similarly, if Phi(z | mu, sigma, n) = integral(N(x | mu, sigma, n), -inf, z); i.e. Phi(z | mu, sigma, n) is the integral of the normal distribution with parameters mu, sigma and n from minus infinity up to z, then it's true by the definition of the error function that

Phi(z | mu, sigma, n) = (n/2) * (1 + erf((z - mu) / (sigma * sqrt(2)))) .
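Again, easy to check numerically by comparing the closed form against quad on the Gaussian itself (the parameter values here are arbitrary):

```python
import math
from scipy.integrate import quad
from scipy.special import erf

mu, sigma, n, z = 1.0, 2.0, 10.0, 2.5

def gauss(x):
    return n / (sigma * math.sqrt(2 * math.pi)) * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

closed_form = (n / 2) * (1 + erf((z - mu) / (sigma * math.sqrt(2))))
numeric, _ = quad(gauss, -math.inf, z)
assert abs(closed_form - numeric) < 1e-7
```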

Check out the Wikipedia article on the normal CDF if you want more detail or a proof of this fact.

Okay, that should be enough background. Back to your (edited) post. You say "erf(z) in scipy.special would require me to work out exactly what t is in the first place." I have no idea what you mean by this; where does t (time?) come into it at all? Hopefully the explanation above has demystified the error function a bit, and it's now clear why the error function is the right function for the job.

Your Python code is fine, but I would prefer a closure over a lambda:

 def make_gauss(N, sigma, mu):
     k = N / (sigma * math.sqrt(2 * math.pi))
     s = -1.0 / (2 * sigma * sigma)
     def f(x):
         return k * math.exp(s * (x - mu) * (x - mu))
     return f

Using a closure enables precomputation of the constants k and s, so the returned function has less work to do each time it's called (which can matter if you're integrating it, which means it will be called many times). Also, I have avoided using the exponentiation operator **, which is slower than just writing out the square as a multiplication, and hoisted the division out of the inner function, replacing it with a multiplication. I haven't looked at their implementation in Python, but from the last time I tuned an inner loop for pure speed using raw x87 assembly, I seem to remember that adds, subtracts and multiplies take about 4 clock cycles each, divides about 36, and exponentiation about 200. That was a couple of years ago, so take those numbers with a grain of salt; still, it illustrates their relative complexity. Also, computing exp(x) the brute-force way is a very bad idea; there are tricks you can use when writing a good implementation of exp(x) that make it significantly faster and more accurate than generic a**b style exponentiation.

I have never used the numpy versions of the constants pi and e; I have always stuck with the plain math module's versions. I don't know why you might prefer one over the other.

I'm not sure what you're after with the quad() calls. quad(gen_gauss, -inf, inf, (10,2,0)) should integrate a renormalized Gaussian from minus infinity to plus infinity, and should always spit out 10 (your normalization factor), since a Gaussian integrates to 1 over the real line. Any answer far from 10 (I wouldn't expect exactly 10, since quad() is only an approximation, after all) means something is screwed up somewhere... hard to say what without knowing the actual return value and possibly the inner workings of quad().

Hopefully that has cleared up some of the confusion and explained why the error function is the right answer to your problem, as well as how to do it all yourself if you're curious. If any of my explanation wasn't clear, I suggest taking a quick look at Wikipedia first; if you still have questions, don't hesitate to ask.

+31

scipy ships with the "error function", a.k.a. the Gaussian integral:

 import scipy.special
 help(scipy.special.erf)
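For example, erf(z) = (2/sqrt(pi)) * integral of exp(-t^2) from 0 to z, which you can confirm with quad:

```python
import math
from scipy.integrate import quad
from scipy.special import erf

z = 1.0
numeric, _ = quad(lambda t: 2 / math.sqrt(math.pi) * math.exp(-t ** 2), 0, z)
assert abs(numeric - erf(z)) < 1e-10
```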
+12

I assume you're dealing with multivariate Gaussians; if so, SciPy already has the function you're looking for: it's called MVNDIST ("MultiVariate Normal DISTribution"). The SciPy documentation is as awful as ever, so I can't even find where the function is buried, but it's in there somewhere. The documentation is easily the worst part of SciPy, and it has frustrated me to no end in the past.

A Gaussian in one variable just uses the good old error function, of which many implementations are available.

As for attacking the problem in general: yes, as James Thompson mentions, you just want to write your own Gaussian distribution function and pass it to quad(). If you can avoid the generalized integration, though, it's a good idea to do so - specialized integration techniques for a particular function (like MVNDIST uses) will be much faster than standard Monte Carlo multidimensional integration, which can be extremely slow at high accuracy.
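If you do end up needing the multivariate case without a canned routine, a crude Monte Carlo sketch looks something like this (the mean, covariance, and box here are made-up illustration values, and 200,000 samples is an arbitrary budget):

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])

# Estimate P(X in box) for a 2-D Gaussian by sampling and counting hits.
samples = rng.multivariate_normal(mean, cov, size=200_000)
lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
inside = np.all((samples >= lo) & (samples <= hi), axis=1)
print(inside.mean())  # Monte Carlo estimate of the box probability
```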

+3

The Gaussian distribution is also called the normal distribution. The cdf function of the norm object in scipy.stats does what you want.

 from scipy.stats import norm
 print(norm.cdf(0.0))   # prints 0.5

http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html#scipy.stats.norm
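A couple more usage examples - norm.cdf takes loc (the mean) and scale (sigma), so non-standard Gaussians work too:

```python
from scipy.stats import norm

# Probability mass between two points, straight from the CDF:
p = norm.cdf(1.0) - norm.cdf(-1.0)        # P(-1 < X < 1), standard normal
print(p)  # ~0.6827

# The same probability for a Gaussian with sigma = 2, over (-2, 2):
p2 = norm.cdf(2.0, loc=0.0, scale=2.0) - norm.cdf(-2.0, loc=0.0, scale=2.0)
```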

+3

Why not just always do your integration from -infinity to +infinity, so you always know the answer? (Just kidding!)

My guess is that the only reason SciPy doesn't have a canned Gaussian function is that it's trivial to write. Your suggestion of writing your own function and passing it into quad to integrate sounds excellent. It uses the accepted SciPy tool for the job, it's minimal code effort for you, and it's very readable to other people, even if they've never seen SciPy.

What exactly do you mean by a fixed-width integrator? Do you mean using a different algorithm from whatever QUADPACK uses?

Edit: For completeness, here is something like what I would try, for a Gaussian with mean 0 and standard deviation 1, integrated from 0 to +infinity:

 from scipy.integrate import quad
 from math import pi, exp
 from numpy import inf

 mean = 0
 sd = 1
 quad(lambda x: 1 / (sd * (2 * pi) ** 0.5) * exp((x - mean) ** 2 / (-2 * sd ** 2)),
      0, inf)

This is a little ugly because the Gaussian function is a bit long, but it's still pretty trivial to write.

+2
