Truncated multidimensional normal in SciPy?

Question


I am trying to automate a process that at some point needs to draw samples from a truncated multivariate normal. That is, it is an ordinary multivariate normal (i.e., Gaussian) distribution, but the variables are bounded by a cuboid. My inputs are the mean and covariance of the full (untruncated) multivariate normal, but I need samples inside my box.
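In symbols, with mean $\mu$, covariance $C$ and a box $a \le x \le b$ taken componentwise (the names $\mu$, $a$, $b$ are notation introduced here), the target density is, up to a normalising constant:

```latex
p(x) \propto \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top} C^{-1} (x-\mu)\right)\,\mathbf{1}\{a \le x \le b\},
\qquad
\log p(x) =
\begin{cases}
-\tfrac{1}{2}(x-\mu)^{\top} C^{-1} (x-\mu) + \text{const}, & a \le x \le b,\\
-\infty, & \text{otherwise.}
\end{cases}
```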

So far, I have simply rejected samples outside the box and resampled as necessary, but I am starting to find that my process sometimes gives me (a) large covariances and (b) means that are close to the edges. Both of these hurt the speed of my system, because they lower the fraction of samples that land inside the box.
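The acceptance rate of plain rejection is exactly the probability mass that the untruncated normal puts inside the box, which is why large covariances and means near an edge slow it down. A minimal sketch that measures it; `rejection_sample` is a hypothetical helper, and `bounds` is assumed to be an array of shape (d, 2) holding lower and upper limits:

```python
import numpy as np

def rejection_sample(mean, C, bounds, n, rng=None, batch=10_000):
    """Draw n samples from N(mean, C) truncated to the box `bounds` (shape (d, 2)).

    Plain rejection: sample the untruncated normal in batches and keep the
    points inside the box. Also returns the observed acceptance rate.
    Note: loops forever if the box has (numerically) zero probability.
    """
    rng = np.random.default_rng(rng)
    lo, hi = bounds[:, 0], bounds[:, 1]
    kept, total, drawn = [], 0, 0
    while total < n:
        x = rng.multivariate_normal(mean, C, size=batch)
        drawn += batch
        # A point is accepted only if every coordinate is inside its interval
        inside = np.all((x >= lo) & (x <= hi), axis=1)
        kept.append(x[inside])
        total += int(inside.sum())
    return np.concatenate(kept)[:n], total / drawn
```

For example, with mean (0.9, 0.9), covariance 0.25·I and the unit square as the box, roughly 30% of draws are accepted; pushing the mean further towards a corner or inflating the covariance lowers that fraction.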

So what I would like to do is sample the correct distribution directly. Googling only led to this discussion or the truncnorm distribution in scipy.stats. The former is inconclusive, and the latter seems to handle only a single variable. Is there a native multivariate truncated normal? And would it be any better than rejecting samples, or should I do something smarter?
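For reference, scipy.stats.truncnorm is indeed univariate, and its clip points a and b are given in standard-deviation units relative to loc. Sampling each coordinate independently from it gives the exact multivariate truncated normal only when the covariance is diagonal, since only then does the density factorize over the box. A minimal sketch (`sample_box_diagonal` is a hypothetical name):

```python
import numpy as np
from scipy.stats import truncnorm

def sample_box_diagonal(mean, sd, bounds, n, rng=None):
    """Sample N(mean, diag(sd**2)) truncated to a box, one coordinate at a time.

    Exact only for a diagonal covariance: then each coordinate is an
    independent univariate truncated normal on its own interval.
    """
    mean, sd = np.asarray(mean, float), np.asarray(sd, float)
    # truncnorm expects the clip points standardized: (limit - loc) / scale
    a = (bounds[:, 0] - mean) / sd
    b = (bounds[:, 1] - mean) / sd
    # a, b, loc and scale (shape (d,)) broadcast against size (n, d)
    return truncnorm.rvs(a, b, loc=mean, scale=sd, size=(n, mean.size), random_state=rng)
```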

I am going to start working on my own solution, which will be to rotate the untruncated Gaussian to its principal axes (with an SVD decomposition or something similar), use a product of truncated Gaussians to sample the distribution, then rotate that sample back, and reject/resample as necessary. If truncated sampling is more efficient, I think this should sample the desired distribution faster.
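A minimal sketch of that principal-axes plan, under my own assumptions: `sample_rotated_truncnorm` is a hypothetical name, C is assumed positive definite, and an eigendecomposition stands in for the SVD. Because the box becomes a rotated box in principal-axis coordinates, the sketch truncates each rotated coordinate to the axis-aligned bounding box of that rotated box, rotates back, and rejects points outside the original box. Accepted points are exact draws from the truncated normal, and the acceptance rate is the box probability divided by the bounding-box probability, so it is never lower than plain rejection (though each draw costs more).

```python
import numpy as np
from scipy.stats import truncnorm

def sample_rotated_truncnorm(mean, C, bounds, n, rng=None, batch=10_000):
    """Sample N(mean, C) truncated to a box via a principal-axes proposal.

    1. Rotate: y = V.T @ (x - mean), so y ~ N(0, diag(lam)) with independent coordinates.
    2. Bound each y_i by the axis-aligned bounding box of the rotated box.
    3. Draw each y_i from a univariate truncated normal on that interval.
    4. Rotate back and reject points outside the original box.
    """
    rng = np.random.default_rng(rng)
    mean = np.asarray(mean, float)
    lam, V = np.linalg.eigh(C)  # C = V @ diag(lam) @ V.T, assumes lam > 0
    sd = np.sqrt(lam)
    lo, hi = bounds[:, 0], bounds[:, 1]
    # y_i = sum_j V[j, i] * (x_j - mean_j); each term is extreme at a box face
    d_lo = V * (lo - mean)[:, None]  # element [j, i] = V[j, i] * (lo_j - mean_j)
    d_hi = V * (hi - mean)[:, None]
    y_lo = np.minimum(d_lo, d_hi).sum(axis=0)
    y_hi = np.maximum(d_lo, d_hi).sum(axis=0)
    kept, total, drawn = [], 0, 0
    while total < n:
        y = truncnorm.rvs(y_lo / sd, y_hi / sd, scale=sd,
                          size=(batch, mean.size), random_state=rng)
        x = mean + y @ V.T  # rotate back: x = mean + V @ y, row-wise
        inside = np.all((x >= lo) & (x <= hi), axis=1)
        kept.append(x[inside])
        total += int(inside.sum())
        drawn += batch
    return np.concatenate(kept)[:n], total / drawn
```

When the covariance is strongly correlated relative to the box, the bounding box can be much larger than the rotated box, so the gain over plain rejection may be small.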


python scipy

Warrick Nov 21 '13 at 8:33


1 answer

Warrick · Accepted Answer · 2013-11-25T09:14:33+0000

So, according to the Wikipedia article, sampling a multivariate truncated normal distribution (MTND) is more difficult. I ended up taking a relatively easy way out and using an MCMC sampler to relax an initial guess towards the MTND, as follows.

I used emcee to do the MCMC work. I find this package phenomenally easy to use: it only requires a function that returns the log-probability of the desired distribution (up to an additive constant). So I defined this function

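import numpy as np
from numpy.linalg import inv

def lnprob_trunc_norm(x, mean, bounds, C):
    # bounds[:, 0] are the lower limits, bounds[:, 1] the upper limits
    if np.any(x < bounds[:,0]) or np.any(x > bounds[:,1]):
        return -np.inf  # zero probability outside the box
    else:
        # Unnormalized Gaussian log-density inside the box
        return -0.5*(x-mean).dot(inv(C)).dot(x-mean)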

Here C is the covariance matrix of the multivariate normal, and bounds is an (Ndim, 2) array of lower and upper limits. Then you can run something like

S = emcee.EnsembleSampler(Nwalkers, Ndim, lnprob_trunc_norm, args = (mean, bounds, C))

pos, prob, state = S.run_mcmc(pos, Nsteps)

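for given mean, bounds and C. You need an initial guess for the walkers' positions, pos, which could be a ball around the mean,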

pos = emcee.utils.sample_ball(mean, np.sqrt(np.diag(C)), size=Nwalkers)

or a draw from the untruncated multivariate normal,

pos = np.random.multivariate_normal(mean, C, size=Nwalkers)

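and so on. Personally, I first do several thousand rounds of simply discarding samples outside the box, because that is fast, then force the remaining outliers back within the bounds, and then run the MCMC sampling.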
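Starting walkers drawn from the untruncated normal can land outside the box, where lnprob_trunc_norm returns -inf. A minimal sketch of one way to handle that (`init_walkers` is a hypothetical helper name): redraw the offending walkers a limited number of times, then clip any that are still outside onto the box with np.clip, so every walker starts with a finite log-probability.

```python
import numpy as np

def init_walkers(mean, C, bounds, nwalkers, rng=None, max_tries=100):
    """Starting positions for MCMC walkers, all inside the box `bounds` (shape (d, 2))."""
    rng = np.random.default_rng(rng)
    lo, hi = bounds[:, 0], bounds[:, 1]
    pos = rng.multivariate_normal(mean, C, size=nwalkers)
    for _ in range(max_tries):
        outside = ~np.all((pos >= lo) & (pos <= hi), axis=1)
        if not outside.any():
            break
        # Cheap step: redraw only the walkers that fell outside the box
        pos[outside] = rng.multivariate_normal(mean, C, size=int(outside.sum()))
    # Last resort: force any remaining outliers back within the bounds
    return np.clip(pos, lo, hi)
```

Clipping can place several walkers on the same face or corner of the box, so it is best kept for a few stragglers rather than used as the main initialization.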

The number of steps needed for convergence is up to you.

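Note also that emcee easily supports basic parallelization: add the argument threads=Nthreads when initializing the EnsembleSampler, and this can become much faster. (In emcee 3 and later, threads is deprecated; pass a pool instead, e.g. pool=multiprocessing.Pool().)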
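emcee's EnsembleSampler is an affine-invariant ensemble sampler. To see the underlying mechanics without that dependency, here is a minimal random-walk Metropolis sketch: a different, simpler MCMC algorithm swapped in for illustration, not what emcee does. It uses the same truncated log-probability (with the inverse covariance computed once instead of at every call) and discards a burn-in period; `metropolis_trunc_norm`, `step` and `burn` are hypothetical names and defaults.

```python
import numpy as np

def _log_target(x, mean, bounds, Cinv):
    # Truncated normal log-density up to a constant: -inf outside the box
    if np.any(x < bounds[:, 0]) or np.any(x > bounds[:, 1]):
        return -np.inf
    d = x - mean
    return -0.5 * d @ Cinv @ d

def metropolis_trunc_norm(mean, C, bounds, x0, nsteps, step=0.5, burn=1000, rng=None):
    """Random-walk Metropolis chain targeting N(mean, C) truncated to a box.

    Proposals x + step * L @ z (C = L L.T, z standard normal) are symmetric,
    so a move is accepted with probability min(1, p(prop) / p(x)). Proposals
    outside the box have log-probability -inf and are always rejected.
    The first `burn` steps are discarded; x0 must lie inside the box.
    """
    rng = np.random.default_rng(rng)
    mean = np.asarray(mean, float)
    Cinv = np.linalg.inv(C)  # invert once, not at every step
    L = np.linalg.cholesky(C)
    x = np.asarray(x0, float)
    lp = _log_target(x, mean, bounds, Cinv)
    chain = np.empty((nsteps, x.size))
    accepted = 0
    for i in range(burn + nsteps):
        prop = x + step * (L @ rng.standard_normal(x.size))
        lp_prop = _log_target(prop, mean, bounds, Cinv)
        accept = np.log(rng.uniform()) < lp_prop - lp
        if accept:
            x, lp = prop, lp_prop
        if i >= burn:
            chain[i - burn] = x
            accepted += int(accept)
    return chain, accepted / nsteps
```

Successive states of such a chain are correlated, which is one reason the number of steps (and the burn-in) has to be chosen with convergence in mind.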
