Implementing a multivariate Gaussian probability density function for more than 2 dimensions in C++

I am working on an implementation of the probability density function (PDF) of a multivariate Gaussian in C++, and I am stuck on how best to handle cases where the dimension is greater than 2.

The PDF of a Gaussian can be written as

$$f(x) = (2\pi)^{-k/2} \, |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2} (x-\mu)' \, \Sigma^{-1} (x-\mu)\right)$$

where (A)' or A' denotes the transpose of the "matrix" created by subtracting the mean μ from all elements of x. In this equation, k is the number of dimensions, and Σ (sigma) is the covariance matrix, which is a k × k matrix. Finally, |X| denotes the determinant of the matrix X.
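
As a rough illustration of the general k-dimensional formula, here is a minimal sketch that uses only the standard library. The function names `gaussian_log_pdf` and `gaussian_pdf`, and the row-major `std::vector` layout, are assumptions made for this example and do not come from the question. It factors Σ with a hand-written Cholesky decomposition, so neither the inverse nor the determinant is formed explicitly; a matrix library would normally replace that part.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: log-density of a k-dimensional Gaussian.
// `sigma` is the k x k covariance matrix stored row-major.
double gaussian_log_pdf(const std::vector<double>& x,
                        const std::vector<double>& mu,
                        const std::vector<double>& sigma) {
    const std::size_t k = x.size();
    assert(mu.size() == k && sigma.size() == k * k);

    // Cholesky factorisation sigma = L * L' with L lower triangular.
    std::vector<double> L(k * k, 0.0);
    for (std::size_t i = 0; i < k; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double s = sigma[i * k + j];
            for (std::size_t p = 0; p < j; ++p) s -= L[i * k + p] * L[j * k + p];
            if (i == j) {
                if (s <= 0.0) throw std::domain_error("covariance is not positive definite");
                L[i * k + i] = std::sqrt(s);
            } else {
                L[i * k + j] = s / L[j * k + j];
            }
        }
    }

    // Forward substitution solves L * z = (x - mu), and then
    // (x - mu)' * sigma^{-1} * (x - mu) = z' * z.
    std::vector<double> z(k);
    double quad = 0.0;
    double log_det = 0.0;
    for (std::size_t i = 0; i < k; ++i) {
        double s = x[i] - mu[i];
        for (std::size_t p = 0; p < i; ++p) s -= L[i * k + p] * z[p];
        z[i] = s / L[i * k + i];
        quad += z[i] * z[i];
        log_det += 2.0 * std::log(L[i * k + i]);  // log|sigma| = 2 * sum(log L_ii)
    }

    const double two_pi = 6.283185307179586;
    return -0.5 * (static_cast<double>(k) * std::log(two_pi) + log_det + quad);
}

// Density itself; working in log space first avoids underflow for large k.
double gaussian_pdf(const std::vector<double>& x,
                    const std::vector<double>& mu,
                    const std::vector<double>& sigma) {
    return std::exp(gaussian_log_pdf(x, mu, sigma));
}
```

The same function covers k = 1 and k = 2, so under these assumptions the special cases would only be worth writing separately if profiling shows they matter.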

In the univariate case, implementing the PDF is trivial. Even in the bivariate case (k = 2) it is trivial. However, once we go beyond two dimensions, the implementation becomes much harder.

In the two-dimensional case, we would have

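$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right]\right)$$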

where ρ (rho) is the correlation between x and y, defined as

$$\rho = \operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$
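
For comparison, the standard bivariate closed form in terms of ρ can be sketched without any linear algebra at all. The function name `bivariate_gaussian_pdf` and its parameter order are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: bivariate Gaussian density written directly in terms
// of the two standard deviations and the correlation rho.
double bivariate_gaussian_pdf(double x, double y,
                              double mu_x, double mu_y,
                              double sigma_x, double sigma_y,
                              double rho) {
    assert(sigma_x > 0.0 && sigma_y > 0.0 && rho > -1.0 && rho < 1.0);
    const double pi = 3.141592653589793;

    // Standardised deviations from the mean.
    const double zx = (x - mu_x) / sigma_x;
    const double zy = (y - mu_y) / sigma_y;
    const double one_minus_rho2 = 1.0 - rho * rho;

    // Quadratic form (x - mu)' * sigma^{-1} * (x - mu), expanded for k = 2.
    const double q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_rho2;

    return std::exp(-0.5 * q) /
           (2.0 * pi * sigma_x * sigma_y * std::sqrt(one_minus_rho2));
}
```

With covariance entries cov(X, X) = 2, cov(X, Y) = 0.5 and cov(Y, Y) = 1, the inputs would be sigma_x = sqrt(2), sigma_y = 1 and rho = 0.5 / sqrt(2), and the result should agree with the matrix form evaluated on the same covariance.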

In this case, I could use Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic> to implement the first equation, or simply calculate everything myself using the second equation, without benefiting from Eigen's simplified linear algebra interface.

My attempt at the multivariate case would probably begin by extending the above equations to the multivariate case

[equation image: multivariate pdf]

with

[equation image: multivariate pdf]

My questions:

  • Would it be appropriate / recommended to use boost::multi_array for an n-dimensional array, or should I try using Eigen instead?
  • Should I have separate functions for the univariate / bivariate cases, or should I just abstract it all into the multivariate case using boost::multi_array (or a suitable alternative)?
2 answers

I am a bit out of my element here, but some thoughts:

First, from a programming view, the stock answer is "profile." That is, write the code the clearer way first. Then profile its execution to see whether optimizing is worthwhile. IMHO, it is probably clearer to use a matrix library so that the code stays closer to the original mathematics.

From a mathematical view: I am a little doubtful about the formula you provide for the multivariate case. It doesn't look right to me. The expression Z should be a quadratic form, and your Z is not. Unless I am missing something.

Here is an option that you did not mention but that may make sense, especially if you are going to evaluate the PDF many times for a single distribution. Start by calculating the principal component basis of your distribution, that is, an eigenbasis for Σ. The principal component directions are orthogonal. In the principal component basis the cross-covariances are all 0, so the PDF has a simple form. When you want to evaluate, change the basis of the input to the principal component basis, and then perform the simpler PDF calculation there.

The idea is that you can calculate the change-of-basis matrix and the principal components once up front, and then you only need one matrix multiplication (the change of basis) per evaluation, instead of the two matrix multiplications needed to evaluate (x-μ)' Σ⁻¹ (x-μ) in the standard basis.
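
A minimal sketch of that precompute-then-evaluate idea, using only the standard library, might look as follows. The class name `EigenbasisGaussian` and its interface are assumptions made for this example. The eigenbasis is found with plain cyclic Jacobi rotations, which is adequate for small symmetric matrices; a library eigen-solver would normally take its place.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class: diagonalises sigma once, then evaluates the density
// with a single change of basis per call.
class EigenbasisGaussian {
public:
    // `sigma` is the symmetric positive-definite k x k covariance, row-major.
    EigenbasisGaussian(std::vector<double> mu, std::vector<double> sigma)
        : k_(mu.size()), mu_(std::move(mu)), v_(k_ * k_, 0.0), lambda_(k_) {
        assert(sigma.size() == k_ * k_);
        std::vector<double>& a = sigma;  // rotated in place towards diagonal form
        for (std::size_t i = 0; i < k_; ++i) v_[i * k_ + i] = 1.0;

        for (int sweep = 0; sweep < 50; ++sweep) {
            double off = 0.0, total = 0.0;
            for (std::size_t p = 0; p < k_; ++p)
                for (std::size_t q = 0; q < k_; ++q) {
                    const double e = a[p * k_ + q] * a[p * k_ + q];
                    total += e;
                    if (p != q) off += e;
                }
            if (off <= 1e-30 * total) break;  // off-diagonal part is negligible

            for (std::size_t p = 0; p + 1 < k_; ++p)
                for (std::size_t q = p + 1; q < k_; ++q) {
                    const double apq = a[p * k_ + q];
                    if (apq == 0.0) continue;
                    // Rotation angle that zeroes the (p, q) entry.
                    const double theta = (a[q * k_ + q] - a[p * k_ + p]) / (2.0 * apq);
                    const double t = (theta >= 0.0 ? 1.0 : -1.0) /
                                     (std::fabs(theta) + std::sqrt(theta * theta + 1.0));
                    const double c = 1.0 / std::sqrt(t * t + 1.0), s = t * c;
                    auto rotate_cols = [&](std::vector<double>& m) {
                        for (std::size_t i = 0; i < k_; ++i) {
                            const double mp = m[i * k_ + p], mq = m[i * k_ + q];
                            m[i * k_ + p] = c * mp - s * mq;
                            m[i * k_ + q] = s * mp + c * mq;
                        }
                    };
                    rotate_cols(a);                        // a <- a * J
                    for (std::size_t i = 0; i < k_; ++i) { // a <- J' * a
                        const double rp = a[p * k_ + i], rq = a[q * k_ + i];
                        a[p * k_ + i] = c * rp - s * rq;
                        a[q * k_ + i] = s * rp + c * rq;
                    }
                    rotate_cols(v_);                       // accumulate eigenvectors
                }
        }

        double log_det = 0.0;
        for (std::size_t i = 0; i < k_; ++i) {
            lambda_[i] = a[i * k_ + i];  // variance along principal direction i
            assert(lambda_[i] > 0.0);
            log_det += std::log(lambda_[i]);
        }
        log_norm_ = -0.5 * (static_cast<double>(k_) * std::log(6.283185307179586) + log_det);
    }

    double log_pdf(const std::vector<double>& x) const {
        assert(x.size() == k_);
        std::vector<double> d(k_);
        for (std::size_t i = 0; i < k_; ++i) d[i] = x[i] - mu_[i];
        double quad = 0.0;
        for (std::size_t j = 0; j < k_; ++j) {
            double y = 0.0;  // coordinate j of (x - mu) in the eigenbasis
            for (std::size_t i = 0; i < k_; ++i) y += v_[i * k_ + j] * d[i];
            quad += y * y / lambda_[j];
        }
        return log_norm_ - 0.5 * quad;
    }

    double pdf(const std::vector<double>& x) const { return std::exp(log_pdf(x)); }

private:
    std::size_t k_;
    std::vector<double> mu_, v_, lambda_;  // v_ holds eigenvectors as columns
    double log_norm_ = 0.0;
};
```

The constructor does all the expensive work, so constructing one object per distribution and calling `pdf` repeatedly is the intended usage in this sketch.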


I basically implemented the exp part of the equation for the three-dimensional case in this question. At first I used a computer vision library called OpenCV, but I noticed that the C++ interface was very slow. I then tried the C interface, which was only a little faster. Finally, I decided to give up flexibility and readability and implemented it without any library, and that was faster.

What I'm trying to say is this: when performance is important, you should consider implementing special cases for the most commonly used numbers of dimensions, with minimal overhead. Otherwise, choose maintainability over speed.
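
To make the "special case without a library" idea concrete, here is one possible hand-unrolled version of the exp part for exactly three dimensions. This is not the answerer's actual code; the `Sym3` struct and the function name `gaussian3_exp_part` are assumptions made for this example. It inverts the symmetric 3 × 3 covariance through its cofactors, with no loops and no heap allocation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical fixed-size type: the six distinct entries of a symmetric
// 3 x 3 covariance matrix.
struct Sym3 {
    double xx, xy, xz, yy, yz, zz;
};

// Returns exp(-0.5 * d' * sigma^{-1} * d), where d = (dx, dy, dz) is the
// offset x - mu. The normalising constant is left out on purpose.
double gaussian3_exp_part(double dx, double dy, double dz, const Sym3& s) {
    // Cofactors of sigma; since sigma is symmetric, they form the adjugate.
    const double c_xx = s.yy * s.zz - s.yz * s.yz;
    const double c_xy = s.xz * s.yz - s.xy * s.zz;
    const double c_xz = s.xy * s.yz - s.xz * s.yy;
    const double c_yy = s.xx * s.zz - s.xz * s.xz;
    const double c_yz = s.xy * s.xz - s.xx * s.yz;
    const double c_zz = s.xx * s.yy - s.xy * s.xy;

    // Determinant by expansion along the first row.
    const double det = s.xx * c_xx + s.xy * c_xy + s.xz * c_xz;
    assert(det > 0.0);

    // Quadratic form d' * (adjugate / det) * d, fully unrolled.
    const double q = (c_xx * dx * dx + c_yy * dy * dy + c_zz * dz * dz +
                      2.0 * (c_xy * dx * dy + c_xz * dx * dz + c_yz * dy * dz)) / det;
    return std::exp(-0.5 * q);
}
```

If the covariance is fixed across many evaluations, the cofactors and determinant could be computed once and stored, which would leave only the unrolled quadratic form in the hot path.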

Disclaimer: I don't know anything about Eigen or boost::multi_array (which is probably what the question is really aimed at?).
