The fastest way to calculate convolution

Does anyone know of the fastest method for calculating convolution? Unfortunately, the matrix I'm dealing with is very large (500x500x200), and if I use convn in MATLAB, it takes a lot of time (I have to repeat this calculation in a nested loop). So, I used convolution with FFT, and now it is faster. But I'm still looking for a faster method. Any idea?

+8
c ++ matlab signal-processing convolution template-matching
source share
3 answers

If your kernel is separable, the greatest increase in speed will be realized by performing several consecutive one-dimensional convolutions.

MathWorks Steve Eddins describes how to use convolution associativity to speed up convolution when the kernel is separable in the MATLAB context of his blog . For the P-by-Q kernel, the computational advantage of performing two separate and sequential convolutions as compared to the two-dimensional convolution is PQ/(P+Q) , which corresponds to 4.5x for the 9x9 core and ~ 11x for the 15x15 core. EDIT . An interesting involuntary demonstration of this difference was given in this Q&A .

To find out if the kernel is separable (i.e. the external product of two vectors), the blog goes on to describe how to check if your kernel is separable from SVD and how to get 1D kernels. Their example is for a 2D core. For an N-dimensional separable convolution, check this FEX report .


Another resource that deserves attention is this SIMD (SSE3 / SSE4) implementation of Intel 3D convolution , which includes both a source and a presentation . The code is for 16-bit integers. If you don’t upgrade to a GPU (such as cuFFT ), then it will probably be difficult to get faster than Intel implementations, including Intel MKL . There is an example of a 3D convolution (float with one precision) at the bottom of this MKL documentation page (link fixed, now mirrored in https://stackoverflow.com/a/169649/ ).

+15
source share

You can try overlap-add and overlap-save methods. These include breaking your input signal into smaller pieces, and then using any of these methods.

The FFT is most likely - and I'm wrong - the fastest method, especially if you use the built-in routines in MATLAB or the C ++ library. In addition, breaking the input signal into smaller pieces should be a good bet.

+1
source share

I have 2 ways to calculate fastconv

and 2 more than 1

1- armadillo you can use the armadillo library to compute conv with this code

 cx_vec signal(1024,fill::randn); cx_vec code(300,fill::randn); cx_vec ans = conv(signal,code); 

2-use fftw ans sigpack and armadillo library to compute fast conv so you have to initialize the fft of your code in the constructor

 FastConvolution::FastConvolution(cx_vec inpCode) { filterCode = inpCode; fft_w = NULL; } cx_vec FastConvolution::filter(cx_vec inpData) { int length = inpData.size()+filterCode.size(); if((length & (length - 1)) == 0) { } else { length = pow(2 , (int)log2(length) + 1); } if(length != fftCode.size()) initCode(length); static cx_vec zeroPadedData; if(length!= zeroPadedData.size()) { zeroPadedData.resize(length); } zeroPadedData.fill(0); zeroPadedData.subvec(0,inpData.size()-1) = inpData; cx_vec fftSignal = fft_w->fft_cx(zeroPadedData); cx_vec mullAns = fftSignal % fftCode; cx_vec ans = fft_w->ifft_cx(mullAns); return ans.subvec(filterCode.size(),inpData.size()+filterCode.size()-1); } void FastConvolution::initCode(int length) { if(fft_w != NULL) { delete fft_w; } fft_w = new sp::FFTW(length,FFTW_ESTIMATE); cx_vec conjCode(length,fill::zeros); fftCode.resize(length); for(int i = 0; i < filterCode.size();i++) { conjCode.at(i) = filterCode.at(filterCode.size() - i - 1); } conjCode = conj(conjCode); fftCode = fft_w->fft_cx(conjCode); } 
0
source share

All Articles