1D Fast Convolution without FFT

I need to compute a 1D convolution of two large arrays. I use the code below in C#, but it takes a very long time to run.

I know, I know! FFT convolutions are very fast. But in this project I can't use the FFT; that is a constraint of the project (please don't ask why :/).

This is my C# code (ported from MATLAB, by the way):

    var result = new double[input.Length + filter.Length - 1];
    for (var i = 0; i < input.Length; i++)
    {
        for (var j = 0; j < filter.Length; j++)
        {
            result[i + j] += input[i] * filter[j];
        }
    }

So, does anyone know of a fast convolution algorithm that approaches FFT speed without actually using the FFT?

+5
4 answers

Convolution is numerically the same as polynomial multiplication (with an extra wrap-around step if you need circular convolution). For example, convolving [1, 2, 3] with [4, 5] gives exactly the coefficients of (1 + 2x + 3x^2)(4 + 5x). Therefore, you can use any algorithm for polynomial or large-integer multiplication to perform the convolution.

FFT is the only known way to get the fast O(n log n) time. But you can still get a subquadratic running time using divide-and-conquer approaches, such as the Karatsuba algorithm.

The Karatsuba algorithm is fairly easy to implement once you understand how it works. It runs in O(n^1.585) and is likely to be faster than trying to micro-optimize the classic O(n^2) approach.
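For illustration, here is a minimal, untuned Karatsuba convolution sketch in C#. It is not from the answer above: the cutoff of 32, the helper names, and the pad-to-power-of-two strategy are my own choices, and it assumes both arrays are non-empty.

    using System;

    static class KaratsubaConvolution
    {
        // Pads both inputs to a common power-of-two length, runs the
        // recursion, and trims to the exact output size.
        public static double[] Convolve(double[] input, double[] filter)
        {
            int n = 1;
            while (n < Math.Max(input.Length, filter.Length)) n <<= 1;
            double[] full = Multiply(Pad(input, n), Pad(filter, n));
            var result = new double[input.Length + filter.Length - 1];
            Array.Copy(full, result, result.Length);
            return result;
        }

        // Karatsuba on two equal power-of-two-length arrays; returns 2n-1 values.
        static double[] Multiply(double[] a, double[] b)
        {
            int n = a.Length;
            if (n <= 32) return Naive(a, b); // below the cutoff the naive loop wins

            int m = n / 2;
            double[] a0 = Slice(a, 0, m), a1 = Slice(a, m, n);
            double[] b0 = Slice(b, 0, m), b1 = Slice(b, m, n);

            double[] low  = Multiply(a0, b0);                    // a0*b0
            double[] high = Multiply(a1, b1);                    // a1*b1
            double[] mid  = Multiply(Add(a0, a1), Add(b0, b1));  // (a0+a1)*(b0+b1)

            // a*b = low + (mid - low - high)*x^m + high*x^(2m)
            var result = new double[2 * n - 1];
            for (int i = 0; i < low.Length; i++)  result[i]     += low[i];
            for (int i = 0; i < high.Length; i++) result[i + n] += high[i];
            for (int i = 0; i < mid.Length; i++)  result[i + m] += mid[i] - low[i] - high[i];
            return result;
        }

        static double[] Naive(double[] a, double[] b)
        {
            var r = new double[a.Length + b.Length - 1];
            for (int i = 0; i < a.Length; i++)
                for (int j = 0; j < b.Length; j++)
                    r[i + j] += a[i] * b[j];
            return r;
        }

        static double[] Pad(double[] a, int n)
        {
            var r = new double[n];
            Array.Copy(a, r, a.Length);
            return r;
        }

        static double[] Slice(double[] a, int from, int to)
        {
            var r = new double[to - from];
            Array.Copy(a, from, r, 0, r.Length);
            return r;
        }

        static double[] Add(double[] a, double[] b) // equal lengths
        {
            var r = new double[a.Length];
            for (int i = 0; i < a.Length; i++) r[i] = a[i] + b[i];
            return r;
        }
    }

In practice you would tune the cutoff and reduce the per-level allocations, but this shows the structure of the recursion: three half-size multiplications instead of four.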

+5

You can reduce the number of indexed writes to result, and cache the Length properties in locals:

    int inputLength = input.Length;
    int filterLength = filter.Length;
    int resultLength = inputLength + filterLength - 1;
    var result = new double[resultLength];
    for (int i = resultLength - 1; i >= 0; i--)
    {
        double sum = 0;
        // n1 = max(i - inputLength + 1, 0)
        int n1 = i < inputLength ? 0 : i - inputLength + 1;
        // n2 = min(i, filterLength - 1)
        int n2 = i < filterLength ? i : filterLength - 1;
        for (int j = n1; j <= n2; j++)
        {
            sum += input[i - j] * filter[j];
        }
        result[i] = sum;
    }

If you split the outer loop into three ranges, you can get rid of the repeated conditionals. (This assumes 0 < filterLength ≤ inputLength ≤ resultLength.)

    int inputLength = input.Length;
    int filterLength = filter.Length;
    int resultLength = inputLength + filterLength - 1;
    var result = new double[resultLength];

    // Leading edge: the filter only partially overlaps the input.
    for (int i = 0; i < filterLength; i++)
    {
        double sum = 0;
        for (int j = i; j >= 0; j--)
        {
            sum += input[i - j] * filter[j];
        }
        result[i] = sum;
    }

    // Middle: the filter overlaps the input completely, so no bounds tests.
    for (int i = filterLength; i < inputLength; i++)
    {
        double sum = 0;
        for (int j = filterLength - 1; j >= 0; j--)
        {
            sum += input[i - j] * filter[j];
        }
        result[i] = sum;
    }

    // Trailing edge: partial overlap again.
    for (int i = inputLength; i < resultLength; i++)
    {
        double sum = 0;
        for (int j = i - inputLength + 1; j < filterLength; j++)
        {
            sum += input[i - j] * filter[j];
        }
        result[i] = sum;
    }
+4

Here are two possibilities that may give small speedups, but you will need to benchmark to be sure.

  • Unroll the inner loop to remove some loop tests. This is easier if you know that the filter length will always be a multiple of some N.
  • Swap the order of the loops, so the outer loop runs over the filter taps and the inner loop sweeps the entire input array (see the sketch after this list). This means fewer dereferences in the inner loop, but may have worse caching behavior.
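A hedged sketch of the second suggestion, reusing the variable names from the question's code; whether it actually wins depends on the array sizes and the cache:

    var result = new double[input.Length + filter.Length - 1];
    for (var j = 0; j < filter.Length; j++)
    {
        // Hoist the filter tap so the inner loop only touches input and result.
        double fj = filter[j];
        for (var i = 0; i < input.Length; i++)
        {
            result[i + j] += input[i] * fj;
        }
    }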
0

You could use a special IIR (recursive) filter instead, and evaluate it as:

    y(n) = b0*x(n) + b1*x(n-1) + b2*x(n-2) + ...
         + a1*y(n-1) + a2*y(n-2) + ...

I think this is faster, provided the original filter can be approximated well by a low-order recursion.
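A minimal sketch of evaluating that recursion (my own direct-form implementation, not from this answer). The coefficient arrays b (feed-forward) and a (feedback, with a[0] unused) are assumed to come from a separate filter-design step that approximates the original filter; that step is not shown here.

    static double[] IirFilter(double[] x, double[] b, double[] a)
    {
        // y(n) = sum_j b[j]*x(n-j) + sum_{j>=1} a[j]*y(n-j)
        var y = new double[x.Length];
        for (int n = 0; n < x.Length; n++)
        {
            double acc = 0;
            for (int j = 0; j < b.Length && j <= n; j++)
                acc += b[j] * x[n - j];   // feed-forward terms
            for (int j = 1; j < a.Length && j <= n; j++)
                acc += a[j] * y[n - j];   // feedback terms
            y[n] = acc;
        }
        return y;
    }

The payoff is that the per-sample cost depends on the recursion order, not on the original filter length.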

0
