Are you sure you want to use FFT? It transforms the whole array, which is expensive. If you have already settled on a 9x9 convolution filter, you do not need FFT.
Typically, the cheapest way to perform a convolution in C is to write a loop that moves a pointer over the array, sums the convolved values at each point, and writes the result to a new array. This loop can then be parallelized using your favorite method (compiler vectorization, MPI libraries, OpenMP, etc.).
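As a rough illustration of such a loop, here is a minimal sketch. The function name `convolve_valid`, the `float` element type, and the row-major layout are my assumptions, not something from your code. It only visits positions where the kernel fits entirely inside the input, so it contains no border checks at all.

```c
#include <stddef.h>

/* Hypothetical helper: direct 2D convolution over the region where the
   kernel fits entirely inside the input, so no border checks are needed.
   in:  h x w, row-major
   k:   ksize x ksize, row-major (e.g. ksize = 9)
   out: (h - ksize + 1) x (w - ksize + 1), row-major
   Assumes h >= ksize and w >= ksize. */
void convolve_valid(const float *in, size_t h, size_t w,
                    const float *k, size_t ksize, float *out)
{
    size_t oh = h - ksize + 1;
    size_t ow = w - ksize + 1;

    for (size_t y = 0; y < oh; ++y) {
        for (size_t x = 0; x < ow; ++x) {
            float sum = 0.0f;
            for (size_t ky = 0; ky < ksize; ++ky) {
                /* Input row under the kernel, and the matching kernel row.
                   The kernel is read back to front (flipped), which is what
                   distinguishes convolution from correlation. */
                const float *row  = in + (y + ky) * w + x;
                const float *krow = k + (ksize - 1 - ky) * ksize;
                for (size_t kx = 0; kx < ksize; ++kx)
                    sum += row[kx] * krow[ksize - 1 - kx];
            }
            out[y * ow + x] = sum;
        }
    }
}
```

The two outer loops are independent per output element, so they are the natural place to apply whatever parallelization you pick; the inner loop over `kx` is a plain multiply-accumulate that compilers can usually vectorize.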
Regarding the borders:
- If you are fine with values being 0 outside the borders, add a 4-element border of zeros to your 2D array of points. This avoids the need for `if` statements to handle the borders, which are expensive.
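- If your data wraps around at the boundaries (i.e. it is periodic), add a 4-element border that wraps around (abcdefg → fgabcdefgab for a 2-element border). **Note: this is what you implicitly assume whenever you use a Fourier transform, including FFT**. If your data is not periodic, you need to account for that before doing any FFT.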
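The border is 4 elements wide because a 9x9 kernel extends at most 4 elements beyond the edge of the main grid. In general, you need an n-element border for a (2n + 1) x (2n + 1) kernel.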
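To make the border idea concrete, here is a minimal sketch of the padding step. The function name `pad_border`, the `wrap` flag, and the `float` row-major layout are assumptions of mine for illustration. It copies the grid into a larger array whose border is filled either with zeros or with wrapped-around values.

```c
#include <stddef.h>

/* Hypothetical helper: copy an h x w row-major array into a
   (h + 2n) x (w + 2n) array. The n-element border is filled with zeros
   (wrap == 0) or with values wrapped around from the opposite edge
   (wrap != 0). Assumes h > 0 and w > 0. */
void pad_border(const float *in, size_t h, size_t w, size_t n,
                int wrap, float *out)
{
    size_t ph = h + 2 * n;
    size_t pw = w + 2 * n;

    for (size_t y = 0; y < ph; ++y) {
        for (size_t x = 0; x < pw; ++x) {
            float v = 0.0f;  /* default: zero border */
            if (wrap) {
                /* Source index is (y - n) mod h, written so that the
                   unsigned arithmetic never goes below zero. */
                size_t sy = (y + h - n % h) % h;
                size_t sx = (x + w - n % w) % w;
                v = in[sy * w + sx];
            } else if (y >= n && y < h + n && x >= n && x < w + n) {
                v = in[(y - n) * w + (x - n)];
            }
            out[y * pw + x] = v;
        }
    }
}
```

The branches here run once per padded element, not once per kernel tap. With n = 4, a 9x9 convolution that only visits positions where the kernel fits inside the padded array produces an output of the original h x w size without any border tests in its inner loops.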
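If you need this convolution to be really fast, and/or your grid is large, consider splitting it into smaller pieces that fit in the processor's cache, so that each piece can be processed much more quickly. The same applies to any GPU offloading you might want to do (GPUs are well suited to this kind of floating-point computation).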