IPhone speeds up FFT vs Matlab FFT

I don’t have a lot of mathematical background, but part of the project I'm working on requires an FFT of one vector. The matlab fft (x) function works exactly for what I need, but when I try to configure the FLE acceleration functions of the Framework, I get completely inaccurate results. If anyone has more experience with Accelerate Framework fft, I could really use some help trying to figure out what I'm doing wrong. I based the fft setup on an example I found on google, but there were no tutorials or anything that gave different results.

EDIT1: Changed around some things based on answers so far. It looks like it is doing the calculations, but does not in any way bring them close to that of matlab

This is the documentation for fft for matlab: http://www.mathworks.com/help/techdoc/ref/fft.html

** NOTE: for example, for purposes, the x array will be {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16} in both examples

Matlab Code:

x = fft(x) 

Matlab Output:

 x = 1.0e+02 * Columns 1 through 4 1.3600 -0.0800 + 0.4022i -0.0800 + 0.1931i -0.0800 + 0.1197i Columns 5 through 8 -0.0800 + 0.0800i -0.0800 + 0.0535i -0.0800 + 0.0331i -0.0800 + 0.0159i Columns 9 through 12 -0.0800 -0.0800 - 0.0159i -0.0800 - 0.0331i -0.0800 - 0.0535i Columns 13 through 16 -0.0800 - 0.0800i -0.0800 - 0.1197i -0.0800 - 0.1931i -0.0800 - 0.4022i 

Apple Accelerate Framework: http://developer.apple.com/library/mac/#documentation/Accelerate/Reference/vDSPRef/Reference/reference.html#//apple_ref/doc/uid/TP40009464

Objective-C Code:

 int log2n = log2f(16); FFTSetupD fftSetup = vDSP_create_fftsetupD (log2n, kFFTRadix2); DSPDoubleSplitComplex fft_data; fft_data.realp = (double *)malloc(8 * sizeof(double)); fft_data.imagp = (double *)malloc(8 * sizeof(double)); vDSP_ctoz((COMPLEX *) ffx, 2, &fft_data, 1, nOver2); //split data (1- 16) into odds and evens vDSP_fft_zrip (fftSetup, &fft_data, 1, log2n, kFFTDirection_Forward); //fft forward vDSP_fft_zrip (fftSetup, &fft_data, 1, log2n, kFFTDirection_Inverse); //fft inverse vDSP_ztoc(&fft_data, 2, (COMPLEX *) ffx, 1, nOver2); //combine complex back into real numbers 

Objective C output:

Now ffx contains:

 272.000000 -16.000000 -16.000000 -16.000000 0.000000 0.000000 0.000000 0.000000 0.000000 10.000000 11.000000 12.000000 13.000000 14.000000 15.000000 16.000000 
+7
source share
3 answers

One big problem: C arrays are indexed from 0, unlike MATLAB arrays that are based on 1. So you need to change your loop from

 for(int i = 1; i <= 16; i++) 

to

 for(int i = 0; i < 16; i++) 

The second, big problem - you mix routines with one precision ( float ) and double precision ( double ). Your data is double , so you should use vDSP_ctozD , not vDSP_ctoz and vDSP_fft_zripD , not vDSP_fft_zrip , etc.

Another thing to note: different FFT implementations use different definitions of the DFT formula, especially with respect to the scaling factor. MATLAB FFT seems to include 1 / N scaling correction, which most other FFTs don't use.


Here is a complete working example whose output matches Octave (MATLAB clone):

 #include <stdio.h> #include <stdlib.h> #include <Accelerate/Accelerate.h> int main(void) { const int log2n = 4; const int n = 1 << log2n; const int nOver2 = n / 2; FFTSetupD fftSetup = vDSP_create_fftsetupD (log2n, kFFTRadix2); double *input; DSPDoubleSplitComplex fft_data; int i; input = malloc(n * sizeof(double)); fft_data.realp = malloc(nOver2 * sizeof(double)); fft_data.imagp = malloc(nOver2 * sizeof(double)); for (i = 0; i < n; ++i) { input[i] = (double)(i + 1); } printf("Input\n"); for (i = 0; i < n; ++i) { printf("%d: %8g\n", i, input[i]); } vDSP_ctozD((DSPDoubleComplex *)input, 2, &fft_data, 1, nOver2); printf("FFT Input\n"); for (i = 0; i < nOver2; ++i) { printf("%d: %8g%8g\n", i, fft_data.realp[i], fft_data.imagp[i]); } vDSP_fft_zripD (fftSetup, &fft_data, 1, log2n, kFFTDirection_Forward); printf("FFT output\n"); for (i = 0; i < nOver2; ++i) { printf("%d: %8g%8g\n", i, fft_data.realp[i], fft_data.imagp[i]); } for (i = 0; i < nOver2; ++i) { fft_data.realp[i] *= 0.5; fft_data.imagp[i] *= 0.5; } printf("Scaled FFT output\n"); for (i = 0; i < nOver2; ++i) { printf("%d: %8g%8g\n", i, fft_data.realp[i], fft_data.imagp[i]); } printf("Unpacked output\n"); printf("%d: %8g%8g\n", 0, fft_data.realp[0], 0.0); // DC for (i = 1; i < nOver2; ++i) { printf("%d: %8g%8g\n", i, fft_data.realp[i], fft_data.imagp[i]); } printf("%d: %8g%8g\n", nOver2, fft_data.imagp[0], 0.0); // Nyquist return 0; } 

Exit:

 Input 0: 1 1: 2 2: 3 3: 4 4: 5 5: 6 6: 7 7: 8 8: 9 9: 10 10: 11 11: 12 12: 13 13: 14 14: 15 15: 16 FFT Input 0: 1 2 1: 3 4 2: 5 6 3: 7 8 4: 9 10 5: 11 12 6: 13 14 7: 15 16 FFT output 0: 272 -16 1: -16 80.4374 2: -16 38.6274 3: -16 23.9457 4: -16 16 5: -16 10.6909 6: -16 6.62742 7: -16 3.1826 Scaled FFT output 0: 136 -8 1: -8 40.2187 2: -8 19.3137 3: -8 11.9728 4: -8 8 5: -8 5.34543 6: -8 3.31371 7: -8 1.5913 Unpacked output 0: 136 0 1: -8 40.2187 2: -8 19.3137 3: -8 11.9728 4: -8 8 5: -8 5.34543 6: -8 3.31371 7: -8 1.5913 8: -8 0 

Comparing with Octave, we get:

 octave-3.4.0:15> x = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 ] x = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 octave-3.4.0:16> fft(x) ans = Columns 1 through 7: 136.0000 + 0.0000i -8.0000 + 40.2187i -8.0000 + 19.3137i -8.0000 + 11.9728i -8.0000 + 8.0000i -8.0000 + 5.3454i -8.0000 + 3.3137i Columns 8 through 14: -8.0000 + 1.5913i -8.0000 + 0.0000i -8.0000 - 1.5913i -8.0000 - 3.3137i -8.0000 - 5.3454i -8.0000 - 8.0000i -8.0000 - 11.9728i Columns 15 and 16: -8.0000 - 19.3137i -8.0000 - 40.2187i octave-3.4.0:17> 

Note that outputs 9 through 16 are simply a complex conjugate mirror image or the bottom 8 terms, as is the expected case with FFT with real input.

Please also note that we needed to scale the VFT filter twice - this is due to the fact that it is a complex FFT, which is based on an N / 2-point complex-complex FFT, so the outputs are scaled to N / 2 , while the normal FFT will scale with N.

+14
source

I think this may be a problem with array packing. I just looked at their sample code and I see that they continue to call conversion procedures, for example

 vDSP_ctoz 

Here is an example sample code from apple: http://developer.apple.com/library/ios/#documentation/Performance/Conceptual/vDSP_Programming_Guide/SampleCode/SampleCode.html#//apple_ref/doc/uid/TP40005147-CH205-CIAEJIGFF

I do not think this is the complete answer, but I also agree with Paul R.

By the way, just curious if you go to Wolfram Alpha, they will give a completely different answer for FFT {1,2,3,4,5,6,7,8,9,10,11,12,13, 14,15, sixteen}

+4
source

In MATLAB, it looks like you are making fft of 16 real values ​​{1 + 0i, 2 + 0i, 3 + 0i, etc.}, while in Accelerate you are making fft of 8 complex values ​​{1 + 2i, 3 + 4i , 5 + 6i, etc.}}

0
source

All Articles