How many FFTs per second can I do on my smartphone? (to perform voice recognition)

I study voice recognition and DSP, so I would like to implement a simple sound frequency analyzer on my smartphone (I have both an iPhone and a Samsung Nexus S running Android). I have done basic DSP in MATLAB before.

As I understand it, I need to perform an FFT in order to get the main frequencies of the signal.

So, now I would like to sample the microphone at 44100 Hz. If I use a sliding window of 512 samples with 50% overlap, I need to perform an FFT every 256 samples, or every 0.0058 seconds.

This rate seems really high, especially if I program in Java for Android. Will my smartphones be able to handle it? I know that you can program in C/C++ on Android, but I would like to stick with Java for now.

+8
android iphone fft audio signal-processing
4 answers

Performing a real-to-complex FFT requires roughly (5/2) n lg n floating point operations (additions and multiplications). In your case, n = 512 and lg n = 9, therefore:

flops per fft ~= (5/2) * 512 * 9 = 11520 

Thus, 172 FFTs per second (44100 / 256) require about 2 million floating point operations per second. That sounds like a lot, but it really is not. The hardware of a typical ARMv7-class smartphone is capable of performing hundreds of millions or billions of floating point operations per second.
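The arithmetic above can be checked with a few lines of Java. This is only a sketch of the estimate, not a benchmark; the class and method names are made up for illustration, and the (5/2) n lg n figure is the approximation quoted above.

```java
class FftBudget {
    /** FFTs needed per second when a new window starts every hopSize samples. */
    static double fftsPerSecond(double sampleRateHz, int hopSize) {
        return sampleRateHz / hopSize;
    }

    /** Approximate flop count of one real-input FFT of length n, using (5/2) n lg n. */
    static double flopsPerRealFft(int n) {
        // For a power of two, the number of trailing zero bits is exactly lg n.
        int log2n = Integer.numberOfTrailingZeros(n);
        return 2.5 * n * log2n;
    }

    public static void main(String[] args) {
        // 512-sample window with 50% overlap at 44100 Hz gives a hop of 256 samples.
        double rate = fftsPerSecond(44100.0, 256);
        double flops = flopsPerRealFft(512);
        System.out.println("FFTs per second:  " + rate);
        System.out.println("flops per FFT:    " + flops);
        System.out.println("flops per second: " + rate * flops);
    }
}
```

The product comes out just under 2 million flops per second, which matches the estimate above.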

Note that you want a carefully written, high-performance FFT; poorly written FFTs are notoriously inefficient. On iPhone, you can use the Accelerate framework (built right into the OS and available in the SDK), which provides a good set of FFT functions; I'm not sure what is available on Android.

+9

For iPhone, the Accelerate framework for iOS can perform all the FFTs you describe using about 1% of the CPU time (the exact percentage depends on the device model and the FFT data types).

For Android, you might want to use a native library through the NDK for CPU-intensive numerical computation.

Also note that an FFT will give you peak frequencies, which will not necessarily include the pitch (fundamental frequency) of the voice.
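To make the point about peaks concrete, here is a minimal sketch of how a peak frequency is usually read off a magnitude spectrum. The names `SpectrumPeak`, `binToHz` and `peakBin` are hypothetical, and the input is assumed to be the magnitudes of bins 0 to n/2 of an n-point FFT. The strongest bin found this way may well be a harmonic rather than the pitch.

```java
class SpectrumPeak {
    /** Center frequency in Hz of bin k of an n-point FFT at the given sample rate. */
    static double binToHz(int k, int n, double sampleRateHz) {
        return k * sampleRateHz / n;
    }

    /**
     * Index of the largest magnitude, skipping bin 0 (the DC component).
     * Assumes magnitudes holds bins 0..n/2, so it needs at least two entries.
     */
    static int peakBin(double[] magnitudes) {
        if (magnitudes.length < 2) {
            throw new IllegalArgumentException("need at least bins 0 and 1");
        }
        int best = 1;
        for (int k = 2; k < magnitudes.length; k++) {
            if (magnitudes[k] > magnitudes[best]) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up spectrum for illustration: a weaker peak at bin 10, a stronger one at bin 20.
        double[] magnitudes = new double[257];
        magnitudes[10] = 5.0;
        magnitudes[20] = 9.0;
        int k = peakBin(magnitudes);
        System.out.println("peak bin " + k + " = " + binToHz(k, 512, 44100.0) + " Hz");
    }
}
```

With a 512-point FFT at 44100 Hz the bins are about 86 Hz apart (44100 / 512), so this only locates a peak to within that spacing.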

ADDED: This Java-centric web page suggests that Android phones are capable of anywhere from 5 to over 50 MFlops using Java for well-written matrix math. A well-written FFT should fall in roughly the same MFlops range. @Stephan Cannon estimated that your specification would require about 2 MFlops.

+5

Your Android device will be able to handle this just fine. I wrote real-time FFT-based frequency analyzers that ran on Windows Mobile devices several years ago (using pure C#), and those devices had much weaker processors than modern Android devices. The most computationally expensive part of an FFT is the trigonometric functions, and since you use a fixed-size window, you can easily replace the trig function calls with a precomputed lookup table.
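As a rough illustration of the lookup-table idea, here is a minimal radix-2 FFT in Java that computes its sines and cosines once in the constructor and never calls `Math.sin` or `Math.cos` inside the transform. It is a plain textbook sketch, not the analyzer described above and not a tuned implementation; the class name `TableFft` is made up, and it treats the input as complex (for audio, the imaginary part is simply all zeros).

```java
/** Radix-2 FFT for one fixed size, with the trig values precomputed once. */
class TableFft {
    private final int n;
    private final double[] cos;
    private final double[] sin;

    TableFft(int n) {
        if (n < 2 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("n must be a power of two >= 2");
        }
        this.n = n;
        cos = new double[n / 2];
        sin = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double angle = -2.0 * Math.PI * k / n; // forward-transform twiddle angle
            cos[k] = Math.cos(angle);
            sin[k] = Math.sin(angle);
        }
    }

    /** In-place forward FFT of the complex signal (re, im); both arrays must have length n. */
    void transform(double[] re, double[] im) {
        if (re.length != n || im.length != n) {
            throw new IllegalArgumentException("arrays must have length " + n);
        }
        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly stages; twiddle factors come from the table, not from trig calls.
        for (int len = 2; len <= n; len <<= 1) {
            int half = len / 2;
            int step = n / len; // stride through the table for this stage
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < half; k++) {
                    double wr = cos[k * step];
                    double wi = sin[k * step];
                    int a = start + k;
                    int b = a + half;
                    double tr = re[b] * wr - im[b] * wi;
                    double ti = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - tr;
                    im[b] = im[a] - ti;
                    re[a] += tr;
                    im[a] += ti;
                }
            }
        }
    }

    public static void main(String[] args) {
        TableFft fft = new TableFft(512); // build the table once, reuse for every window
        double[] re = new double[512];
        double[] im = new double[512];
        for (int t = 0; t < 512; t++) {
            re[t] = Math.sin(2.0 * Math.PI * 12 * t / 512); // test tone centered on bin 12
        }
        fft.transform(re, im);
        System.out.println("magnitude at bin 12: " + Math.hypot(re[12], im[12]));
    }
}
```

The table costs n/2 doubles per array, and one `TableFft` instance can be reused for every window of that size.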

+3

Alternatively, you can reduce the computation time by lowering the sampling rate. Speech does not have much energy above 8 kHz, so you can most likely downsample your audio signal to 16 kHz before doing any FFTs without losing accuracy. At 16 kHz, your FFTs will be smaller and faster.
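A quick back-of-the-envelope comparison shows the size of the saving. The sketch below assumes a 256-sample window at 16 kHz (my choice, not something from the question), 50% overlap in both cases, and the usual (5/2) n lg n approximation for a real-input FFT; the class name is made up. It does not count the cost of the resampling itself.

```java
class DownsampleBudget {
    /** Duration in milliseconds of an n-sample window at the given sample rate. */
    static double windowMillis(int n, double sampleRateHz) {
        return 1000.0 * n / sampleRateHz;
    }

    /** Approximate flops per second for n-point real FFTs (n a power of two) with 50% overlap. */
    static double flopsPerSecond(int n, double sampleRateHz) {
        int log2n = Integer.numberOfTrailingZeros(n); // exact lg n for powers of two
        double flopsPerFft = 2.5 * n * log2n;
        double fftsPerSecond = sampleRateHz / (n / 2); // a new window every n/2 samples
        return flopsPerFft * fftsPerSecond;
    }

    public static void main(String[] args) {
        System.out.println("44.1 kHz, n=512: " + windowMillis(512, 44100.0) + " ms window, "
                + flopsPerSecond(512, 44100.0) + " flops/s");
        System.out.println("16 kHz,   n=256: " + windowMillis(256, 16000.0) + " ms window, "
                + flopsPerSecond(256, 16000.0) + " flops/s");
    }
}
```

Under these assumptions the 16 kHz setup needs about a third of the FFT work while covering a slightly longer window (16 ms instead of about 11.6 ms).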

Wikipedia claims that 16 kHz is the standard sampling rate for speech recognition in desktop applications.

(I realize this does not answer the OP's question directly, but I think it could still help him, given his stated application.)

+1