I am trying to extract pitch data from an audio stream. From what I see, it seems that FFT is the best algorithm to use.
Instead of delving right into the math, can someone help me figure out what this FFT algorithm does?
Please do not say anything obvious like "FFT extracts frequency data from a raw signal." I need the next level of detail.
What do I pass in, and what do I get out?
Once I understand the interface explicitly, this will help me understand the implementation.
I assume that I need to pass in an audio buffer, say how many bytes to use for each calculation (say, the most recent 1024 bytes of that buffer), and maybe specify the range of pitches that I want to detect. Now, what does it pass back? An array of frequency bins? What are those?
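To make the question concrete, here is a rough sketch of the interface as I understand it so far. It uses a naive O(N²) DFT instead of a real FFT (an FFT should return the same numbers, only faster), and it assumes the block size is counted in samples rather than bytes. The names `naiveDft`, `binFrequency` and `strongestBin` are made up for this illustration and do not come from any library.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Input: N real PCM samples. Output: N/2 + 1 complex "bins".
// For real input the remaining bins only mirror these, so they are omitted.
std::vector<std::complex<double>> naiveDft(const std::vector<double>& samples) {
    const std::size_t n = samples.size();
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<std::complex<double>> bins(n / 2 + 1);
    for (std::size_t k = 0; k < bins.size(); ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            // Correlate the signal with a sinusoid that completes k cycles per block.
            const double angle = -twoPi * static_cast<double>((k * t) % n) / static_cast<double>(n);
            sum += samples[t] * std::polar(1.0, angle);
        }
        bins[k] = sum;  // magnitude = how strong, argument = phase
    }
    return bins;
}

// Centre frequency in Hz of bin k for a block of n samples.
double binFrequency(std::size_t k, std::size_t n, double sampleRate) {
    return static_cast<double>(k) * sampleRate / static_cast<double>(n);
}

// Index of the strongest bin, skipping bin 0 (the DC offset).
// Assumes there are at least two bins.
std::size_t strongestBin(const std::vector<std::complex<double>>& bins) {
    std::size_t best = 1;
    for (std::size_t k = 2; k < bins.size(); ++k) {
        if (std::abs(bins[k]) > std::abs(bins[best])) best = k;
    }
    return best;
}
```

If this picture is right, then with 1024 samples at an assumed 44100 Hz sample rate the bins are about 43 Hz apart, which looks far too coarse for pitch on its own.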
(Edit:) I found a C++ algorithm to use (if only I can figure it out).
Performous extracts the pitch from the microphone. Also, the code is open source. Here is a description of what the algorithm does, from the guy who coded it.
- PCM input (buffered)
- FFT (1024 samples at a time, then removes 200 samples from the front of the buffer)
- Reassignment method (comparing against the previous FFT, which was taken 200 samples earlier)
- Peak filtering (this part can be done much better or even eliminated).
- Combination of peaks into sets of harmonics (we call the combination a tone)
- Temporal filtering of tones (update the set of previously detected tones instead of simply using just discovered tones)
- Choose the best vocal tone (frequency limits, weighting; you could also use the harmonic array, but I don't think we do)
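The first two steps (buffered PCM, then 1024 samples per FFT with 200 samples dropped from the front) can be sketched roughly as below. This is only my reading of the description, not the Performous code; the class name `FrameSlicer` and its callback interface are invented for the example.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Buffers incoming PCM and hands out overlapping frames.
// Assumes 0 < hop <= frameSize (hop == 0 would never drain the buffer).
class FrameSlicer {
public:
    FrameSlicer(std::size_t frameSize, std::size_t hop)
        : frameSize_(frameSize), hop_(hop) {}

    // Appends new samples, calls onFrame once per complete frame,
    // and returns how many frames were produced.
    std::size_t push(const std::vector<double>& pcm,
                     const std::function<void(const std::vector<double>&)>& onFrame) {
        buffer_.insert(buffer_.end(), pcm.begin(), pcm.end());
        std::size_t emitted = 0;
        while (buffer_.size() >= frameSize_) {
            const auto frameEnd = buffer_.begin() + static_cast<std::ptrdiff_t>(frameSize_);
            std::vector<double> frame(buffer_.begin(), frameEnd);
            onFrame(frame);  // this is where the FFT would run
            // Drop only `hop_` samples, so consecutive frames overlap.
            buffer_.erase(buffer_.begin(), buffer_.begin() + static_cast<std::ptrdiff_t>(hop_));
            ++emitted;
        }
        return emitted;
    }

private:
    std::size_t frameSize_;
    std::size_t hop_;
    std::deque<double> buffer_;
};
```

With `FrameSlicer slicer(1024, 200)`, each frame starts 200 samples after the previous one, so successive FFTs share 824 samples.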
But can anyone help me understand how this works? What is sent from the FFT to the reassignment method?
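My best guess is that what gets passed on is the complex value of each bin (magnitude and phase), and that the phase change between two FFTs taken 200 samples apart is used to refine the frequency beyond the bin spacing. The sketch below shows that idea as a plain phase-difference estimate (the phase-vocoder style calculation), which is a simpler relative of the reassignment method and not necessarily what Performous does. The function names, the Hann window and the single-bin DFT are all assumptions made for this example.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// DFT bin k of the n samples starting at `offset`, with a Hann window applied.
// Computes one bin directly; a real implementation would take it from an FFT.
std::complex<double> windowedBin(const std::vector<double>& pcm, std::size_t offset,
                                 std::size_t n, std::size_t k) {
    const double twoPi = 2.0 * std::acos(-1.0);
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t t = 0; t < n; ++t) {
        const double hann = 0.5 - 0.5 * std::cos(twoPi * static_cast<double>(t) / static_cast<double>(n));
        const double angle = -twoPi * static_cast<double>((k * t) % n) / static_cast<double>(n);
        sum += pcm[offset + t] * hann * std::polar(1.0, angle);
    }
    return sum;
}

// Refines the frequency of bin k from the phase change between two frames
// that start `hop` samples apart. Needs at least hop + n samples in `pcm`.
double refinedFrequency(const std::vector<double>& pcm, std::size_t k,
                        std::size_t n, std::size_t hop, double sampleRate) {
    const double twoPi = 2.0 * std::acos(-1.0);
    const std::complex<double> prev = windowedBin(pcm, 0, n, k);
    const std::complex<double> curr = windowedBin(pcm, hop, n, k);
    // Phase advance expected if the tone sat exactly on the bin centre.
    const double expected = twoPi * static_cast<double>(k) * static_cast<double>(hop) / static_cast<double>(n);
    // Wrap the deviation from that expectation into [-pi, pi].
    const double deviation = std::remainder(std::arg(curr) - std::arg(prev) - expected, twoPi);
    // Convert the phase deviation into a fractional bin offset, then to Hz.
    const double fractionalBin = static_cast<double>(k)
        + deviation * static_cast<double>(n) / (twoPi * static_cast<double>(hop));
    return fractionalBin * sampleRate / static_cast<double>(n);
}
```

For a steady 440 Hz sine at an assumed 44100 Hz, bin 10 is centred near 430.7 Hz, while this estimate should land very close to 440 Hz. With n = 1024 and hop = 200 the wrap stays unambiguous only while the true frequency is within about 2.5 bins of bin k.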