Oh, that's easy.
Divide the song into pieces, run the FFT on each, extract a few basic values and save them as a hash with time information.
Then do the same with the recorded sound and match its hashes against the stored data, using the time information to check that the matches line up.
Simple, right? Honestly, the real thing is more complicated, but the basic idea is the same.
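To make that concrete, here is a rough sketch in Python with NumPy. It is only an illustration of the idea above, not how any real service does it: the frame size, the frequency bands, the choice of "strongest bin per band" as the basic values, and the function names `fingerprint`, `build_index` and `best_match` are all my own assumptions.

```python
import numpy as np
from collections import Counter, defaultdict

FRAME = 1024  # samples per piece (assumed value)
BANDS = [(5, 20), (20, 40), (40, 80), (80, 160)]  # FFT-bin ranges (assumed)


def fingerprint(samples):
    """Yield (key, frame_index) pairs for a mono signal."""
    window = np.hanning(FRAME)
    for i in range(len(samples) // FRAME):
        frame = samples[i * FRAME:(i + 1) * FRAME]
        spectrum = np.abs(np.fft.rfft(frame * window))
        # The "few basic values": the strongest bin in each band.
        # The tuple is used as a dict key, so Python hashes it for us.
        key = tuple(lo + int(np.argmax(spectrum[lo:hi])) for lo, hi in BANDS)
        yield key, i


def build_index(songs):
    """Map each key to the (song name, frame index) pairs where it occurs."""
    index = defaultdict(list)
    for name, samples in songs.items():
        for key, t in fingerprint(samples):
            index[key].append((name, t))
    return index


def best_match(index, recording):
    """Return (song name, frame offset, votes), or None if nothing matched."""
    votes = Counter()
    for key, t_rec in fingerprint(recording):
        for name, t_song in index.get(key, ()):
            # A true match gives the same time offset for many frames.
            votes[(name, t_song - t_rec)] += 1
    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count
```

You would call `build_index` once on a dict of known songs (name to sample array) and then `best_match(index, recording)` for each query. Note that this sketch only works if the recording happens to start on a frame boundary of the song and is reasonably clean; coping with arbitrary start times and noise is part of the "more complicated" bit.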