Goal C - Cross-correlation to estimate sound delay

I would like to know whether anyone knows how to perform cross-correlation between two audio signals on iOS.

I would like to align the FFT windows that I receive at the receiver (the signal comes in from the microphone) with those in the transmitter (which plays the soundtrack), i.e. make sure that the first sample of each window (outside the “synchronization” period) in the transmitter is also the first sample of a window in the receiver.

I put a known waveform (defined in the frequency domain) into each piece of the transmitted audio. I want to estimate the delay by cross-correlating the known waveform with the received signal (over several consecutive pieces), but I do not know how to do this.

There seems to be a vDSP_convD function for this, but I have no idea how to use it, or whether I must first run a real FFT on the samples (probably yes, since I need to pass a double[]).

    void vDSP_convD(const double  __vDSP_signal[],
                    vDSP_Stride   __vDSP_signalStride,
                    const double  __vDSP_filter[],
                    vDSP_Stride   __vDSP_strideFilter,
                    double        __vDSP_result[],
                    vDSP_Stride   __vDSP_strideResult,
                    vDSP_Length   __vDSP_lenResult,
                    vDSP_Length   __vDSP_lenFilter);
2 answers

The vDSP_convD() function computes the convolution of two input vectors to produce a result vector. It is unlikely that you want to convolve in the frequency domain, since you are looking for a result in the time domain. That said, if you already have the FFTs for some other reason, you can choose to multiply the spectra rather than convolving the sequences in the time domain (but in that case, to get your result, you will need to perform an inverse DFT to return to the time domain).

Assuming, of course, that I understand you correctly.

Then, once you get the result from vDSP_convD(), you will need to find the highest value, which tells you where the signals are most strongly correlated. You may also need to handle the case where the input signal does not contain enough of your reference signal; in that case you could, for example, ignore values in the result vector that fall below a certain level.


Cross-correlation is a solution, yes. But there are several obstacles you need to deal with. If you read samples from audio files, they contain padding that interferes with the cross-correlation. It is also very inefficient to correlate over all of those samples directly, which takes a huge amount of time. I made example code that demonstrates finding the time shift between two audio files. If you are interested in the sample, look at my Github project.

