I did something similar in a course project, though with image files rather than audio. The technique you want to explore is cross-correlation, which measures the similarity between two signals as a function of the shift between them. You can also pre-process the audio files first, for example by normalizing them and applying a low-pass filter to remove noise.
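To make that concrete, here is a minimal sketch in Python with NumPy, assuming both recordings are already loaded as 1-D sample arrays at the same sample rate. The function names (`normalize`, `moving_average`, `similarity`) are my own, and the moving average is only a crude stand-in for a proper low-pass filter, so treat this as an illustration rather than a finished solution.

```python
import numpy as np

def normalize(signal):
    """Remove the mean and scale to unit L2 norm (a constant signal becomes all zeros)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x

def moving_average(signal, width=5):
    """Crude low-pass filter (assumption: a simple moving average is good enough here)."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")

def similarity(a, b):
    """Return (peak of the normalized cross-correlation, lag in samples).

    A positive lag means `a` is delayed relative to `b`.
    """
    a_n, b_n = normalize(a), normalize(b)
    corr = np.correlate(a_n, b_n, mode="full")
    peak = int(np.argmax(corr))
    lag = peak - (len(b_n) - 1)  # index 0 of `corr` corresponds to lag -(len(b) - 1)
    return float(corr[peak]), lag

# Usage with synthetic data: the same pulse, shifted by 10 samples.
t = np.arange(200)
reference = np.exp(-((t - 60) / 8.0) ** 2)
delayed = np.roll(reference, 10)
score, lag = similarity(moving_average(delayed), moving_average(reference))
```

Because both inputs are scaled to unit norm, the peak value cannot exceed 1, and a value close to 1 means the two signals match well at that lag. For long recordings the direct `np.correlate` gets slow, and an FFT-based correlation would be the usual replacement.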
I would suggest Oppenheim's Digital Signal Processing if you want a deeper understanding of signal processing.
But again, these methods stay pretty vague until you decide what kind of similarity you want to measure.