Comparison of two speech sounds

I need to be able to determine whether two sounds are very similar. The goal is to have a very small vocabulary (10 or 15 words) of short one- or two-syllable words, and then compare the captured sound against them to determine whether it is one of those words, despite all the usual variability in environment and capture conditions. The idea is that the user can issue a few simple commands by voice instead of using a keyboard or mouse.

Does anyone know a better approach to this? I don't want to do full-blown speech recognition, just something much more limited.

I would reconsider using a speech recognition library, for example CMU Sphinx or the Microsoft speech recognizer. Unfortunately, this is not an easy thing to do yourself. One fairly typical way to accomplish what you are trying to do is as follows:

1) Split the sample into short segments (a few milliseconds each).

2) Apply a Fourier transform to each segment and keep the main coefficients.

3) Use a hidden Markov model to find the most likely sequence of phonemes (and the transitions between them) given your sequence of coefficients.

4) Map the phonemes to words with a pronunciation dictionary (the Sphinx dictionary is a good guide). A small vocabulary like yours should give excellent results.
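The four steps can be sketched in NumPy roughly as follows. This is a minimal illustration under several assumptions that do not come from the answer: raw log-magnitude FFT bins stand in for the "main coefficients" (real recognizers such as Sphinx typically use mel-frequency cepstral coefficients), each HMM state is a single phoneme with a diagonal-Gaussian emission model whose means and variances you would have to train, and all function names are hypothetical.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Step 1: cut a 1-D signal into overlapping frames of frame_len samples."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])

def spectral_coefficients(frames, n_coeffs):
    """Step 2: Hann-windowed FFT of each frame; keep the log magnitude of the
    lowest n_coeffs frequency bins as that frame's coefficient vector."""
    window = np.hanning(frames.shape[1])
    magnitude = np.abs(np.fft.rfft(frames * window, axis=1))
    return np.log(magnitude[:, :n_coeffs] + 1e-10)  # small offset avoids log(0)

def gaussian_log_likelihood(features, means, variances):
    """Log density of every frame under every state's diagonal Gaussian -> (T, S).
    means and variances have shape (S, D) and are assumed to be pre-trained."""
    diff = features[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)

def viterbi(log_start, log_trans, log_emit):
    """Step 3: most likely hidden-state (phoneme) sequence for the frames.
    log_start: (S,), log_trans: (S, S) with rows = from-state, log_emit: (T, S)."""
    T, S = log_emit.shape
    score = log_start + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans  # cand[i, j]: best path ending in i, then i -> j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):  # follow back-pointers from the last frame
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def lookup_word(path, state_phonemes, lexicon):
    """Step 4: collapse repeated phonemes and look the sequence up in a lexicon
    (a dict from phoneme tuples to words); returns None if nothing matches."""
    phonemes = []
    for s in path:
        if not phonemes or phonemes[-1] != state_phonemes[s]:
            phonemes.append(state_phonemes[s])
    return lexicon.get(tuple(phonemes))
```

For 16 kHz audio, a common choice is `frame_len=400` and `hop=160` (25 ms frames every 10 ms). A hypothetical two-state model with `state_phonemes = ["G", "OW"]` and `lexicon = {("G", "OW"): "go"}` would map a decoded path such as `[0, 0, 1, 1]` to `"go"`. Decoding phonemes first and looking them up afterwards is a simplification; a full recognizer searches over words and phonemes jointly.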


If you want to simplify this a bit, you can try taking the coefficients at specific time points and feeding them to an SVM or a neural network. I have not tried this myself, but with some tweaking you could get reasonable results.
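One reading of "coefficients at specific time points" is to sample the per-frame coefficient matrix at a fixed number of evenly spaced frames, so that every utterance becomes a vector of the same length. The sketch below does that and trains one linear SVM per command word (one-vs-rest) with a Pegasos-style stochastic subgradient method in plain NumPy. The function names and default parameters are hypothetical; in practice you would more likely use a library SVM.

```python
import numpy as np

def fixed_length_vector(coeffs, n_points):
    """Sample a (frames x coeffs) matrix at n_points evenly spaced frames and
    flatten it, so every utterance yields a vector of the same length."""
    idx = np.linspace(0, len(coeffs) - 1, n_points).round().astype(int)
    return coeffs[idx].ravel()

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Binary linear SVM (y in {+1, -1}) trained by Pegasos-style stochastic
    subgradient descent on the hinge loss. A constant 1 is appended to each
    row as a bias feature, so the bias is regularized too (a simplification)."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            violates_margin = y[i] * (Xb[i] @ w) < 1  # checked with the old w
            w *= 1.0 - eta * lam  # shrink: gradient of the L2 penalty
            if violates_margin:
                w += eta * y[i] * Xb[i]  # hinge-loss subgradient step
    return w

def train_one_vs_rest(X, labels, **kwargs):
    """Train one binary SVM per vocabulary word (that word vs. all others)."""
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    W = np.stack([train_linear_svm(X, np.where(labels == c, 1.0, -1.0), **kwargs)
                  for c in classes])
    return classes, W

def predict(classes, W, X):
    """Pick the word whose SVM gives the highest score for each row of X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return [classes[k] for k in np.argmax(Xb @ W.T, axis=1)]
```

Each training row of `X` would be `fixed_length_vector(...)` applied to one recorded utterance's coefficients. Scaling every feature to zero mean and unit variance before training usually helps a linear SVM. Sampling at evenly spaced frames is only a crude way to normalize for speaking rate.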

Source: https://habr.com/ru/post/1311255/