How to convert a human voice to digital format?

I am working on a project that uses biometric authentication to protect a system. We plan to use the human voice as the credential.

The idea is to have a person say a few words or sentences, which the system saves in digital format. The next time that person wants to enter the system, he or she says a few words, which may or may not differ from the words used earlier.

We do not want to match the words themselves; we want to match the frequency of the voice.

I have read some scientific articles on such systems, but they do not give any implementation details.

Therefore, I just want to know whether there is any software or API that can convert an analog voice signal to digital format and also report the frequency of the voice.

So far I have worked on regular web applications, so I know the usual APIs and platforms such as Java EE and C#, but I have no experience with this kind of application.

Please enlighten me!

security speech-recognition speech-to-text analog-digital-converter
3 answers

This is as good a starting point as any: http://marsyas.info/

Marsyas is an open source software framework for audio processing. Its site lists many projects that have used the framework in different ways, so you can draw inspiration from them: http://marsyas.info/about/projects . The Teligence project in particular seems closest to your needs, as it used Marsyas to classify audio by speaker gender: http://marsyas.info/about/projects#5Teligence


In a project like this, there are two steps:

The first step is to record the voice from the analog input into a digital format (for example, PCM in a WAV file). To do this, you can use the DirectShow API from C# or the standard waveIn API, as in this project: http://www.codeproject.com/KB/audio-video/cswavrec.aspx . You can consider compressing your audio files later, and there are many options for this: on Windows, you can use the Windows Media Format SDK to avoid the licensing issues of other formats.
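To make "digital format" concrete, here is a minimal sketch in Python (chosen only for brevity; it is not the C# recorder linked above). It uses the standard-library `wave` module to store 16-bit mono PCM samples and read them back. The synthetic tone stands in for microphone input, and the helper names `write_pcm_wav` and `read_pcm_wav` and the 16 kHz sample rate are assumptions made for illustration.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000  # samples per second; an assumed rate, common for speech


def write_pcm_wav(target, samples, sample_rate=SAMPLE_RATE):
    """Write signed 16-bit mono samples to a WAV file (path or file object)."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_pcm_wav(source):
    """Return (sample_rate, samples) from a 16-bit mono PCM WAV file."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        raw = wav.readframes(wav.getnframes())
        samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
        return wav.getframerate(), samples


# Stand-in for a recording: 0.1 s of a 150 Hz tone (not real speech).
tone = [int(12000 * math.sin(2 * math.pi * 150 * n / SAMPLE_RATE))
        for n in range(SAMPLE_RATE // 10)]

buffer = io.BytesIO()          # in-memory file; a path such as "voice.wav" also works
write_pcm_wav(buffer, tone)
buffer.seek(0)
rate, samples = read_pcm_wav(buffer)
print(rate, len(samples))
```

Once the voice is a list of integer samples like this, every later processing step is ordinary numeric code, whichever capture API produced the file.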

The second step is to build or use a voice recognition framework. If you want to build your own recognition system, you will probably need to define a set of "features" for your sound fragments and then choose and implement a recognition algorithm. A lot of published work is available on this; the IEEE and ACM.org websites are usually good sources. If you want to use an existing framework, you can consider Nuance Recognizer (commercial) or http://cmusphinx.sourceforge.net (open source).
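As an illustration of what one such "feature" might look like, the sketch below estimates the fundamental frequency (pitch) of a single frame of samples by autocorrelation. This is a swapped-in textbook technique, not the method of any framework named above. The function name `estimate_pitch_hz` and the 60 to 400 Hz search range are assumptions. Pitch alone is unlikely to tell speakers apart reliably, so treat it as one candidate feature among several.

```python
import math


def estimate_pitch_hz(samples, sample_rate, min_hz=60.0, max_hz=400.0):
    """Estimate the fundamental frequency of one frame by autocorrelation.

    Returns None when no positive correlation peak is found (e.g. silence).
    """
    mean = sum(samples) / len(samples)
    frame = [s - mean for s in samples]            # remove any DC offset

    min_lag = int(sample_rate / max_hz)            # shortest period to test
    max_lag = min(int(sample_rate / min_hz), len(frame) - 1)

    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    return sample_rate / best_lag if best_lag else None


# Synthetic 200 Hz tone as a stand-in for a voiced speech frame.
RATE = 8000
frame = [math.sin(2 * math.pi * 200 * n / RATE) for n in range(800)]
pitch = estimate_pitch_hz(frame, RATE)
print(pitch)
```

In a verification system, values like this would be computed over many short frames and combined with other features before being compared with the enrolled voice.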

Hope this helps.
