Microphone input voice activity detection on iOS

I am developing an iOS application that runs an AI with voice support: it takes voice input from a microphone, converts it to text, sends the text to the AI agent, and then plays the returned text through the speaker. Everything works, but only with start and stop buttons for voice recording (SpeechKit for voice recognition, API.AI for the AI, Amazon Polly for output).

I want the microphone to stay on at all times and to start and stop recording the user's voice automatically when they begin and finish speaking. The application is designed for an unusual context in which the user has no access to the screen (but does have a high-quality microphone for recording their speech).

My research shows that this piece of the puzzle is known as Voice Activity Detection (VAD) and is apparently one of the most difficult steps in an entire voice AI pipeline.

I hope that someone can provide some simple (Swift) code for implementing this myself, or point me towards some decent libraries / SDKs that I can integrate into this project.

+7
ios artificial-intelligence swift voice-recognition voice-recording
1 answer

For a good implementation of a VAD algorithm, you can use py-webrtcvad.

It is a Python interface to the WebRTC VAD C code, so you can simply add the C files from that project to your own and call them from Swift (for example, through a bridging header).
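
If you want a pure-Swift baseline to experiment with before bringing in the C sources, a much simpler technique is an energy threshold with a "hangover" period. To be clear, this is not the WebRTC algorithm and it will be far less robust in noisy surroundings. The sketch below is only an illustration: the type name `EnergyVAD`, the threshold of 0.02 and the hangover of 15 frames are all assumptions that you would have to tune for your microphone, and it expects frames of `Float` samples in the range -1...1.

```swift
import Foundation

/// Minimal energy-based voice activity detector.
/// Illustrative sketch only; NOT the WebRTC VAD algorithm.
struct EnergyVAD {
    /// RMS level at or above which a frame counts as speech (assumed value, tune it).
    let threshold: Float
    /// Consecutive quiet frames tolerated before speech is considered finished (assumed value).
    let hangoverFrames: Int

    private(set) var isSpeaking = false
    private var quietFrames = 0

    init(threshold: Float = 0.02, hangoverFrames: Int = 15) {
        self.threshold = threshold
        self.hangoverFrames = hangoverFrames
    }

    /// Root-mean-square level of one frame of samples.
    static func rms(_ frame: [Float]) -> Float {
        guard !frame.isEmpty else { return 0 }
        let sumOfSquares = frame.reduce(Float(0)) { $0 + $1 * $1 }
        return (sumOfSquares / Float(frame.count)).squareRoot()
    }

    /// Feeds one frame and returns true while the user is considered to be speaking.
    mutating func process(_ frame: [Float]) -> Bool {
        if EnergyVAD.rms(frame) >= threshold {
            // Loud frame: speech starts or continues, so reset the quiet counter.
            isSpeaking = true
            quietFrames = 0
        } else if isSpeaking {
            // Quiet frame during speech: end only after enough quiet frames in a row.
            quietFrames += 1
            if quietFrames >= hangoverFrames {
                isSpeaking = false
                quietFrames = 0
            }
        }
        return isSpeaking
    }
}
```

You would call `process(_:)` on every audio frame you capture (for instance from an audio tap on the microphone input), start recognition when the result changes from false to true, and stop it when the result changes back to false. The hangover keeps short pauses between words from ending the recording too early.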

+2