SFSpeechRecognizer - detect end of utterance

I’m hacking on a small project using the built-in speech recognition in iOS 10. I have it working with the device’s microphone, and my speech is recognized very accurately.

My problem is that the recognition task callback is called for every available partial transcription. I want it to detect that the person has stopped talking and then call the callback with the isFinal property set to true. This does not happen: the app keeps listening indefinitely.

Can SFSpeechRecognizer recognize the end of a sentence?

Here is my code. It is based on an example found on the web and is mostly the boilerplate needed to recognize speech from the microphone. I modified it by adding a recognition taskHint. I also set shouldReportPartialResults to false, but it seems to have been ignored.

    func startRecording() {
        if recognitionTask != nil {
            recognitionTask?.cancel()
            recognitionTask = nil
        }

        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(AVAudioSessionCategoryRecord)
            try audioSession.setMode(AVAudioSessionModeMeasurement)
            try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        recognitionRequest?.shouldReportPartialResults = false
        recognitionRequest?.taskHint = .search

        guard let inputNode = audioEngine.inputNode else {
            fatalError("Audio engine has no input node")
        }
        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        }

        recognitionRequest.shouldReportPartialResults = true

        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
            var isFinal = false

            if result != nil {
                print("RECOGNIZED \(result?.bestTranscription.formattedString)")
                self.transcriptLabel.text = result?.bestTranscription.formattedString
                isFinal = (result?.isFinal)!
            }

            if error != nil || isFinal {
                self.state = .Idle
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
                self.micButton.isEnabled = true
                self.say(text: "OK. Let me see.")
            }
        })

        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()
        do {
            try audioEngine.start()
        } catch {
            print("audioEngine couldn't start because of an error.")
        }

        transcriptLabel.text = "Say something, I'm listening!"
        state = .Listening
    }
ios sfspeechrecognizer
1 answer

It seems that the isFinal flag does not become true when the user stops talking, as you expected. I assume this is intended behavior on Apple's part, because "the user stopped talking" is not a well-defined event.

I believe the easiest way to achieve your goal is to do the following:

  • Define a “silence interval” (e.g. 2 seconds): if the user does not speak for longer than this interval, treat them as having stopped talking.

  • Create a timer at the beginning of the audio session:

    // Swift 3 naming; didFinishTalk must be visible to Objective-C
    var timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)

  • When you get a new transcription in the recognitionTask callback, invalidate and restart the timer:

    timer.invalidate()
    timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)

  • If the timer fires, the user has not spoken for 2 seconds. You can safely stop the audio session and finish.
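
The steps above can be collected into one small helper. This is only a sketch, not something the Speech framework provides: the class name SilenceDetector and its restart() and cancel() methods are made up for illustration. It uses the block-based Timer API available from iOS 10 and assumes restart() is called on a thread with a running run loop, such as the main thread.

```swift
import Foundation

/// Hypothetical helper (not part of the Speech framework): calls `onSilence`
/// once `interval` seconds pass without a call to `restart()`.
final class SilenceDetector {
    private let interval: TimeInterval
    private let onSilence: () -> Void
    private var timer: Timer?

    init(interval: TimeInterval, onSilence: @escaping () -> Void) {
        self.interval = interval
        self.onSilence = onSilence
    }

    /// Call when recording starts and again for every new transcription.
    func restart() {
        timer?.invalidate()
        timer = Timer.scheduledTimer(withTimeInterval: interval, repeats: false) { [weak self] _ in
            self?.timer = nil
            self?.onSilence()
        }
    }

    /// Call when recognition ends for another reason (error or final result).
    func cancel() {
        timer?.invalidate()
        timer = nil
    }
}
```

In the question's code, one plausible wiring is to call restart() at the end of startRecording() and inside the result handler whenever result is non-nil, and to call cancel() in the error/isFinal branch. In onSilence you could stop the audio engine and call endAudio() on the recognition request, which should let the recognizer deliver its final result, although that behavior is worth verifying on a device.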
