SpeechRecognition recognizes background noise as speech

I am using the System.Speech SpeechRecognitionEngine (as documented on MSDN) in my program. The problem is that it recognizes background noise as speech.

For example, if you snap your fingers, tap the table, or move a chair, it picks that up as speech.

Why in the world does it recognize background noise as speech?

Surely my fingers snapping does not sound the same as me saying "Notepad"!

Here is the code:

    using System;
    using System.Threading;
    using System.Speech;
    using System.Speech.Synthesis;
    using System.Speech.Recognition;

    namespace SpeachTest
    {
        public class MainClass
        {
            static void Main()
            {
                MainClass main = new MainClass();
                SpeechRecognitionEngine sre = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));

                // Vocabulary: any sequence of 0 to 10 of these words.
                Choices choiceList = new Choices();
                choiceList.Add(new string[]{"Open", "Close", "Then", "Volume", "Up", "Firefox", "Notepad", "Steam", "turn", "the", "now" });
                GrammarBuilder builder = new GrammarBuilder();
                builder.Append(choiceList);
                Grammar grammar = new Grammar(new GrammarBuilder(builder, 0, 10));

                sre.SpeechRecognized += main.sreRecognizedEvent;
                sre.SpeechDetected += main.sreDetectEvent;
                sre.SpeechRecognitionRejected += main.sreRejectEvent;
                sre.RecognizeCompleted += main.sreCompleteEvent;

                sre.InitialSilenceTimeout = TimeSpan.FromSeconds(0);
                sre.BabbleTimeout = TimeSpan.FromSeconds(0);
                sre.EndSilenceTimeout = TimeSpan.FromSeconds(0);
                sre.EndSilenceTimeoutAmbiguous = TimeSpan.FromSeconds(0);

                sre.SetInputToDefaultAudioDevice();
                sre.LoadGrammar(grammar);

                // Recognize one utterance at a time, forever.
                while (true)
                {
                    sre.Recognize();
                }
            }

            void sreRecognizedEvent(Object sender, SpeechRecognizedEventArgs e)
            {
                Console.Write("Reconized ~ " + e.Result.Text + " ~ with confidence " + e.Result.Confidence);
                Console.WriteLine();
            }

            void sreDetectEvent(Object sender, SpeechDetectedEventArgs e)
            {
                Console.WriteLine("Detected some type of input");
            }

            void sreRejectEvent(Object sender, SpeechRecognitionRejectedEventArgs e)
            {
                Console.WriteLine("Rejected Input ~ " + e.Result.Text);
            }

            void sreCompleteEvent(Object sender, System.Speech.Recognition.RecognizeCompletedEventArgs e)
            {
                Console.WriteLine("Completed Recongnization");
            }
        }
    }
3 answers

It turns out my microphone sensitivity was too high. Very, very high, to be precise. It was at level 100, which meant that it picked up the smallest sounds (like background noise).

I assume these small sounds were amplified to such a degree that it became difficult for SpeechRecognitionEngine to distinguish them from real speech.

Turning the sensitivity down to about 20 or 30 did the trick.

[Image: a more appropriate sensitivity]

+1
source

Without resorting to any filtering algorithms, you can check the Confidence property that you are already printing. It ranges from 0.0 to 1.0, where 1.0 is very certain. I find that 0.7 works well, but you can tune the threshold by trial and error.

    void sreRecognizedEvent(Object sender, SpeechRecognizedEventArgs e)
    {
        // Ignore results the recognizer itself is not confident about.
        if (e.Result.Confidence >= 0.7)
        {
            Console.Write("Reconized ~ " + e.Result.Text + " ~ with confidence " + e.Result.Confidence);
            Console.WriteLine();
        }
    }
+4
source

Which unintelligible sounds a recognizer rejects, and how many, varies greatly from recognizer to recognizer. My experience with the Microsoft recognizer is that it tries very hard to find words. With Dragon Dictate or Google's recognizer, for example, you can snap your fingers or cough and the sound is rejected. The Microsoft recognizer also aggressively tracks sound levels, so if it has been listening to a lot of silence, it will internally simulate an increase in gain by lowering its detection thresholds. (I have seen this when it recognized rustling paper or the sound of an air conditioner as human speech.)

The solution that I have used for many years with great success is somewhat counterintuitive. You need to add your own garbage speech model. Since you are just using a list of words, not a complex grammar, this should work well and be easy to do.

You are currently listening for: "Open", "Close", "Then", "Volume", "Up", "Firefox", "Notepad", "Steam", "turn", "the", "now"

You should add a few words to the list you are listening for that are somewhat (but not too) similar to your commands. For example, adding “apron” and “a bull” effectively creates honeypots in the immediate vicinity of the word “open”. You can then be more confident that the person actually said “open” when it shows up as the result. In addition, adding a few short words that have nothing to do with your command words will catch more non-speech sounds. I suspect that "tap" is likely to be recognized when you snap your fingers.

To summarize: recognize this longer list of words, but only act on a word if it is on your list of commands. If you use a switch statement in your code, this is absurdly simple: only have branches for your commands. Otherwise, you need to test the result against the "good" list.

Note: this method also works when you perform more complex recognition using a speech recognition grammar. You simply put all these “junk” phrases under a grammar rule called “junk”, and reject any utterance that was recognized by that rule.
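The "good list" test could be sketched roughly as follows. This is only a minimal illustration, not part of the original answer: the class name CommandFilter, the method names, and the particular decoy words are assumptions. Because the grammar in the question accepts up to ten words per utterance, the sketch checks every word in the recognized text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the original answer): decides whether a
// recognized phrase consists only of real command words.
public static class CommandFilter
{
    // The words the program actually acts on (taken from the question).
    static readonly HashSet<string> Commands = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "Open", "Close", "Then", "Volume", "Up", "Firefox",
        "Notepad", "Steam", "turn", "the", "now"
    };

    // Decoy words that soak up noise and near-misses (illustrative choices).
    static readonly string[] Decoys = { "apron", "tap" };

    // Everything the recognizer should listen for: commands plus decoys.
    public static string[] GrammarWords()
    {
        return Commands.Concat(Decoys).ToArray();
    }

    // True only if the phrase is non-empty and every word is a command.
    public static bool ShouldActOn(string recognizedText)
    {
        string[] words = (recognizedText ?? "")
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return words.Length > 0 && words.All(w => Commands.Contains(w));
    }
}
```

In the recognized-event handler you would pass GrammarWords() to the Choices object and call ShouldActOn(e.Result.Text) before doing anything with the result, possibly combined with a confidence check.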
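One way to approximate the "junk" rule with System.Speech is sketched below. Note that it swaps in a simpler technique than a single grammar with a named rule: it loads two separate Grammar objects and tells them apart through the Grammar.Name property. The grammar names, the class name, and the junk words are assumptions made for illustration.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

// Sketch only: separate "commands" and "junk" grammars, distinguished by name.
public static class JunkGrammarSketch
{
    public static void Main()
    {
        using (var sre = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // Real commands (a subset of the question's word list).
            var commands = new Grammar(new GrammarBuilder(
                new Choices("Open", "Close", "Notepad", "Firefox")))
            {
                Name = "commands"
            };

            // Decoy phrases; these words are illustrative assumptions.
            var junk = new Grammar(new GrammarBuilder(
                new Choices("apron", "tap")))
            {
                Name = "junk"
            };

            sre.LoadGrammar(commands);
            sre.LoadGrammar(junk);

            sre.SpeechRecognized += (sender, e) =>
            {
                // Discard anything that matched the junk grammar.
                if (e.Result.Grammar.Name == "junk")
                {
                    return;
                }
                Console.WriteLine("Command: " + e.Result.Text);
            };

            sre.SetInputToDefaultAudioDevice();
            sre.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Listening; press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

Keeping the junk phrases in their own grammar means the rejection test is a single name comparison, however large the junk list grows.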

+1
source

Source: https://habr.com/ru/post/1213471/

