Microsoft Speech Recognition: Alternate Results with Confidence Scores?

I'm new to the Microsoft.Speech recognizer (using the Microsoft Speech Platform SDK version 11), and I'm trying to get n-best recognition matches from a simple grammar, along with a confidence score for each one.

According to the documentation (and as mentioned in the answer to this question), you should be able to use e.Result.Alternates to access recognized words other than the top-scoring one. However, even after setting the confidence rejection threshold to 0 (which should mean that nothing is rejected), I still get only one result and no alternates (although the SpeechHypothesized events indicate that at least one of the other words is recognized with a non-zero confidence).

My question is: can someone explain why I get only one recognized word even when the confidence rejection threshold is set to zero? How can I get the other possible matches and their confidence scores? What am I missing here?

Below is my code. Thanks in advance to anyone who can help :)


In the example below, the recognizer is given a wav file of the word "news" and must choose among similar-sounding words ("news", "newts", "noose"). I want to extract the recognizer's confidence score for EVERY word (all of them should be non-zero), even though it returns only the best match ("news") as its result.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using Microsoft.Speech.Recognition;

    namespace SimpleRecognizer
    {
        class Program
        {
            static readonly string[] settings = new string[] {
                "CFGConfidenceRejectionThreshold",
                "HighConfidenceThreshold",
                "NormalConfidenceThreshold",
                "LowConfidenceThreshold"};

            static void Main(string[] args)
            {
                // Create a new SpeechRecognitionEngine instance.
                SpeechRecognitionEngine sre = new SpeechRecognitionEngine(); //en-US SRE

                // Configure the input to the recognizer.
                sre.SetInputToWaveFile(@"C:\Users\Anjana\Documents\news.wav");

                // Display Recognizer Settings (Confidence Thresholds)
                ListSettings(sre);

                // Set Confidence Threshold to Zero (nothing should be rejected)
                sre.UpdateRecognizerSetting("CFGConfidenceRejectionThreshold", 0);
                sre.UpdateRecognizerSetting("HighConfidenceThreshold", 0);
                sre.UpdateRecognizerSetting("NormalConfidenceThreshold", 0);
                sre.UpdateRecognizerSetting("LowConfidenceThreshold", 0);

                // Display New Recognizer Settings
                ListSettings(sre);

                // Build a simple Grammar with three choices
                Choices topics = new Choices();
                topics.Add(new string[] { "news", "newts", "noose" });
                GrammarBuilder gb = new GrammarBuilder();
                gb.Append(topics);
                Grammar g = new Grammar(gb);
                g.Name = "g";

                // Load the Grammar
                sre.LoadGrammar(g);

                // Register handlers for Grammar SpeechRecognized Events
                g.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(gram_SpeechRecognized);

                // Register a handler for the recognizer SpeechRecognized event.
                sre.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);

                // Register Handler for SpeechHypothesized
                sre.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(sre_SpeechHypothesized);

                // Start recognition.
                sre.Recognize();
                Console.ReadKey(); //wait to close
            }

            static void gram_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
            {
                Console.WriteLine("\nNumber of Alternates from Grammar {1}: {0}", e.Result.Alternates.Count.ToString(), e.Result.Grammar.Name);
                foreach (RecognizedPhrase phrase in e.Result.Alternates)
                {
                    Console.WriteLine(phrase.Text + ", " + phrase.Confidence);
                }
            }

            static void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
            {
                Console.WriteLine("\nSpeech recognized: " + e.Result.Text + ", " + e.Result.Confidence);
                Console.WriteLine("Number of Alternates from Recognizer: {0}", e.Result.Alternates.Count.ToString());
                foreach (RecognizedPhrase phrase in e.Result.Alternates)
                {
                    Console.WriteLine(phrase.Text + ", " + phrase.Confidence);
                }
            }

            static void sre_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
            {
                Console.WriteLine("Speech from grammar {0} hypothesized: {1}, {2}", e.Result.Grammar.Name, e.Result.Text, e.Result.Confidence);
            }

            private static void ListSettings(SpeechRecognitionEngine recognizer)
            {
                foreach (string setting in settings)
                {
                    try
                    {
                        object value = recognizer.QueryRecognizerSetting(setting);
                        Console.WriteLine(" {0,-30} = {1}", setting, value);
                    }
                    catch
                    {
                        Console.WriteLine(" {0,-30} is not supported by this recognizer.", setting);
                    }
                }
                Console.WriteLine();
            }
        }
    }

This gives the following result:

    Original recognizer settings:
     CFGConfidenceRejectionThreshold = 20
     HighConfidenceThreshold = 80
     NormalConfidenceThreshold = 50
     LowConfidenceThreshold = 20

    Updated recognizer settings:
     CFGConfidenceRejectionThreshold = 0
     HighConfidenceThreshold = 0
     NormalConfidenceThreshold = 0
     LowConfidenceThreshold = 0

    Speech from grammar g hypothesized: noose, 0.2214646
    Speech from grammar g hypothesized: news, 0.640804

    Number of Alternates from Grammar g: 1
    news, 0.9208503

    Speech recognized: news, 0.9208503
    Number of Alternates from Recognizer: 1
    news, 0.9208503

I also tried implementing this with a separate phrase for each word (instead of one phrase with three choices), and even with a separate grammar for each word/phrase. The results are basically the same: only one "alternate".

1 answer

I believe this is another place where SAPI lets you ask for things that the SR engine does not really support.

Both Microsoft.Speech.Recognition and System.Speech.Recognition use the basic SAPI interfaces to do their job; the only difference is which SR engine is used. (Microsoft.Speech.Recognition uses the server engine; System.Speech.Recognition uses the Desktop engine.)

Alternates are designed mainly for dictation, not for context-free grammars (CFGs). You can always get one alternate for a CFG, but the alternate-generation code looks like it does not expand the alternates for CFGs.

Unfortunately, the Microsoft.Speech.Recognition engine does not support dictation. (It does, however, work with much lower-quality audio, and it does not require training.)
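
If the alternates list stays at one entry, one possible workaround is to keep the confidences reported through the SpeechHypothesized events, since the output in the question shows "noose" arriving there with a non-zero score. The sketch below is only an illustration of that idea: HypothesisTracker is a hypothetical helper (not part of Microsoft.Speech), and the event wiring is shown in comments so the block can run without the SDK. Hypothesis confidences are interim values, so they are probably not equivalent to true n-best scores, and a word that is never hypothesized (such as "newts" in the question's run) will not appear at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of any Microsoft API): remembers the highest
// confidence seen so far for each distinct phrase text.
class HypothesisTracker
{
    private readonly Dictionary<string, float> best = new Dictionary<string, float>();

    // Record one observation, keeping only the maximum confidence per phrase.
    public void Observe(string text, float confidence)
    {
        float seen;
        if (!best.TryGetValue(text, out seen) || confidence > seen)
        {
            best[text] = confidence;
        }
    }

    // Return the phrases ordered from highest to lowest confidence.
    public List<KeyValuePair<string, float>> Ranked()
    {
        return best.OrderByDescending(pair => pair.Value).ToList();
    }
}

class Demo
{
    static void Main()
    {
        var tracker = new HypothesisTracker();

        // In the question's program, the calls would come from the handlers, e.g.:
        //   sre.SpeechHypothesized += (s, e) => tracker.Observe(e.Result.Text, e.Result.Confidence);
        //   sre.SpeechRecognized   += (s, e) => tracker.Observe(e.Result.Text, e.Result.Confidence);
        // Here the values from the question's output are replayed instead.
        tracker.Observe("noose", 0.2214646f);
        tracker.Observe("news", 0.640804f);
        tracker.Observe("news", 0.9208503f);

        foreach (var entry in tracker.Ranked())
        {
            Console.WriteLine("{0}, {1}", entry.Key, entry.Value);
        }
    }
}
```

Keeping the maximum per phrase is a design choice made for this sketch; the last value seen or an average would be equally defensible, because the engine does not document how hypothesis confidences relate to the final score.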
