What gets rejected as incomprehensible sound, and how much, varies greatly from recognizer to recognizer. My experience is that the Microsoft recognizer tries very hard to find words. With DragonDictate or Google's recognizer, for example, you can snap your fingers or cough and the sound is simply rejected. The Microsoft recognizer also aggressively tracks audio levels, so after listening to a long stretch of silence it will internally simulate a gain increase by lowering its detection thresholds. (I have seen it recognize rustling paper or air-conditioning noise as human speech.)
The solution I have used for many years with great success is somewhat counterintuitive: add your own garbage speech model. Since you are using a plain list of words rather than a complex grammar, this should work well and be easy to do.
You are currently listening for: "Open", "Close", "Then", "Volume", "Up", "Firefox", "Notepad", "Steam", "turn", "the", "now"
Add a few words to the list you listen for that are somewhat (but not too) similar to your commands. For example, adding "apron" and "a bull" effectively places honeypots in the immediate vicinity of "open": you can be more confident that the user actually said "open" when that is what comes back as the result. In addition, adding a few short words that have nothing to do with your command words will catch more non-speech sounds. I suspect "tap" would be the likely result when you snap your fingers.
To summarize: recognize this longer list of words, but only act on the ones that are in your command list. If you use a case statement in your code, this is absurdly simple: just provide branches only for your commands. Otherwise, test each result against the "good" list.
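The filtering step above can be sketched as follows. This is a minimal illustration of the technique, not a real recognizer integration: the word lists come from this answer, and the `dispatch` function is a hypothetical stand-in for whatever handler receives the recognizer's result.

```python
# Commands we actually act on (the "good" list from above).
COMMANDS = {"open", "close", "then", "volume", "up", "firefox",
            "notepad", "steam", "turn", "the", "now"}

# Decoy ("garbage") words: near-misses of commands plus short words
# that tend to absorb coughs, finger snaps, and other noises.
GARBAGE = {"apron", "a bull", "tap"}

# The recognizer is given the union of both lists to choose from.
VOCABULARY = COMMANDS | GARBAGE

def dispatch(recognized):
    """Act only if the recognized word is a real command.

    Returns a description of the action taken, or None when the
    result was a decoy and should be silently ignored.
    """
    word = recognized.lower()
    if word in COMMANDS:
        return "executing: " + word  # real command handling goes here
    return None                      # garbage word: ignore it
```

Because only `COMMANDS` is consulted when acting, the garbage words do their job purely inside the recognizer: they soak up near-miss audio that would otherwise be forced onto a command word.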
Note: this method also works when you perform more complex recognition using a speech recognition grammar. Simply put all the "junk" phrases under a grammar rule named "junk", and reject any utterance that was recognized through that rule.
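In grammar terms, that might look like the following SRGS XML sketch. The rule names and phrases are illustrative (the decoys are the ones suggested above); the only point is that junk phrases live under their own rule, so the application can check which rule matched and discard junk results.

```xml
<grammar version="1.0" xml:lang="en-US" root="top"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="top" scope="public">
    <one-of>
      <item><ruleref uri="#commands"/></item>
      <item><ruleref uri="#junk"/></item>
    </one-of>
  </rule>
  <rule id="commands">
    <one-of>
      <item>open</item>
      <item>close</item>
      <item>volume up</item>
    </one-of>
  </rule>
  <rule id="junk">
    <one-of>
      <item>apron</item>
      <item>a bull</item>
      <item>tap</item>
    </one-of>
  </rule>
</grammar>
```

When a recognition result arrives, the application inspects which rule produced it and acts only on matches from the "commands" rule.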