I installed PocketSphinx 0.7 on a Debian Squeeze virtual machine. The installation went fine, and I can run recognition on audio files. On top of this, I wrote some Python scripts that run recognition on a set of files I received and then compute the word error rate. They use GStreamer as described in this tutorial.
So far I am using the original HMM (acoustic model) that came in the PocketSphinx tarball, a dictionary that contains only the words from my test data, and an optimized language model from my professor. This should work the same way as in the production system. My problem is that recognition performance is still terrible: I get a word error rate (WER) of about 85%.
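For readers who want to reproduce the number, here is a minimal sketch of how a WER figure like this is usually computed: the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` is illustrative and not taken from the scripts mentioned above; it also assumes whitespace tokenization and no text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / float(len(ref))
```

Note that the WER can exceed 100% when the hypothesis contains many insertions, so differences in casing or punctuation between reference and hypothesis can inflate the figure considerably.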
I want to know how I can reduce the WER. What steps can I take?
Another thing that probably affects performance: PocketSphinx tells me that it does not have permission to access the HMM, although I made the HMM readable, writable, and executable for everyone.
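One way to narrow down a permission message like this is to check, as the same user that runs the recognizer, whether every file in the model directory is actually readable. The sketch below is only a diagnostic under assumptions: it assumes the HMM is a directory of model files, and the helper name `unreadable_paths` and the example path are hypothetical.

```python
import os


def unreadable_paths(model_dir):
    """Return paths under model_dir (including itself) the current user cannot read."""
    # A directory needs both read and execute (search) permission.
    if not os.access(model_dir, os.R_OK | os.X_OK):
        return [model_dir]

    problems = []
    for root, dirs, files in os.walk(model_dir):
        for name in dirs:
            path = os.path.join(root, name)
            if not os.access(path, os.R_OK | os.X_OK):
                problems.append(path)
        for name in files:
            path = os.path.join(root, name)
            if not os.access(path, os.R_OK):
                problems.append(path)
    return problems


if __name__ == "__main__":
    # Hypothetical location; replace with the directory passed as the HMM.
    for path in unreadable_paths("/path/to/hmm"):
        print("cannot read: " + path)
```

If this reports the top-level path itself, the problem may lie in a parent directory that the user cannot traverse, or in the path being wrong, rather than in the permissions of the model files.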
Does anyone have an idea where this might come from? I appreciate any help. If you need more information, please let me know.
EDIT:
I created a small test suite and ran PocketSphinx on it. You can find the files and results here. I was also allowed to share some examples from the original test suite; you can find them here.
These are the worst examples. Short sentences of 1-2 words work well. Unfortunately, I have not been able to create a larger test set so far, because my time is very limited.