How to create a large language model for CMU Sphinx?

I would like to create a language model for CMU Sphinx, but my corpus has more than 1000 words, so I cannot use the online tool. How can I create my language model myself (with the scripts in cmuclmtk?)

speech-recognition cmusphinx
2 answers

This is not a trivial task. Building a language model takes time and resources.

If you want a “good” language model, you will need a large or very large text corpus to train it on (think on the order of several years' worth of verbatim transcripts).

“Good” means that the language model can generalize from its training data to new, previously unseen input.
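One common way to put a number on this is perplexity on held-out text: the lower it is, the better the model predicts input it was not trained on. The sketch below is only a toy, assuming an add-one smoothed unigram model; the names `train_unigram`, `perplexity` and `UNK` are made up for illustration and are not part of any Sphinx tool.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder token for words never seen in training


def train_unigram(tokens):
    """Add-one smoothed unigram probabilities, with one slot reserved for UNK."""
    counts = Counter(tokens)
    counts[UNK] += 0  # make sure UNK exists, with a count of zero
    total = sum(counts.values()) + len(counts)
    return {word: (count + 1) / total for word, count in counts.items()}


def perplexity(model, tokens):
    """Perplexity of the model on a token list; unknown words fall back to UNK."""
    log_sum = sum(math.log(model.get(word, model[UNK])) for word in tokens)
    return math.exp(-log_sum / len(tokens))


if __name__ == "__main__":
    model = train_unigram("the cat sat on the mat".split())
    print(perplexity(model, "the cat sat".split()))   # words seen in training
    print(perplexity(model, "a dog barked".split()))  # unseen words, higher value
```

A real evaluation would use an n-gram model and a held-out set drawn from the same domain as the speech you want to recognize.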

You should read the documentation for the Sphinx and HTK language modeling tools.

http://cmusphinx.sourceforge.net/wiki/tutoriallm
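As a rough sketch of the cmuclmtk workflow that tutorial walks through, the helper below only assembles the command lines and does not run anything. The tool names and flags are written as I remember them from the tutorial, so check them against the documentation for your cmuclmtk version; the function name `cmuclmtk_commands` and the file names are hypothetical. The corpus is assumed to be a plain-text file with one sentence per line, each wrapped in `<s>` and `</s>`.

```python
import shlex


def cmuclmtk_commands(corpus, stem):
    """Return the shell commands of a typical cmuclmtk run, in order.

    corpus: path to the training text; stem: prefix for the generated files.
    """
    text = shlex.quote(corpus)
    vocab = shlex.quote(stem + ".vocab")
    idngram = shlex.quote(stem + ".idngram")
    arpa = shlex.quote(stem + ".lm")
    return [
        # 1. Count word frequencies and derive a vocabulary file.
        f"text2wfreq < {text} | wfreq2vocab > {vocab}",
        # 2. Convert the text into an id n-gram file using that vocabulary.
        f"text2idngram -vocab {vocab} -idngram {idngram} < {text}",
        # 3. Estimate the language model and write it in ARPA format.
        f"idngram2lm -vocab_type 0 -idngram {idngram} -vocab {vocab} -arpa {arpa}",
    ]


if __name__ == "__main__":
    for command in cmuclmtk_commands("corpus.txt", "corpus"):
        print(command)
```

To actually execute the steps, each string could be passed to `subprocess.run(command, shell=True, check=True)` on a machine where cmuclmtk is installed.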

Also check out these two threads:

Creating a compatible openears language model

Ruby Text Analysis

You can use a more general language model built from a larger corpus and interpolate your smaller language model with it. For example, a back-off language model ... but this is not a trivial task.
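To make the interpolation idea concrete, here is a minimal sketch of linear interpolation, assuming two toy unigram distributions stored as plain dictionaries. The function name `interpolate`, the weight `lam` and the example probabilities are made up for illustration; real toolkits interpolate full n-gram models and usually tune the weight on held-out data.

```python
def interpolate(p_domain, p_general, lam):
    """Mix two word distributions: lam * domain + (1 - lam) * general."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie between 0 and 1")
    vocabulary = set(p_domain) | set(p_general)
    return {
        word: lam * p_domain.get(word, 0.0) + (1.0 - lam) * p_general.get(word, 0.0)
        for word in vocabulary
    }


if __name__ == "__main__":
    # Hypothetical probabilities; each table sums to 1.
    domain = {"sphinx": 0.5, "model": 0.5}
    general = {"model": 0.25, "the": 0.75}
    mixed = interpolate(domain, general, lam=0.5)
    for word in sorted(mixed):
        print(word, mixed[word])
```

Because both inputs sum to 1, the mixture does too, and words that appear only in the general model still get a non-zero probability.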

See: Katz's back-off model
