How to create a large language model for CMU Sphinx?

Question

How to create a large language model for CMU Sphinx?

I would like to create a language model for CMU Sphinx, but my corpus has more than 1000 words, so I cannot use the online tool. How do I use the scripts in cmuclmtk (or something else?) to create my language model?

+8

speech-recognition cmusphinx

joeforker Jan 24 '11 at 14:49


2 answers

This is not a trivial task. Creating a language model takes time and resources.

If you want a “good” language model, you will need a large or very large text corpus to train it on (think on the order of several years' worth of verbatim transcripts).

“Good” here means that the language model can generalize from the training data to new, previously unseen input.

You should read the documentation for the Sphinx and HTK language modeling tools.

http://cmusphinx.sourceforge.net/wiki/tutoriallm

Also check out these two threads:

Creating an OpenEars-compatible language model

Ruby Text Analysis

You can take a more general language model built from a larger corpus and interpolate your smaller language model with it, for example with a back-off language model ... but this is not a trivial task either.

See: Katz's back-off model
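To make the interpolation idea concrete, here is a minimal sketch in Python. It uses plain linear interpolation of two unigram models, which is simpler than Katz back-off and is not what the Sphinx tools do internally. The toy corpora, the function names and the weight `lam=0.7` are all assumptions chosen for illustration.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(small, general, lam=0.7):
    """Linear interpolation: P(w) = lam * P_small(w) + (1 - lam) * P_general(w)."""
    vocab = set(small) | set(general)
    return {
        word: lam * small.get(word, 0.0) + (1 - lam) * general.get(word, 0.0)
        for word in vocab
    }


# Toy data (hypothetical): a small in-domain corpus and a larger general one.
domain_tokens = "turn the light on turn the light off".split()
general_tokens = "the cat sat on the mat and the dog sat on the rug".split()

mixed = interpolate(unigram_model(domain_tokens), unigram_model(general_tokens))
```

Words that never occur in the small corpus (such as `cat`) still get a non-zero probability from the general model, which is the point of mixing the two. A real setup would do this per n-gram order and tune `lam` on held-out text.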

+1

Tilo Oct 05 '11 at 2:01


Nikolay Shmyrev · Accepted Answer · 2011-01-24T19:20:14+0000

Read the tutorial

http://cmusphinx.sourceforge.net/wiki/tutoriallm
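As a rough sketch of the cmuclmtk steps the tutorial walks through, the Python helper below only prepares the corpus and prints the commands; it does not run them. The tool names and flags are written as I recall them from the tutorial, so check them against the tutorial and your installed version. The file names `corpus.txt` and `model`, and both function names, are hypothetical.

```python
import shlex


def mark_sentences(lines):
    """Wrap each non-empty line in <s> ... </s> sentence markers."""
    return [f"<s> {line.strip()} </s>" for line in lines if line.strip()]


def cmuclmtk_commands(corpus="corpus.txt", name="model"):
    """Build the shell commands as strings; nothing is executed here."""
    c = shlex.quote(corpus)
    n = shlex.quote(name)
    return [
        # 1. Word frequencies -> vocabulary file
        f"text2wfreq < {c} | wfreq2vocab > {n}.vocab",
        # 2. Corpus + vocabulary -> id n-gram file
        f"text2idngram -vocab {n}.vocab -idngram {n}.idngram < {c}",
        # 3. Id n-grams -> ARPA-format language model
        f"idngram2lm -vocab_type 0 -idngram {n}.idngram -vocab {n}.vocab -arpa {n}.lm",
    ]


for command in cmuclmtk_commands():
    print(command)
```

The corpus passed to these tools is expected to have one sentence per line, wrapped in `<s>` and `</s>`, which is what `mark_sentences` produces from raw lines.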
