How to understand this formula in the LingPipe language model?

This formula is from the LingPipe tutorial on building a language model, but I only partially understand the theory behind it.

In particular, I don't understand how the base probability is computed.
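For reference, here is the interpolation as it is usually written for LingPipe's models, assuming the tutorial's formula takes this Witten-Bell-style form. L is the interpolation hyperparameter, X' is the context X with its oldest symbol dropped, and P_base is the lowest-order fallback distribution (for example a uniform estimate); the exact fallback used by the tutorial is an assumption here:

$$\hat{P}(d \mid X) = \lambda(X)\, P_{ML}(d \mid X) + \bigl(1 - \lambda(X)\bigr)\, \hat{P}(d \mid X')$$

$$\lambda(X) = \frac{\mathrm{extCount}(X)}{\mathrm{extCount}(X) + L \cdot \mathrm{numExtensions}(X)}$$

$$\hat{P}(d) = \lambda()\, P_{ML}(d) + \bigl(1 - \lambda()\bigr)\, P_{\text{base}}(d)$$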


The tutorial also gives a formula for the base probability P(d). Suppose the following lines are part of the tokens and their frequencies in the unigram file:

ab 20
aba 3
abd 2
abef 2
abkk 3
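To make the inputs concrete: if each line is a whole token and the context is empty (the unigram level), extCount() is the total number of token occurrences and numExtensions() is the number of distinct tokens seen. A minimal Python sketch, assuming these five lines stand in for the whole file; `parse_counts` and `ext_stats` are hypothetical helper names, not LingPipe API:

```python
# Fragment of the unigram file from the question, assumed here to be the whole vocabulary.
SAMPLE = """\
ab 20
aba 3
abd 2
abef 2
abkk 3
"""


def parse_counts(text):
    """Parse 'token count' lines into a dict {token: count}."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token, freq = line.split()
        counts[token] = counts.get(token, 0) + int(freq)
    return counts


def ext_stats(counts):
    """Return (extCount(), numExtensions()) for the empty context.

    extCount() is the total number of token occurrences;
    numExtensions() is the number of distinct tokens with a nonzero count.
    """
    ext_count = sum(counts.values())
    num_extensions = sum(1 for c in counts.values() if c > 0)
    return ext_count, num_extensions


counts = parse_counts(SAMPLE)
print(ext_stats(counts))  # (30, 5)
```

Here 20 + 3 + 2 + 2 + 3 = 30 occurrences of 5 distinct tokens. If the real file has more lines, both numbers grow accordingly.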

Given these counts, what are lambda(), 1 - lambda(), extCount, numExtensions, and the base probability P(ab)? This is really one question, because each quantity depends on the previous one in a chain.
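Under the same assumptions, the chain goes: extCount() and numExtensions() give λ() = extCount() / (extCount() + L · numExtensions()), and P(ab) then interpolates the maximum-likelihood estimate 20/30 with a fallback estimate weighted by 1 - λ(). Neither L nor the fallback probability for "ab" is given in the question, so the values below (L = 4, fallback 1/100) are purely illustrative, and `base_probability` is a hypothetical helper:

```python
from fractions import Fraction

# Same fragment of the unigram file, assumed to be the whole vocabulary.
COUNTS = {"ab": 20, "aba": 3, "abd": 2, "abef": 2, "abkk": 3}


def interpolation_weight(ext_count, num_extensions, L):
    """lambda(X) = extCount(X) / (extCount(X) + L * numExtensions(X))."""
    return Fraction(ext_count) / (ext_count + Fraction(L) * num_extensions)


def base_probability(token, counts, L, fallback_prob):
    """Interpolate the unigram ML estimate with an assumed fallback estimate.

    Returns (lambda(), 1 - lambda(), P(token)) for the empty context.
    """
    ext_count = sum(counts.values())                           # extCount()
    num_extensions = sum(1 for c in counts.values() if c > 0)  # numExtensions()
    lam = interpolation_weight(ext_count, num_extensions, L)
    p_ml = Fraction(counts.get(token, 0), ext_count)           # maximum-likelihood estimate
    p = lam * p_ml + (1 - lam) * Fraction(fallback_prob)
    return lam, 1 - lam, p


# L = 4 and fallback_prob = 1/100 are illustrative values only.
lam, one_minus_lam, p_ab = base_probability("ab", COUNTS, L=4, fallback_prob=Fraction(1, 100))
print(lam, one_minus_lam, p_ab)  # 3/5 2/5 101/250
```

With exact fractions this gives λ() = 30 / (30 + 4 · 5) = 3/5, 1 - λ() = 2/5, and P(ab) = 3/5 · 2/3 + 2/5 · 1/100 = 101/250. A different L or fallback changes the numbers, not the order of the steps.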

Thank you very much.
