Training hidden Markov models without a tagged corpus

For a linguistics course, we implemented part-of-speech (POS) tagging using a hidden Markov model, in which the hidden variables were the parts of speech. We trained the system on tagged data, then tested it and compared our results against the gold data.

Would it be possible to train the HMM without a labeled training set?

+4
2 answers

In theory, yes. You would use the Baum-Welch algorithm, which is described very well in Rabiner's HMM tutorial.
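As a rough illustration of what Baum-Welch does with untagged data, here is a minimal pure-Python sketch. It is not taken from Rabiner's tutorial or from any library: the function names (`forward_backward`, `normalize`, `baum_welch`), the toy sentences, the choice of three hidden states, and the random initialization are all assumptions made for the example. The learned states are anonymous clusters, not named POS tags.

```python
import math
import random


def forward_backward(obs, pi, A, B):
    """Scaled forward and backward passes for one sequence of symbol ids."""
    n, T = len(pi), len(obs)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    scale = [sum(alpha[0])]
    alpha[0] = [a / scale[0] for a in alpha[0]]
    for t in range(1, T):
        row = [B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(n))
               for j in range(n)]
        scale.append(sum(row))
        alpha.append([a / scale[t] for a in row])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    log_likelihood = sum(math.log(s) for s in scale)
    return alpha, beta, scale, log_likelihood


def normalize(row, fallback):
    """Turn expected counts into probabilities; keep the old row if unused."""
    total = sum(row)
    return [x / total for x in row] if total > 0 else list(fallback)


def baum_welch(sequences, n_states, n_symbols, iterations=10, seed=0):
    """Estimate (pi, A, B) from unlabeled sequences by expectation maximization."""
    rng = random.Random(seed)  # assumption: random start; results depend on it
    rand = lambda k: normalize([rng.random() + 0.1 for _ in range(k)], [])
    pi = rand(n_states)
    A = [rand(n_states) for _ in range(n_states)]
    B = [rand(n_symbols) for _ in range(n_states)]
    history = []  # log-likelihood of the data before each update
    for _ in range(iterations):
        pi_num = [0.0] * n_states
        A_num = [[0.0] * n_states for _ in range(n_states)]
        B_num = [[0.0] * n_symbols for _ in range(n_states)]
        total = 0.0
        for obs in sequences:
            # E-step: expected state and transition counts for this sequence
            alpha, beta, scale, ll = forward_backward(obs, pi, A, B)
            total += ll
            for t in range(len(obs)):
                for i in range(n_states):
                    gamma = alpha[t][i] * beta[t][i]
                    if t == 0:
                        pi_num[i] += gamma
                    B_num[i][obs[t]] += gamma
                    if t + 1 < len(obs):
                        for j in range(n_states):
                            A_num[i][j] += (alpha[t][i] * A[i][j]
                                            * B[j][obs[t + 1]]
                                            * beta[t + 1][j] / scale[t + 1])
        history.append(total)
        # M-step: re-estimate parameters from the expected counts
        pi = normalize(pi_num, pi)
        A = [normalize(A_num[i], A[i]) for i in range(n_states)]
        B = [normalize(B_num[i], B[i]) for i in range(n_states)]
    return pi, A, B, history


# Toy untagged corpus (hypothetical data): words only, no POS labels.
sentences = [["the", "dog", "barks"], ["the", "cat", "sleeps"],
             ["a", "dog", "sleeps"], ["a", "cat", "barks"]]
vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}
encoded = [[index[w] for w in s] for s in sentences]

pi, A, B, history = baum_welch(encoded, n_states=3, n_symbols=len(vocab))
print(history)  # log-likelihood should not decrease from one iteration to the next
```

Changing `seed` gives a different starting point and can give a different final model, because each run only climbs to a nearby local maximum of the likelihood.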

However, if you apply an HMM to part-of-speech tagging this way, the error rate you get with the standard formulation is unlikely to be satisfactory. Baum-Welch is a form of expectation maximization, which only converges to a local maximum, so the result depends on where training starts. Rule-based approaches beat HMMs trained this way hands down, iirc.

I believe the Natural Language Toolkit (NLTK) for Python has an HMM implementation for this specific purpose.

+6

It has been a couple of years since I did NLP, but I believe that without labels an HMM can help estimate the transition probabilities between symbols or n-grams (that is, how likely “peace” is to occur after “hello”), but not the parts of speech. To learn how POS tags relate to one another, a tagged corpus is required.
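To make the "transition probabilities between symbols" idea concrete, here is a small sketch that estimates bigram probabilities by plain counting over untagged text. Note the assumptions: this is simple relative-frequency counting over observed words, not HMM training, and the function name `bigram_probabilities` and the toy corpus are invented for the example.

```python
from collections import Counter, defaultdict


def bigram_probabilities(sentences):
    """Estimate P(next word | previous word) by relative frequency."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Count each adjacent (previous, next) word pair.
        for prev, nxt in zip(sentence, sentence[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {word: c / sum(followers.values())
               for word, c in followers.items()}
        for prev, followers in counts.items()
    }


# Hypothetical untagged corpus: no POS labels are needed for this estimate.
corpus = [["hello", "peace"], ["hello", "there"], ["hello", "peace", "again"]]
probs = bigram_probabilities(corpus)
print(probs["hello"])  # "peace" follows "hello" in 2 of the 3 occurrences
```

Nothing in these counts says which words are nouns or verbs, which is the limitation described above.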

If I'm off base here, let me know in the comments!

+1
