NLTK POS tagger for other languages

I am using the nltk module in Python and I am trying to use it for POS tagging in different languages.

There is a lot of information on how to train your own POS tagger for different languages, but is there a repository of reliable, well-built, and tested NLTK POS taggers for different languages? (Exporting a trained POS tagger is very easy with the pickle module.)

+8
python nlp nltk
3 answers

You can find reliable, well-built, and tested NLTK corpora at http://www.nltk.org/nltk_data/

You can find other corpora elsewhere, but these are the best.

+4

If you are not restricted to using only NLTK, you can try our robust, language-independent POS tagging toolkit, RDRPOSTagger.

(License: GPLv2; Programming Language: Python and Java)

RDRPOSTagger is fast in both training and tagging. In addition, RDRPOSTagger achieves very competitive accuracy compared to state-of-the-art results.

Updated 11/18/2015: version 1.2 with improved tagging accuracy, especially for morphologically rich languages. See the experimental results, including speed and tagging accuracy, in this paper.

RDRPOSTagger provides pre-trained POS and morphological tagging models for Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese. RDRPOSTagger also provides pre-trained universal POS tagging models for 40 languages.

+3

As far as I know, there is no such repository of reliable, well-built, and tested POS taggers. I do think it would be a good idea.

I have tried training a couple of taggers myself. For a large English corpus, I used: http://gmb.let.rug.nl/

For Spanish, I used the corpus included in NLTK (cess_esp):

from nltk.corpus import cess_esp as cess 
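To give an idea of what training on such a corpus can look like, here is a minimal sketch of a backoff tagger that is trained and then round-tripped through pickle. The function name `train_backoff_tagger`, the default tag `"NOUN"`, and the two hand-made sample sentences are my own assumptions for illustration; with the real corpus you would pass `cess.tagged_sents()` instead (after `nltk.download('cess_esp')`), and the tags would then be the corpus's own tagset rather than the simplified ones used here.

```python
import pickle

from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger


def train_backoff_tagger(tagged_sents, default_tag="NOUN"):
    """Train a bigram tagger that backs off to a unigram tagger,
    and finally to a fixed default tag for unseen words."""
    t0 = DefaultTagger(default_tag)
    t1 = UnigramTagger(tagged_sents, backoff=t0)
    return BigramTagger(tagged_sents, backoff=t1)


# Tiny hand-made sample (hypothetical data, simplified tags).
# With the real corpus this would be: cess.tagged_sents()
sample_sents = [
    [("el", "DET"), ("gato", "NOUN"), ("come", "VERB")],
    [("la", "DET"), ("casa", "NOUN"), ("es", "VERB"), ("grande", "ADJ")],
]

tagger = train_backoff_tagger(sample_sents)
tagged = tagger.tag(["el", "gato", "es", "grande"])
print(tagged)

# Export and reload the trained tagger with pickle.
restored = pickle.loads(pickle.dumps(tagger))
print(restored.tag(["la", "casa"]))
```

In practice you would hold out part of the corpus for evaluation and write the pickle to a file with `pickle.dump` instead of keeping it in memory.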

For quick training of simple taggers, you can check out NLTK-Trainer:

https://nltk-trainer.readthedocs.org/en/latest/train_tagger.html

+1