I am trying to extract named entities from Dutch text. I used nltk-trainer to train a tagger and a chunker on the Dutch portion of the conll2002 corpus. However, the chunker's parse method does not detect any named entities. Here is my code:
    import nltk

    str = 'Christiane heeft een lam.'
    tagger = nltk.data.load('taggers/dutch.pickle')
    chunker = nltk.data.load('chunkers/dutch.pickle')
    str_tags = tagger.tag(nltk.word_tokenize(str))
    print(str_tags)
    str_chunks = chunker.parse(str_tags)
    print(str_chunks)
And this is the output of the program:
    [('Christiane', u'N'), ('heeft', u'V'), ('een', u'Art'), ('lam', u'Adj'), ('.', u'Punc')]
    (S Christiane/N heeft/V een/Art lam/Adj ./Punc)
I was expecting Christiane to be detected as a named entity. Any help?
user1491915