I know how to get bigrams and trigrams using NLTK, and I apply them to my own corpora. The code is below.
I am not sure, however, about two things: (1) how to get the collocations for a specific word, and (2) whether NLTK has a collocation measure based on the log-likelihood ratio.
```python
import nltk
from nltk.collocations import *
from nltk.tokenize import word_tokenize

text = "this is a foo bar bar black sheep foo bar bar black sheep foo bar bar black sheep shep bar bar black sentence"

# Build trigram candidates from the tokenized text and rank them by PMI
trigram_measures = nltk.collocations.TrigramAssocMeasures()
finder = TrigramCollocationFinder.from_words(word_tokenize(text))

for i in finder.score_ngrams(trigram_measures.pmi):
    print(i)
```
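A minimal sketch covering both questions, assuming "collocations for a specific word" means trigrams that contain that word in any position. It uses the finder's `apply_ngram_filter`, which drops every candidate n-gram for which the given function returns `True`, and ranks the survivors with `TrigramAssocMeasures.likelihood_ratio` as well as `pmi`. The helper name `trigrams_containing` is hypothetical, and `str.split()` stands in for `word_tokenize` only so that no tokenizer data has to be downloaded; for this lowercase, punctuation-free sample both produce the same tokens.

```python
from nltk.collocations import TrigramAssocMeasures, TrigramCollocationFinder

text = ("this is a foo bar bar black sheep foo bar bar black sheep "
        "foo bar bar black sheep shep bar bar black sentence")

# Assumption: plain whitespace split instead of word_tokenize (no punkt download needed).
tokens = text.split()

trigram_measures = TrigramAssocMeasures()


def trigrams_containing(tokens, word, score_fn):
    """Hypothetical helper: score all trigrams, keeping only those that contain `word`."""
    finder = TrigramCollocationFinder.from_words(tokens)
    # apply_ngram_filter removes n-grams for which the function returns True,
    # so this discards every trigram that does NOT contain the target word.
    finder.apply_ngram_filter(lambda *ngram: word not in ngram)
    # score_ngrams returns (ngram, score) pairs sorted from highest to lowest score.
    return finder.score_ngrams(score_fn)


# (1) Collocations of one specific word, ranked by PMI.
for ngram, score in trigrams_containing(tokens, "sheep", trigram_measures.pmi):
    print(ngram, score)

# (2) The same candidates ranked by the log-likelihood ratio.
for ngram, score in trigrams_containing(tokens, "sheep", trigram_measures.likelihood_ratio):
    print(ngram, score)
```

Because the filter only removes candidate trigrams, the word and bigram counts used for scoring still come from the whole text, so each remaining trigram gets the same score it would have in the unfiltered list. `BigramAssocMeasures` offers `likelihood_ratio` too if you work with bigrams instead.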
Sabba