I use nltk to generate n-grams from sentences, first removing the given stop words. However, nltk.pos_tag() is very slow, taking up to 0.6 seconds per sentence on my processor (Intel i7).
Output:
['The first time I went, and was completely taken by the live jazz band and atmosphere, I ordered the Lobster Cobb Salad.']
0.620481014252
["It simply the best meal in NYC."]
0.640982151031
['You cannot go wrong at the Red Eye Grill.']
0.644664049149
The code:
for sentence in source:
    nltk_ngrams = None
    if stop_words is not None:
        start = time.time()
        sentence_pos = nltk.pos_tag(word_tokenize(sentence))
        print time.time() - start
        filtered_words = [word for (word, pos) in sentence_pos if pos not in stop_words]
    else:
        filtered_words = ngrams(sentence.split(), n)
Is it really so slow or am I doing something wrong?
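For comparison, here is a minimal timing sketch (not my actual pipeline) that loads the tagger once outside the loop and reuses it, assuming NLTK 3.1+ where PerceptronTagger is available; the sample sentence is just for illustration:

import time
import nltk
from nltk import word_tokenize
from nltk.tag.perceptron import PerceptronTagger

# Load the pretrained perceptron model a single time, up front.
tagger = PerceptronTagger()

sentences = ['You cannot go wrong at the Red Eye Grill.']  # sample input
for sentence in sentences:
    start = time.time()
    # Reuse the already-loaded tagger instead of calling nltk.pos_tag()
    # inside the loop.
    tagged = tagger.tag(word_tokenize(sentence))
    print(time.time() - start)

If nltk.pos_tag() is re-loading the pickled tagger model on every call (as some NLTK versions did), that would explain a fixed per-sentence cost of this size rather than anything specific to my sentences.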
python nlp nltk pos-tagger
Stefan Falk