List of ngrams with zip

I can make a quick and dirty bigram sequence as follows:

 >>> w = ['a', 'b', 'c', 'd']
 >>> zip(w, w[1:])
 [('a', 'b'), ('b', 'c'), ('c', 'd')]

I want to write a function that takes a numeric argument n (the n in n-gram). How can I use this argument to populate the zip arguments automatically, as shown above? In other words, calling my function as:

 >>> make_ngrams(w, 3) 

will create

 >>> zip(w, w[1:], w[2:]) 

on the fly and return:

 [('a', 'b', 'c'), ('b', 'c', 'd')] 

Can the star operator(s) help me here? Thanks!

2 answers
 def make_ngrams(lst, n):
     return zip(*(lst[i:] for i in xrange(n)))

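The * operator takes all the elements of an iterable and passes them to the function as separate positional arguments. Here the iterable is a generator of the shifted slices lst[0:], lst[1:], ..., lst[n-1:], so for n = 3 the call is equivalent to zip(lst, lst[1:], lst[2:]).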
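
To illustrate the unpacking step, here is a minimal sketch that assumes Python 3, where `zip` returns an iterator rather than a list and `xrange` is spelled `range`. The name `make_ngrams_py3` is hypothetical and used only for this example.

```python
def make_ngrams_py3(lst, n):
    # Build n shifted views of the list: lst[0:], lst[1:], ..., lst[n-1:]
    shifted = [lst[i:] for i in range(n)]
    # The * unpacks the list into separate arguments: zip(lst[0:], lst[1:], ...)
    # zip stops at the shortest slice, so incomplete n-grams are dropped.
    return list(zip(*shifted))


w = ['a', 'b', 'c', 'd']
print(make_ngrams_py3(w, 3))  # [('a', 'b', 'c'), ('b', 'c', 'd')]
```

Wrapping the result in `list(...)` is only needed if a list is wanted, as in the question; the bare `zip` object can be iterated directly.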

 def ngram(L, n):
     return [tuple(L[i:i+n]) for i in xrange(len(L)-n+1)]
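
As a rough check that the slicing approach and the zip approach agree, the sketch below ports both to Python 3 (an assumption; the answers above use Python 2's `xrange`) and compares them for several values of n. The names `ngram_slices` and `ngram_zip` are hypothetical.

```python
def ngram_slices(seq, n):
    # One window per valid start index; range() is empty when n > len(seq).
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_zip(seq, n):
    # Zip together n shifted slices of the sequence.
    return list(zip(*(seq[i:] for i in range(n))))


w = ['a', 'b', 'c', 'd']
for n in range(1, 6):
    # Both approaches yield the same n-grams for n >= 1,
    # including the empty list once n exceeds len(w).
    assert ngram_slices(w, n) == ngram_zip(w, n)

print(ngram_slices(w, 3))  # [('a', 'b', 'c'), ('b', 'c', 'd')]
```

The two differ only in the degenerate case n = 0: `zip()` with no arguments yields nothing, while the slicing version produces one empty tuple per start index.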