Creating Bigrams Using NLTK

I am trying to create a list of bigrams from a given sentence. For example, if I type

To be or not to be 

I want the program to generate

  to be, be or, or not, not to, to be 

I tried the following code, but it just gave me

 <generator object bigrams at 0x0000000009231360> 

This is my code:

    import nltk
    bigrm = nltk.bigrams(text)
    print(bigrm)

So how do I get what I want? I want a list of word combinations like the one above (to be, be or, or not, not to, to be).

2 answers

nltk.bigrams() returns an iterator (a generator, specifically) over the bigrams. If you need a list, pass the iterator to list(). It also expects a sequence of tokens to build the bigrams from, so you need to split the text before passing it in (if you haven't already):

 bigrm = list(nltk.bigrams(text.split())) 
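To see what the bigrams actually are, here is a minimal dependency-free sketch that pairs each token with the next one using zip. The helper name simple_bigrams is hypothetical and is not NLTK's implementation; it only mirrors the output for a list of tokens (unlike nltk.bigrams, it needs a sliceable sequence, not an arbitrary iterable).

```python
def simple_bigrams(tokens):
    """Hypothetical helper: return an iterator of (left, right) neighbour pairs."""
    # zip stops at the shorter input, so the last token is only ever a right-hand member
    return zip(tokens, tokens[1:])

tokens = "to be or not to be".split()
bigrm = list(simple_bigrams(tokens))
print(bigrm)
# [('to', 'be'), ('be', 'or'), ('or', 'not'), ('not', 'to'), ('to', 'be')]
```

Each element is a tuple of two words, which is why the printing examples join each pair with a space.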

To print them separated by commas, you can do this (in Python 3):

 print(*map(' '.join, bigrm), sep=', ') 

On Python 2, for example:

 print ', '.join(' '.join((a, b)) for a, b in bigrm) 
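If the same script has to run on both versions, one option is to build the whole output string first and print it as a single argument, since print(x) with one argument prints the same thing on Python 2 and Python 3. A small sketch, assuming bigrm already holds the list of pairs:

```python
# Assumed input: the list of bigram tuples built earlier
bigrm = [('to', 'be'), ('be', 'or'), ('or', 'not'), ('not', 'to'), ('to', 'be')]

# Join each pair with a space, then join the pairs with ", "
line = ', '.join(' '.join(pair) for pair in bigrm)
print(line)  # to be, be or, or not, not to, to be
```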

Note that you don't need to create a list just to print them; the iterator works directly.
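The catch is that an iterator can only be consumed once. The sketch below uses zip (a one-shot iterator on Python 3) as a stand-in for the generator nltk.bigrams returns; the second pass gets nothing, which is when you would want list() instead:

```python
tokens = "to be or not to be".split()

# One-shot iterator, standing in for the generator nltk.bigrams returns
pairs = zip(tokens, tokens[1:])

first = ', '.join(' '.join(p) for p in pairs)   # consumes every pair
second = ', '.join(' '.join(p) for p in pairs)  # iterator is already exhausted

print(repr(first))   # 'to be, be or, or not, not to, to be'
print(repr(second))  # ''
```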


The following code generates the bigrams for this sentence:

    >>> import nltk
    >>> from nltk.tokenize import word_tokenize
    >>> text = "to be or not to be"
    >>> tokens = nltk.word_tokenize(text)
    >>> bigrm = nltk.bigrams(tokens)
    >>> print(*map(' '.join, bigrm), sep=', ')
    to be, be or, or not, not to, to be
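The question's input starts with a capital "To", while the desired output is all lowercase, so lowercasing before tokenizing may be needed to match it exactly. A sketch without NLTK, assuming str.split is an acceptable stand-in for word_tokenize (they give the same tokens for this punctuation-free sentence):

```python
sentence = "To be or not to be"

# Lowercase first so the leading "To be" prints as "to be", as in the desired output
tokens = sentence.lower().split()
pairs = zip(tokens, tokens[1:])

result = ', '.join(' '.join(p) for p in pairs)
print(result)  # to be, be or, or not, not to, to be
```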
