Creating Bigrams Using NLTK

Question


I am trying to create a bigram list of a given sentence, for example, if I type

To be or not to be

I want the program to generate

  to be, be or, or not, not to, to be

I tried the following code, but it just gave me

 <generator object bigrams at 0x0000000009231360>

This is my code:

  import nltk
  bigrm = nltk.bigrams(text)
  print(bigrm)

So how do I get what I want? I want a list of word combinations like the one above (to be, be or, or not, not to, to be).

+6

python nltk n-gram

Nikhil Raghavendra Jun 06 '16 at 6:44


2 answers

The following code creates a bigram list for this sentence:

 >>> import nltk
 >>> from nltk.tokenize import word_tokenize
 >>> text = "to be or not to be"
 >>> tokens = nltk.word_tokenize(text)
 >>> bigrm = nltk.bigrams(tokens)
 >>> print(*map(' '.join, bigrm), sep=', ')
 to be, be or, or not, not to, to be

+1

Ashok kumar jayaraman Nov 07 '17 at 9:26


Ilja everilä · Accepted Answer · 2016-06-06T06:52:22+0000

nltk.bigrams() returns an iterator (specifically, a generator) of bigrams. If you want a list, pass the iterator to list(). It also expects a sequence of items to generate bigrams from, so you have to split the text before passing it (if you have not already done so):

 bigrm = list(nltk.bigrams(text.split()))
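To see why the split matters, here is a minimal pure-Python sketch of a lazy bigram generator. The helper `pairwise_bigrams` is a hypothetical stand-in written for illustration, not NLTK's implementation; it only mimics the behaviour described above (adjacent pairs, produced lazily).

```python
def pairwise_bigrams(sequence):
    """Yield adjacent pairs from a sequence (hypothetical stand-in for nltk.bigrams)."""
    items = list(sequence)
    # zip pairs each element with its successor
    for pair in zip(items, items[1:]):
        yield pair


text = "to be or not to be"

# Passing the raw string iterates over characters, so the pairs are character pairs
char_pairs = list(pairwise_bigrams(text))

# Splitting first gives word-level pairs, which is what the question asks for
word_pairs = list(pairwise_bigrams(text.split()))

print(char_pairs[:3])
print(word_pairs)
```

The same reasoning should apply to any function that pairs up the elements of whatever it is given: a string is a sequence of characters, so it has to be tokenized into words first.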

To print them separated by commas, you could do this (in Python 3):

 print(*map(' '.join, bigrm), sep=', ')

On Python 2, for example:

 print ', '.join(' '.join((a, b)) for a, b in bigrm)

Note that just for printing you do not need to build a list; you can use the iterator directly.
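One caveat when using the iterator directly: a generator can be consumed only once. The sketch below uses a plain generator expression as a stand-in for the object that nltk.bigrams returns; the names `tokens`, `pairs`, `first_pass` and `second_pass` are illustrative only.

```python
tokens = "to be or not to be".split()

# A plain generator of adjacent pairs; a stand-in for the generator nltk.bigrams returns
pairs = (pair for pair in zip(tokens, tokens[1:]))

# The first pass consumes the generator
first_pass = ', '.join(' '.join(p) for p in pairs)

# The second pass finds nothing left to iterate over
second_pass = ', '.join(' '.join(p) for p in pairs)

print(first_pass)
print(repr(second_pass))
```

If the bigrams are needed more than once (for example, printed and then counted), wrapping the result in list() as shown earlier avoids this.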
