Toxicize a paragraph in a sentence and then in words in an NLTK

Question

Toxicize a paragraph in a sentence and then in words in an NLTK

I am trying to enter a complete paragraph in my word processor, which is first broken down into sentences, and then into words.

I tried the following code, but it does not work,

#text is the paragraph input sent_text = sent_tokenize(text) tokenized_text = word_tokenize(sent_text.split) tagged = nltk.pos_tag(tokenized_text) print(tagged)

however this does not work and gives me errors. So, how do I fake paragraphs in sentences and then words?

Paragraph example:

This thing seemed to prevail and surprise a small dark brown dog and wound him in the heart. He fell in despair to the child's feet. When the blow was repeated, along with an exhortation in children's offers, he turned on his back and held his paws with a special look. At the same time, with ears and eyes, he offered a small prayer to the child.

** WARNING: ** This is just random text from the Internet, I do not own the above content.

+31

python nltk

Nikhil Raghavendra Jun 03 '16 at 4:03

source share

3 answers

Here is a shorter version. This will give you a data structure with every single sentence and every token in the sentence. I prefer TweetTokenizer for a dirty, real world. A sentence fragment is considered worthy, but be careful not to omit the phrase before this step, as this may affect the accuracy of detecting the boundaries of a messy text.

 from nltk.tokenize import TweetTokenizer, sent_tokenize tokenizer_words = TweetTokenizer() tokens_sentences = [tokenizer_words.tokenize(t) for t in nltk.sent_tokenize(input_text)] print(tokens_sentences)

This is what the output I cleaned up looks like, so the structure stands out:

 [ ['This', 'thing', 'seemed', 'to', 'overpower', 'and', 'astonish', 'the', 'little', 'dark-brown', 'dog', ',', 'and', 'wounded', 'him', 'to', 'the', 'heart', '.'], ['He', 'sank', 'down', 'in', 'despair', 'at', 'the', "child's", 'feet', '.'], ['When', 'the', 'blow', 'was', 'repeated', ',', 'together', 'with', 'an', 'admonition', 'in', 'childish', 'sentences', ',', 'he', 'turned', 'over', 'upon', 'his', 'back', ',', 'and', 'held', 'his', 'paws', 'in', 'a', 'peculiar', 'manner', '.'], ['At', 'the', 'same', 'time', 'with', 'his', 'ears', 'and', 'his', 'eyes', 'he', 'offered', 'a', 'small', 'prayer', 'to', 'the', 'child', '.'] ]

+6

Brian cugelman Dec 25 '17 at 1:39

source share

 import nltk textsample ="This thing seemed to overpower and astonish the little dark-brown dog, and wounded him to the heart. He sank down in despair at the child feet. When the blow was repeated, together with an admonition in childish sentences, he turned over upon his back, and held his paws in a peculiar manner. At the same time with his ears and his eyes he offered a small prayer to the child." sentences = nltk.sent_tokenize(textsample) words = nltk.word_tokenize(textsample) sentences [w for w in words if w.isalpha()]

The last line above ensures that the output will only contain words, not special characters. The output of the sentence is as shown below.

["This thing seemed to crush and hit the little dark brown dog and hurt him to the core." "He fell despairingly to the child’s feet," "When the blow was repeated, along with a warning in children's sentences, he rolled onto his back and peculiarly held his paws." "At the same time, with his ears and eyes, he offered a small prayer to the child."]

Display words as shown below after removing special characters

['This', 'thing', 'It seemed', 'Before', 'Overpower', 'as well as', 'Astonish', 'Those', 'a little', 'dog', 'as well', 'Wounded' , 'him,' 'To, "" That, "' heart, '" He, "" Drowned, "' down, '' in," Despair, "in," The, "" Child, " Legs, 'When', 'Theme', 'blow', 'was', 'Repeated', 'all together', 'involving', 'An,' Admonition ',' in ',' Baby ',' sentences ',' he ',' Turned out ',' above ',' On ',' him ',' Back ',' as well as', 'Manual', 'him', "Paws", 'in', "A" , "Original", "Manner", 'B', "That", 'same', 'time', 'with participation', 'him', 'ears',' as well as', 'him', 'eyes ',' he ',' proposed ', "A",' small ', "Prayer", "To", Theme, Child]

0

Sripathi Oct 6 '19 at 2:42

source share

slider · Accepted Answer · 2016-06-03T04:18:39+0000

You probably intended to sent_text over sent_text :

 import nltk sent_text = nltk.sent_tokenize(text) # this gives us a list of sentences # now loop over each sentence and tokenize it separately for sentence in sent_text: tokenized_text = nltk.word_tokenize(sentence) tagged = nltk.pos_tag(tokenized_text) print(tagged)

Toxicize a paragraph in a sentence and then in words in an NLTK

More articles: