How to un-stem a word in Python?

Question

I want to know whether there is any way to un-stem words back to their normal form.

The problem is that I have thousands of words in different forms, for example eat, eaten, ate, eating and so on, and I need to count the frequency of each word. All of these forms should count towards eat, which is why I used stemming.

But the next part of the problem requires me to find similar words in the data, and I am using nltk's synsets to calculate the Wu-Palmer similarity between words. The problem is that nltk's synsets will not work on stemmed words, or at least they do not in the code from "check if two words are related to each other".

How should I do it? Is there a way to un-stem a word?

+4

python nlp nltk

user3667569 May 15, '15 at 18:34

4 answers

No, there isn't. With stemming, you lose information not only about the form of the word (as in eat vs. eats or eaten), but also about the word itself (as in tradition vs. traditional). Unless you use a prediction method that tries to recover this information from the context of the word, there is no way to get it back.
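
To illustrate why the mapping cannot be inverted, here is a minimal sketch. It uses a hypothetical `toy_stem` function (a crude suffix stripper written only for this example, not the Porter algorithm) and an assumed word list. Several distinct words end up with the same stem, so the stem alone cannot tell you which word it came from.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "en", "al", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Assumed sample words, chosen to mirror the examples in the answer above.
words = ["eat", "eats", "eating", "eaten", "tradition", "traditional", "traditions"]

# Group every original word under the stem it is reduced to.
stem_to_words = defaultdict(set)
for w in words:
    stem_to_words[toy_stem(w)].add(w)

# Stems shared by more than one word are exactly the ones that cannot be un-stemmed.
ambiguous = {stem: sorted(forms) for stem, forms in stem_to_words.items() if len(forms) > 1}
print(ambiguous)
```

A real stemmer behaves differently in detail, but the many-to-one property is the same.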

+3

yvespeirsman May 15, '15 at 19:44

0

pebox11 05 . '15 16:04

I think a workable approach is something like the one described in fooobar.com/questions/1588245 / ... .

A possible implementation could look like this:

import re
import string
import nltk
import pandas as pd
stemmer = nltk.stem.porter.PorterStemmer()

stemmer is the stemmer to use. Here is some sample text:

complete_text = ''' cats catlike catty cat 
stemmer stemming stemmed stem 
fishing fished fisher fish 
argue argued argues arguing argus argu 
argument arguments argument '''

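Create a list of the distinct words:

my_list = []
# Python 2 str has .decode(); Python 3 str does not, so fall back to a plain split.
try:
    aux = complete_text.decode().split()
except AttributeError:
    aux = complete_text.split()
# Collect unique lowercased words, keeping first-seen order.
for i in aux:
    word = i.lower()
    if word not in my_list:
        my_list.append(word)
my_list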

with output:

['cats',
 'catlike',
 'catty',
 'cat',
 'stemmer',
 'stemming',
 'stemmed',
 'stem',
 'fishing',
 'fished',
 'fisher',
 'fish',
 'argue',
 'argued',
 'argues',
 'arguing',
 'argus',
 'argu',
 'argument',
 'arguments']

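Now create a dictionary that maps each stem to the original words sharing it:

# One row per unique word, with its Porter stem alongside.
aux = pd.DataFrame(my_list, columns=['word'])
aux['word_stemmed'] = aux['word'].apply(lambda x: stemmer.stem(x))
# Replace each word with the comma-separated list of all words sharing its stem.
aux = aux.groupby('word_stemmed').transform(lambda x: ', '.join(x))
# transform drops the grouping column, so recompute the stem from the first word.
aux['word_stemmed'] = aux['word'].apply(lambda x: stemmer.stem(x.split(',')[0]))
aux.index = aux['word_stemmed']
del aux['word_stemmed']
# Rows with the same stem collapse into a single dictionary key.
my_dict = aux.to_dict('dict')['word']
my_dict

This gives: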

{'argu': 'argue, argued, argues, arguing, argus, argu',
 'argument': 'argument, arguments',
 'cat': 'cats, cat',
 'catlik': 'catlike',
 'catti': 'catty',
 'fish': 'fishing, fished, fish',
 'fisher': 'fisher',
 'stem': 'stemming, stemmed, stem',
 'stemmer': 'stemmer'}
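
If a single "un-stemmed" word per stem is needed, one heuristic is to pick the most frequent original form seen for that stem. The sketch below is pandas-free and rests on assumptions: `representative_forms` is a hypothetical helper, the token list is made up, and the `STEMS` lookup simply copies a few stems from the output above so the example runs without nltk. In practice one would pass `stemmer.stem` instead.

```python
from collections import Counter, defaultdict

# Stems copied from the result above; a stand-in for PorterStemmer().stem.
STEMS = {
    "cats": "cat", "cat": "cat",
    "fishing": "fish", "fished": "fish", "fish": "fish",
    "fisher": "fisher",
}


def representative_forms(tokens, stem):
    """Map each stem to the most frequent original word observed for it."""
    counts_by_stem = defaultdict(Counter)
    for token in tokens:
        counts_by_stem[stem(token)][token] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts_by_stem.items()}


# Assumed sample tokens (with repeats, unlike the deduplicated list above).
tokens = ["fishing", "fish", "fished", "fish", "cats", "cat", "cats", "fisher"]
unstem = representative_forms(tokens, STEMS.__getitem__)
print(unstem)
```

The chosen form is a real word from the data, so it can be looked up in WordNet, but it is only a guess at what the stem stood for.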

0

Rafael valero Apr 17 '18 at 15:43

steve · Accepted Answer · 2015-05-15T20:21:06+0000

I suspect that what you really mean by "stem" is "tense". That is, you want the different tenses of each word to count towards the "base form" of the verb.

Check out the pattern package:

pip install pattern

Then use the en.lemma function to return the base form of the verb.

import pattern.en as en
base_form = en.lemma('ate') # base_form == "eat"
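
Counting by base form rather than by stem keeps the keys as real words, which is what the synset lookup in the question needs. A minimal sketch of that counting step follows. It assumes a hypothetical `toy_lemma` function backed by a tiny hand-written table, so that it runs without pattern installed; with pattern available, `en.lemma` could be passed in instead.

```python
from collections import Counter

# Hypothetical lookup standing in for a real lemmatizer such as en.lemma.
TOY_LEMMAS = {"ate": "eat", "eats": "eat", "eating": "eat", "eaten": "eat"}


def toy_lemma(word):
    """Return the base form if known, otherwise the word unchanged."""
    return TOY_LEMMAS.get(word, word)


def lemma_frequencies(tokens, lemma=toy_lemma):
    """Count tokens by the base form of their lowercased text."""
    return Counter(lemma(t.lower()) for t in tokens)


# Assumed sample sentence.
tokens = "I ate what she eats and he is eating what we eat".split()
freqs = lemma_frequencies(tokens)
print(freqs["eat"])
```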
