I'm trying to solve a difficult problem and I'm getting lost.
Here is what I have to do:
INPUT: file
OUTPUT: dictionary

Return a dictionary whose keys are all the words in the file (broken by whitespace). The value for each word is a dictionary containing each word that can follow the key and a count for the number of times it follows it. You should lowercase everything. Use strip and string.punctuation to strip the punctuation from the words.

Example:

    >>> #example.txt is a file containing: "The cat chased the dog."
    >>> with open('../data/example.txt') as f:
    ...     word_counts(f)
    {'the': {'dog': 1, 'cat': 1}, 'chased': {'the': 1}, 'cat': {'chased': 1}}
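As a rough illustration of the cleanup step the task describes, here is a minimal sketch of lowercasing a word and stripping punctuation with `strip` and `string.punctuation`. The helper name `clean_word` is made up for this example and is not part of the assignment. Note that `strip` only removes characters from the two ends of a word, so an apostrophe inside a word stays.

```python
import string

def clean_word(raw):
    # Hypothetical helper: lowercase first, then strip punctuation
    # characters from both ends of the word (not from the middle).
    return raw.lower().strip(string.punctuation)

text = "The cat chased the dog."
# split() with no arguments breaks the text on any whitespace
words = [clean_word(w) for w in text.split()]
print(words)  # ['the', 'cat', 'chased', 'the', 'dog']
```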
Here is what I have tried so far, just to pull out the right pairs of words:

    def word_counts(f):
        # f is an open file object, so read its text before splitting
        orgwordlist = f.read().split()
        # print each word together with the word that follows it
        for i in range(len(orgwordlist) - 1):
            print(orgwordlist[i])
            print(orgwordlist[i + 1])

    with open('../data/example.txt') as f:
        word_counts(f)
I think I need to somehow use the .count method and, in the end, combine some dictionaries, but I'm not sure how to count the second word for every first word.
I know that I am not solving the problem anywhere, but I am trying to do it step by step. Any help is appreciated, even tips pointing in the right direction.
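For reference, one possible way to count the second word for every first word is to walk over consecutive pairs and keep a nested counter, rather than calling `.count`. The sketch below is only an illustration under assumptions: the function name `follower_counts` is hypothetical, and it expects a list of words that have already been lowercased and stripped of punctuation.

```python
from collections import Counter, defaultdict

def follower_counts(words):
    # Hypothetical helper: maps each word to a Counter of the words
    # that directly follow it in the list.
    counts = defaultdict(Counter)
    # zip(words, words[1:]) yields every (word, next word) pair
    for first, second in zip(words, words[1:]):
        counts[first][second] += 1
    # convert to plain dicts to match the format in the task's example
    return {word: dict(followers) for word, followers in counts.items()}

# words from "The cat chased the dog." after cleanup
words = ['the', 'cat', 'chased', 'the', 'dog']
result = follower_counts(words)
print(result)
```

With this pair-based approach the last word of the file ('dog' here) only becomes a key if it is followed by something elsewhere in the text, which agrees with the expected output in the task's example.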
python dictionary counter n-gram
Kristie