Based on your requirements of no imports and a simple approach, the following function does it without any. The comments and variable names should make the function logic pretty clear:

    def match_previous(lst, word):
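Only the signature appears above. As a minimal sketch, assuming the body is the same as the instrumented version shown further down with the print calls and `enumerate` removed (and assuming a non-empty list), it would look roughly like this:

```python
def match_previous(lst, word):
    # running count of matches and running sum of preceding-word lengths
    matches_count = total_length_sum = 0.0
    # first element acts as the initial "previous" word
    previous_word = lst[0]
    # slice off the rest so we always compare two consecutive words
    rest_of_words = lst[1:]
    # a match on the very first word has no predecessor: it only bumps the count
    if previous_word == word:
        matches_count += 1
    for current_word in rest_of_words:
        if word == current_word:
            total_length_sum += len(previous_word)
            matches_count += 1
        # always remember the word we have just seen
        previous_word = current_word
    # matches_count == 0 means the word never occurred
    return total_length_sum / matches_count if matches_count else False
```

Because `0.0 == False` is true in Python, check the no-match case with `is False` rather than `== False`.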
It takes two arguments: the text already split into a list of words, and the word to search for:

    In [41]: text = "Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off - then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me."

    In [42]: match_previous(text.split(),"the")
    Out[42]: 4.4

    In [43]: match_previous(text.split(),"ship.")
    Out[43]: 3.0

    In [44]: match_previous(text.split(),"whale")
    Out[44]: False

    In [45]: match_previous(text.split(),"Call")
    Out[45]: 0.0
You can obviously do what your own function does: take a single text argument and split it inside the function. The only way False is returned is if no match for the word was found. You can see that "Call" returns 0.0, since it is the first word in the text and so has no preceding word.
If we add some prints to the code and use enumerate:

    def match_previous(lst, word):
        matches_count = total_length_sum = 0.0
        previous_word = lst[0]
        rest_of_words = lst[1:]
        # a match on the first word has no predecessor, so only the count changes
        if previous_word == word:
            print("First word matches.")
            matches_count += 1
        # start counting at 1 so ind is the iteration number
        for ind, current_word in enumerate(rest_of_words, 1):
            print("On iteration {}.\nprevious_word = {} and current_word = {}.".format(ind, previous_word, current_word))
            if word == current_word:
                total_length_sum += len(previous_word)
                matches_count += 1
                print("We found a match at index {} in our list of words.".format(ind-1))
            print("Updating previous_word from {} to {}.".format(previous_word, current_word))
            previous_word = current_word
        return total_length_sum / matches_count if matches_count else False
Running it on a small sample list shows what happens:

    In [59]: match_previous(["bar","foo","foobar","hello", "world","bar"],"bar")
    First word matches.
    On iteration 1.
    previous_word = bar and current_word = foo.
    Updating previous_word from bar to foo.
    On iteration 2.
    previous_word = foo and current_word = foobar.
    Updating previous_word from foo to foobar.
    On iteration 3.
    previous_word = foobar and current_word = hello.
    Updating previous_word from foobar to hello.
    On iteration 4.
    previous_word = hello and current_word = world.
    Updating previous_word from hello to world.
    On iteration 5.
    previous_word = world and current_word = bar.
    We found a match at index 4 in our list of words.
    Updating previous_word from world to bar.
    Out[59]: 2.5
The advantage of using iter is that we do not need to create a new list by slicing off the remainder. To use it in the code, you only need to change how the function begins:

    def match_previous(lst, word):
        matches_count = total_length_sum = 0.0
Each time we consume an element from an iterator, we move on to the next element:

    In [61]: l = [1,2,3,4]

    In [62]: it = iter(l)

    In [63]: next(it)
    Out[63]: 1

    In [64]: next(it)
    Out[64]: 2
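A minimal sketch of how the whole function might look with `iter` and `next` in place of the slice. The name `match_previous_iter` is hypothetical (chosen only to keep it apart from the slicing version), and a non-empty list is assumed:

```python
def match_previous_iter(lst, word):
    matches_count = total_length_sum = 0.0
    # create an iterator over the list; no copy of the list is made
    _iterator = iter(lst)
    # consume the first word; the loop below starts from the second one
    previous_word = next(_iterator)
    if previous_word == word:
        matches_count += 1
    # the iterator yields everything except the word already consumed
    for current_word in _iterator:
        if word == current_word:
            total_length_sum += len(previous_word)
            matches_count += 1
        previous_word = current_word
    return total_length_sum / matches_count if matches_count else False
```

The results should be the same as with the slicing version; the only difference is that `lst[1:]` is never built.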
The only way a generator really makes sense here is if your function takes multiple words, which you can do with *args:

    def sum_previous(text):
        _iterator = iter(text.split())
        previous_word = next(_iterator)
        # set first k/v pairing with the first word
        # if "total_lengths" is 0 at the end we know there
        # was only one match at the very start
        avg_dict = {previous_word: {"count": 1.0, "total_lengths": 0.0}}
        for current_word in _iterator:
            # if key does not exist, it creates a new key/value pairing
            avg_dict.setdefault(current_word, {"count": 0.0, "total_lengths": 0.0})
            # update value adding word length and increasing the count
            avg_dict[current_word]["total_lengths"] += len(previous_word)
            avg_dict[current_word]["count"] += 1
            previous_word = current_word
        # return the dict so we can use it outside the function.
        return avg_dict


    def match_previous_generator(*args):
        # create our dict mapping words to sum of all lengths of their preceding words.
        # note: text is the module-level string defined earlier, not a parameter.
        d = sum_previous(text)
        # for every word we pass to the function.
        for word in args:
            # use dict.get with a default of an empty dict
            # to catch when a word is not in our text.
            count = d.get(word, {}).get("count")
            # yield each word and its average, or False for non-existing words.
            yield (word, d[word]["total_lengths"] / count if count else False)
Then just pass in all the words you want to look up (as written, the function reads the text from the global `text` defined earlier). You can call list on the generator function.
Or iterate over it:

    In [70]: for tup in match_previous_generator("the","Call", "whale", "ship."):
       ....:     print(tup)
       ....:
    ('the', 4.4)
    ('Call', 0.0)
    ('whale', False)
    ('ship.', 3.0)
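The generator above depends on a global `text`. As a sketch of one alternative, the text can be passed explicitly as the first argument. The name `match_previous_words` and the sample sentence are made up for illustration, and non-empty text is assumed:

```python
def sum_previous(text):
    # map each word to its occurrence count and the summed
    # lengths of the words that immediately precede it
    _iterator = iter(text.split())
    previous_word = next(_iterator)
    avg_dict = {previous_word: {"count": 1.0, "total_lengths": 0.0}}
    for current_word in _iterator:
        entry = avg_dict.setdefault(current_word, {"count": 0.0, "total_lengths": 0.0})
        entry["total_lengths"] += len(previous_word)
        entry["count"] += 1
        previous_word = current_word
    return avg_dict


def match_previous_words(text, *words):
    # hypothetical variant: text is a parameter instead of a global
    d = sum_previous(text)
    for word in words:
        stats = d.get(word)
        # False for words that never occur in the text
        yield (word, stats["total_lengths"] / stats["count"] if stats else False)


sample = "bar foo foobar hello world bar"
results = list(match_previous_words(sample, "bar", "foo", "whale"))
print(results)  # [('bar', 2.5), ('foo', 3.0), ('whale', False)]
```

Calling `list` on the generator runs it to completion and collects all the `(word, average)` tuples at once; a `for` loop consumes them one at a time instead.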