You can handle this with a small change to the code you already have (edited to reflect John's comment):
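```python
import re
from collections import defaultdict

stopWords = set(['a', 'an', 'the', ...])  # "..." stands for the rest of your stop words
fullWords = re.findall(r'\w+', allText)
d = defaultdict(int)
for word in fullWords:
    if word not in stopWords:  # skip stop words, count everything else
        d[word] += 1
# items() works on both Python 2 and 3 (iteritems() is Python 2 only)
finalFreq = sorted(d.items(), key=lambda t: t[1], reverse=True)
self.response.out.write(finalFreq)
```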
This builds the sorted list in two steps: first, it counts only the words that are not in the stop-word list (converted to a set so each membership check is fast), then it sorts the remaining counts by frequency, highest first.
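If you want to try the same filter-count-sort idea outside your request handler, here is a minimal standalone sketch. The function name `word_frequencies` and the short `STOP_WORDS` set are hypothetical, chosen only for illustration; swap in your own list.

```python
import re
from collections import defaultdict

# Hypothetical stop-word set for illustration; replace with your own list.
STOP_WORDS = {'a', 'an', 'the', 'and', 'of'}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Return (word, count) pairs sorted by count, highest first,
    ignoring any word that appears in stop_words."""
    counts = defaultdict(int)
    for word in re.findall(r'\w+', text):
        if word not in stop_words:  # average O(1) lookup because stop_words is a set
            counts[word] += 1
    return sorted(counts.items(), key=lambda t: t[1], reverse=True)

if __name__ == '__main__':
    print(word_frequencies("the cat and the hat and a cat"))
```

Note that the match is case-sensitive, so "The" would not be filtered by a lowercase stop list. The standard library's `collections.Counter` offers a shorter equivalent: `Counter(w for w in words if w not in stop_words).most_common()` also returns pairs sorted by count, highest first.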
David z