Counting unique words in Python

Currently, my code looks like this:

    from glob import glob

    pattern = "D:\\report\\shakeall\\*.txt"
    filelist = glob(pattern)

    def countwords(fp):
        # Number of whitespace-separated words in one file
        with open(fp) as fh:
            return len(fh.read().split())

    print("There are", sum(map(countwords, filelist)),
          "words in the files. From directory", pattern)

I want to add code that also counts the unique words across all the files matching the pattern (42 txt files in that directory), but I don't know how to do this. Can anybody help me?

3 answers

The best way to count objects in Python is to use the collections.Counter class, which was created for this purpose. It behaves like a Python dictionary, but is a little easier to use for counting. You can simply pass it a list of objects and it will count them for you.

    >>> from collections import Counter
    >>> c = Counter(['hello', 'hello', 1])
    >>> print(c)
    Counter({'hello': 2, 1: 1})

Counter also has some useful methods, such as most_common; see the documentation to learn more.
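As a small illustration of most_common, here is a minimal sketch. The sample sentence is made up for the example and is not from the question's files.

```python
from collections import Counter

# Hypothetical sample text, used only for illustration.
words = "the quick brown fox jumps over the lazy dog the end".split()
counts = Counter(words)

# most_common(n) returns the n highest-count (item, count) pairs.
top = counts.most_common(1)
print(top)  # [('the', 3)]

# len() of a Counter is the number of distinct items it has seen.
print(len(counts))  # 9
```

Note that len(counts) already gives the number of unique words, which is what the question asks for.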

Another very useful method of the Counter class is update. After you have created a Counter instance by passing it a list of objects, you can pass further lists to update, and it will keep counting without discarding the existing counts:

    >>> from collections import Counter
    >>> c = Counter(['hello', 'hello', 1])
    >>> print(c)
    Counter({'hello': 2, 1: 1})
    >>> c.update(['hello'])
    >>> print(c)
    Counter({'hello': 3, 1: 1})

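Applied to the question, update can be called once per file so that a single Counter accumulates counts across all 42 files. The following is only a sketch: the function name count_words_in_files is made up for this example, and lowercasing each word is an assumption about what should count as "the same word". Punctuation is not stripped, so "word" and "word," are counted separately.

```python
from collections import Counter
from glob import glob


def count_words_in_files(paths):
    """Return a Counter of lowercased words across all given files.

    Hypothetical helper for illustration; not part of the original code.
    """
    counts = Counter()
    for path in paths:
        with open(path) as fh:
            # update() adds this file's words to the running totals.
            counts.update(word.lower() for word in fh.read().split())
    return counts


if __name__ == "__main__":
    pattern = "D:\\report\\shakeall\\*.txt"  # pattern from the question
    counts = count_words_in_files(glob(pattern))
    print("Total words:", sum(counts.values()))
    print("Unique words:", len(counts))
```

Here sum(counts.values()) reproduces the total from the original script, and len(counts) is the number of distinct words.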
For a single file, a one-liner is enough:

    print(len(set(w.lower() for w in open('filename.dat').read().split())))

This reads the entire file into memory, splits it into words on whitespace, converts each word to lower case, builds a set of the (unique) lowercase words, counts them and prints the result.
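The same set-based idea can be extended to several files by adding each file's words to one shared set. This is a sketch under assumptions: the name unique_words is invented for the example, and it reads line by line rather than loading each file whole, which gives the same words because splitting on whitespace never joins text across lines.

```python
def unique_words(paths):
    """Return the set of distinct lowercased words across all given files.

    Hypothetical helper for illustration; not part of the original answer.
    """
    seen = set()
    for path in paths:
        with open(path) as fh:
            # Process one line at a time instead of reading the whole file.
            for line in fh:
                seen.update(word.lower() for word in line.split())
    return seen
```

With the question's file list this could be used as len(unique_words(filelist)).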


If you want the count of each unique word, use a dict:

    words = ['Hello', 'world', 'world']
    count = {}
    for word in words:
        if word in count:
            count[word] += 1
        else:
            count[word] = 1

This gives you the dict

    {'Hello': 1, 'world': 2}
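As a possible variation on the dict approach, dict.get with a default of 0 removes the need for the if/else branch. This is just a sketch of the same tally, reusing the example list from above:

```python
words = ['Hello', 'world', 'world']
count = {}
for word in words:
    # get() returns 0 for a word not seen yet, so no if/else is needed.
    count[word] = count.get(word, 0) + 1

print(count)       # {'Hello': 1, 'world': 2}
print(len(count))  # 2, the number of unique words
```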