Counting unique words in Python

Question

At the moment, my code looks like this:

    from glob import glob

    pattern = "D:\\report\\shakeall\\*.txt"
    filelist = glob(pattern)

    def countwords(fp):
        with open(fp) as fh:
            return len(fh.read().split())

    print "There are", sum(map(countwords, filelist)), "words in the files. " "From directory", pattern

I want to add code that counts the unique words in the files matching the pattern (42 txt files in that path), but I don't know how to do this. Can anybody help me?

+4

python word-count

rocksland Aug 10 '12 at 10:33

3 answers

 print len(set(w.lower() for w in open('filename.dat').read().split()))

This reads the entire file into memory, splits it into words on whitespace, converts each word to lowercase, builds a set of the lowercase words (a set keeps only unique values), counts them, and prints the result.
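The one-liner above handles a single file. A minimal sketch of the same set-based idea applied to a list of files might look like the following. It is written for Python 3, and the function name `unique_words` and the two sample files are assumptions made up for illustration, not part of the original answer.

```python
import os
import tempfile


def unique_words(paths):
    """Return the set of distinct lowercase words found across all files."""
    seen = set()
    for path in paths:
        with open(path) as fh:
            # split() with no arguments splits on any run of whitespace
            seen.update(word.lower() for word in fh.read().split())
    return seen


# Small demonstration using two throwaway files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    samples = {"a.txt": "To be or not to be", "b.txt": "Be quick"}
    paths = []
    for name, text in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as fh:
            fh.write(text)
        paths.append(path)
    demo_count = len(unique_words(paths))

print(demo_count)  # 5: to, be, or, not, quick
```

With the asker's setup, the call would presumably be `len(unique_words(glob(pattern)))`. Note that splitting on whitespace leaves punctuation attached, so "word," and "word" are counted as two different words.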

+2

NIlesh Sharma Aug 10 '12 at 10:43

If you want the count of each unique word, use a dict:

    words = ['Hello', 'world', 'world']
    count = {}
    for word in words:
        if word in count:
            count[word] += 1
        else:
            count[word] = 1

This gives you a dict:

 {'Hello': 1, 'world': 2}

0

Rustam safin Aug 10 '12 at 10:36

Rostyslav Dzinko · Accepted Answer · 2012-08-10T10:43:09+0000

The best way to count objects in Python is to use the collections.Counter class, which was created for this purpose. It behaves like a Python dict, but is a little easier to use for counting. You can pass it a list of objects and it will count them for you automatically.

    >>> from collections import Counter
    >>> c = Counter(['hello', 'hello', 1])
    >>> print c
    Counter({'hello': 2, 1: 1})

Counter also has some useful methods, such as most_common; see the documentation to learn more.
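As a brief illustration of most_common, here is a small sketch in Python 3. The sample sentence is invented for the example.

```python
from collections import Counter

words = "the cat and the hat and the bat".split()
counts = Counter(words)

# most_common(n) returns the n highest counts as (item, count) pairs,
# ordered from most to least frequent.
top_two = counts.most_common(2)
print(top_two)  # [('the', 3), ('and', 2)]

# The number of keys is the number of distinct words.
print(len(counts))  # 5
```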

Another very useful method of the Counter class is update. After you have created a Counter instance by passing it a list of objects, you can pass more objects to update, and it will keep counting without discarding the existing counts:

    >>> from collections import Counter
    >>> c = Counter(['hello', 'hello', 1])
    >>> print c
    Counter({'hello': 2, 1: 1})
    >>> c.update(['hello'])
    >>> print c
    Counter({'hello': 3, 1: 1})
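Tying this back to the question, one way to use update is to feed it the words of each file in turn, so that a single Counter holds the totals for the whole directory. The sketch below assumes Python 3. The helper name `count_words` and the sample files are hypothetical.

```python
import os
import tempfile
from collections import Counter


def count_words(paths):
    """Tally lowercase words across all files into one Counter."""
    totals = Counter()
    for path in paths:
        with open(path) as fh:
            # update() adds to the existing counts instead of replacing them
            totals.update(word.lower() for word in fh.read().split())
    return totals


# Demonstration with two throwaway files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    samples = {"one.txt": "hello world hello", "two.txt": "Hello there"}
    paths = []
    for name, text in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as fh:
            fh.write(text)
        paths.append(path)
    totals = count_words(paths)

total_words = sum(totals.values())  # every occurrence, as in the asker's code
unique_count = len(totals)          # distinct words only
print(total_words, unique_count)    # 5 3
```

Compared with a plain set, the Counter costs a little more memory but gives both numbers, plus per-word frequencies, from a single pass over the files.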
