Develop an algorithm to find the most frequently used word in a book

Interview question:

Find the most commonly used word in the book.

My idea:

Use a hash table: traverse the book word by word and increment each word's counter in the table.

If the size of the book is known and some word is found to account for more than 50% of it, then on the next pass we can skip any new words and just keep counting the words already seen. But what if the size of the book is unknown?

This is O(n) time and O(n) space.
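A quick sketch of that idea in Python (illustration only; the tokenizer is deliberately crude):

import re
from collections import defaultdict

def most_frequent_word(text):
    counts = defaultdict(int)
    for word in re.findall(r"[a-z']+", text.lower()):
        counts[word] += 1                                  # O(1) average per word -> O(n) total
    return max(counts.items(), key=lambda kv: kv[1])       # O(m) scan over distinct words

print(most_frequent_word("the cat and the hat and the bat"))   # ('the', 3)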

Any better ideas?

Thanks!

+5
6 answers

A heap is usually the data structure that fits well when we need to determine something like the most/least used item.

Even Python's Counter.nlargest, which is used for exactly this purpose, is implemented with a heap; a minimal heapq sketch follows the complexity list below. A binary heap has these complexities:

CreateHeap - O(1)
FindMin - O(1)
DeleteMin - O(log n)
Insert - O(log n)
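For illustration, a minimal heap-based version using the standard heapq module (my own sketch, not from the original answer):

import heapq
import re
from collections import defaultdict

def most_frequent_via_heap(text):
    counts = defaultdict(int)
    for word in re.findall(r"[a-z']+", text.lower()):
        counts[word] += 1
    # nlargest heapifies the m distinct entries; for k=1 this is an O(m) pass
    return heapq.nlargest(1, counts.items(), key=lambda kv: kv[1])[0]

print(most_frequent_via_heap("to be or not to be"))   # ('to', 2); ties go to the first word seen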

I ran a timeit comparison of a hash (a defaultdict in Python) against a heap (collections.Counter.nlargest in Python), and the hash came out slightly faster than the heap:

>>> stmt1="""
import collections, random
somedata=[random.randint(1,1000) for i in xrange(1,10000)]
somehash=collections.defaultdict(int)
for d in somedata:
    somehash[d]+=1
maxkey=0
for k,v in somehash.items():
    if v > somehash[maxkey]:   # keep the key with the highest count
        maxkey=k
"""
>>> stmt2="""
import collections,random
somedata=[random.randint(1,1000) for i in xrange(1,10000)]
collections.Counter(somedata).most_common(1)
"""
>>> import timeit
>>> t1=timeit.Timer(stmt=stmt1)
>>> t2=timeit.Timer(stmt=stmt2)
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=10)/10)
38168.96 usec/pass
>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=10)/10)
33600.80 usec/pass
+2

To pin down the complexity, note that there are two parameters here: n = the total number of words and m = the number of distinct words. The best case comes out around O(n log(m)) time and O(m) space, since you iterate over all n words while building and querying a structure that eventually holds m entries.

+2

There is a generalization of your approach called map-reduce.

See the Wikipedia article on MapReduce; the key point is that the counting step parallelizes naturally (this is where concurrency comes in).

Split the book into chunks, count word frequencies in each chunk independently, then merge the per-chunk hash tables into one and take the maximum.
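A toy single-machine illustration of the idea (multiprocessing.Pool stands in for a real MapReduce cluster; the chunking is assumed to be done already):

from collections import Counter
from multiprocessing import Pool

def count_chunk(chunk):
    # map step: count words in one chunk
    return Counter(chunk.split())

if __name__ == "__main__":
    chunks = ["the cat sat", "on the mat", "the end"]
    with Pool(3) as pool:
        partial = pool.map(count_chunk, chunks)   # count the chunks in parallel
    total = sum(partial, Counter())               # reduce step: merge the tables
    print(total.most_common(1))                   # [('the', 3)]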

+2

A refinement of your early-exit idea: track the current leader, the runner-up, and the number of words still unread; as soon as count(leader) > count(runner-up) + remaining words, no other word can catch up and you can stop (see the sketch below).
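A hedged sketch of that stopping rule (my own illustration; it assumes the word sequence and its length are known):

def most_frequent_early_exit(words):
    counts = {}
    leader_word, leader, runner_up = None, 0, 0
    remaining = len(words)
    for word in words:
        remaining -= 1
        c = counts.get(word, 0) + 1
        counts[word] = c
        if word == leader_word:
            leader = c
        elif c > leader:                       # a new leader; the old leader becomes runner-up
            leader_word, runner_up, leader = word, leader, c
        elif c > runner_up:
            runner_up = c
        if leader > runner_up + remaining:     # nobody can catch up any more
            break
    return leader_word, leader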

+1

Your hash-table solution is already asymptotically optimal, so it is worth spelling out the time/space analysis.

You have to read the whole book at least once, which is O(n). Each hash insert/lookup is O(1) on average, so counting n words is O(n). Finding the max over the table is another O(n) at worst. The total is O(n), and you cannot do better, because every word must be read at least once.

As Chris points out, the O(1) hash operations assume a well-behaved hash function; with many collisions the bounds degrade.

If you used a self-balancing tree instead of a hash table, each operation would be O(log(n)), for O(n log(n)) overall.

+1

It depends on the constraints, such as how much memory you have and whether the whole word list fits in RAM.

For storing the words themselves, a trie works well: looking up a word of length k costs O(k), and once the node is found, incrementing its counter is O(1).

For tracking the most frequent words, I would use a priority queue implemented as a heap or a self-balancing tree. A well-tuned hash table could also be a good choice (a counting-trie sketch follows).
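A minimal counting-trie sketch along those lines (the node layout is my own choice):

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.count = 0       # occurrences of the word ending at this node

def add_word(root, word):
    node = root
    for ch in word:          # walk k characters: O(k)
        node = node.children.setdefault(ch, TrieNode())
    node.count += 1          # bump the counter: O(1)
    return node.count

root = TrieNode()
best, best_word = 0, None
for w in "the cat and the hat and the".split():
    c = add_word(root, w)
    if c > best:
        best, best_word = c, w
print(best_word, best)       # the 3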

0