Develop an algorithm to find the most frequently used word in a book

Interview question:

Find the most commonly used word in the book.

My idea:

Use a hash table: traverse the book word by word and increment each word's counter in the table.

If the size of the book is known and some word is found to account for more than 50% of it, then on the next pass we can skip any new words and just keep counting the words already seen. But what if the size of the book is unknown?

This is O(n) time and O(n) space.
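A quick sketch of that idea in Python (illustration only; the tokenizer is deliberately crude):

import re
from collections import defaultdict

def most_frequent_word(text):
    counts = defaultdict(int)
    for word in re.findall(r"[a-z']+", text.lower()):
        counts[word] += 1                                  # O(1) average per word -> O(n) total
    return max(counts.items(), key=lambda kv: kv[1])       # O(m) scan over distinct words

print(most_frequent_word("the cat and the hat and the bat"))   # ('the', 3)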

Any better ideas?

Thanks!

+5
6 answers

A heap is usually the data structure that fits well when we need to determine something like the most/least used item.

Even Python's Counter.nlargest, which is used for exactly this purpose, is implemented with a heap; a minimal heapq sketch follows the complexity list below. A binary heap has these complexities:

CreateHeap - O(1)
FindMin - O(1)
DeleteMin - O(log n)
Insert - O(log n)
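For illustration, a minimal heap-based version using the standard heapq module (my own sketch, not from the original answer):

import heapq
import re
from collections import defaultdict

def most_frequent_via_heap(text):
    counts = defaultdict(int)
    for word in re.findall(r"[a-z']+", text.lower()):
        counts[word] += 1
    # nlargest heapifies the m distinct entries; for k=1 this is an O(m) pass
    return heapq.nlargest(1, counts.items(), key=lambda kv: kv[1])[0]

print(most_frequent_via_heap("to be or not to be"))   # ('to', 2); ties go to the first word seen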

I ran a timeit comparison of a hash (a defaultdict in Python) against a heap (collections.Counter.nlargest in Python), and the hash came out slightly faster than the heap:

>>> stmt1="""
import collections, random
somedata=[random.randint(1,1000) for i in xrange(1,10000)]
somehash=collections.defaultdict(int)
for d in somedata:
    somehash[d]+=1
maxkey=0
for k,v in somehash.items():
    if v > somehash[maxkey]:   # keep the key with the highest count
        maxkey=k
"""
>>> stmt2="""
import collections,random
somedata=[random.randint(1,1000) for i in xrange(1,10000)]
collections.Counter(somedata).most_common(1)
"""
>>> import timeit
>>> t1=timeit.Timer(stmt=stmt1)
>>> t2=timeit.Timer(stmt=stmt2)
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=10)/10)
38168.96 usec/pass
>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=10)/10)
33600.80 usec/pass
+2

To pin down the complexity, note that there are two parameters here: n = the total number of words and m = the number of distinct words. The best case comes out around O(n log(m)) time and O(m) space, since you iterate over all n words while building and querying a structure that eventually holds m entries.

+2

There is a generalization of your approach called map-reduce.

See the Wikipedia article on MapReduce; the key point is that the counting step parallelizes naturally (this is where concurrency comes in).

Split the book into chunks, count word frequencies in each chunk independently, then merge the per-chunk hash tables into one and take the maximum.
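A toy single-machine illustration of the idea (multiprocessing.Pool stands in for a real MapReduce cluster; the chunking is assumed to be done already):

from collections import Counter
from multiprocessing import Pool

def count_chunk(chunk):
    # map step: count words in one chunk
    return Counter(chunk.split())

if __name__ == "__main__":
    chunks = ["the cat sat", "on the mat", "the end"]
    with Pool(3) as pool:
        partial = pool.map(count_chunk, chunks)   # count the chunks in parallel
    total = sum(partial, Counter())               # reduce step: merge the tables
    print(total.most_common(1))                   # [('the', 3)]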

+2

A refinement of your early-exit idea: track the current leader, the runner-up, and the number of words still unread; as soon as count(leader) > count(runner-up) + remaining words, no other word can catch up and you can stop (see the sketch below).
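A hedged sketch of that stopping rule (my own illustration; it assumes the word sequence and its length are known):

def most_frequent_early_exit(words):
    counts = {}
    leader_word, leader, runner_up = None, 0, 0
    remaining = len(words)
    for word in words:
        remaining -= 1
        c = counts.get(word, 0) + 1
        counts[word] = c
        if word == leader_word:
            leader = c
        elif c > leader:                       # a new leader; the old leader becomes runner-up
            leader_word, runner_up, leader = word, leader, c
        elif c > runner_up:
            runner_up = c
        if leader > runner_up + remaining:     # nobody can catch up any more
            break
    return leader_word, leader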

+1

Your hash-table solution is already asymptotically optimal, so it is worth spelling out the time/space analysis.

You have to read the whole book at least once, which is O(n). Each hash insert/lookup is O(1) on average, so counting n words is O(n). Finding the max over the table is another O(n) at worst. The total is O(n), and you cannot do better, because every word must be read at least once.

As Chris points out, the O(1) hash operations assume a well-behaved hash function; with many collisions the bounds degrade.

If you used a self-balancing tree instead of a hash table, each operation would be O(log(n)), for O(n log(n)) overall.

+1

It depends on the constraints, such as how much memory you have and whether the whole word list fits in RAM.

For storing the words themselves, a trie works well: looking up a word of length k costs O(k), and once the node is found, incrementing its counter is O(1).

For tracking the most frequent words, I would use a priority queue implemented as a heap or a self-balancing tree. A well-tuned hash table could also be a good choice (a counting-trie sketch follows).
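A minimal counting-trie sketch along those lines (the node layout is my own choice):

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.count = 0       # occurrences of the word ending at this node

def add_word(root, word):
    node = root
    for ch in word:          # walk k characters: O(k)
        node = node.children.setdefault(ch, TrieNode())
    node.count += 1          # bump the counter: O(1)
    return node.count

root = TrieNode()
best, best_word = 0, None
for w in "the cat and the hat and the".split():
    c = add_word(root, w)
    if c > best:
        best, best_word = c, w
print(best_word, best)       # the 3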

0