The difficulty of using binary search and Trie

given the large list of alphabetically sorted words in the file, I need to write a program that, given the word x, determines if x is in the list. The preprocessing is fine, as I will call this function many times on different inputs.
priorties: 1. speed. 2. memory

I already know that I can use (n is the number of words, m is the average word length) 1. trie, time O (log (n)), space (best case) - O (log (nm)), space (worst case ) - O (nm).
2. Load the complete list into memory, then binary search, time O (log (n)), space O (n * m)

I'm not sure about the complexity on tri, please correct me if they are wrong. Also have other good approaches?

+2
algorithm complexity-theory data-structures
Apr 27 '10 at 4:51
source share
5 answers

This time is O (m) for trie and before O (mlog (n)) for binary search. The space is asymptotically O (nm) for any reasonable method, which you can probably reduce in some cases using compression. Theoretically, this structure is slightly better in memory, but in practice, it has features hidden in the implementation details: the memory needed to store pointers and potentially poor access to the cache.

There are other options for implementing a given structure: hashset and treeet are a simple choice in most languages. I would choose a hash set as it is efficient and simple.

+3
Apr 27 '10 at 6:19 06:19
source share

I think HashMap is great for your case, since the time complexity for put and get operations is O (1). It works fine even if you don't have a sorted list. !!!

+1
Jan 12 '13 at 6:11
source share

The preprocessing is fine, as I will call> this function many times on different inputs.

As food for thought, do you consider creating a set of input and then searching using a specific hash? It will take longer to create the set, but if the number of inputs is limited, and you can return to them, then setting up might be a good idea with O (1) for the β€œcontains” operation for a good hash function.

0
Apr 27 '10 at 6:04 on
source share

I would recommend hashmap. You can find the extension for C ++ for this in both VC and GCC.

0
May 15 '10 at 20:56
source share

Use a flowering filter. This space is effective even for very large data, and it is a quick reject method.

0
Feb 16 '14 at 20:06
source share



All Articles