Efficient way to find a string in a string list?

I have a list of strings and need to find which strings match the given input value. What is the most efficient way (memory and execution speed) for me to save this list of strings and be able to search through it? Starting and loading a list of strings is not important, but the response time for the search.

Should I use List or HashSet or just the base string [] or something else?

+7
source share
4 answers

This greatly depends on the nature of the strings and the size of the collection. Depending on the characteristics of the collection and the expected search strings, there are ways to arrange things very cleverly so that the search is very fast. You have not provided us with this information.

But here is what I will do. I would set reasonable performance requirements. Then I would try the n-gram index (why? Because you said in the comment that you need to consider partial matches, the HashSet<string> will not help you), and I would present the reasonable materials that I expect from this solution and see whether it meets my performance requirements or not. If so, I agree with the decision and continue. If this does not happen, I would very carefully think about whether my performance requirements are reasonable. If they are, I would start to think about whether there is anything special in my investments and assemblies that could allow me to use even smarter solutions.

+10
source

It seems best to build a suffix tree of your input in O (input_len) and then query your patterns in O (pattern_length). Therefore, if your text is really large compared to your templates, this will work well.

See the Ukkonen algorithm for constructing a suffix tree.

If you want an inaccurate match ... see Gonzalo Navarro's work.

+4
source

Use Dictionary<string>() or HashSet<string> is probably good for you.

+1
source

Vocabulary and Hashtable will be the fastest in the "search", because it is the speed of O (1). There are some problems with dictionaries and Hashtables in that they are not sorted.

Using the binary search tree, you can get an O (Log N) search.

Using an unsorted list, you will have an O (N) speed to search.

Using a sorted list, you get an O (Log N) search, but remember that the list needs to be sorted to add time to the overall speed.

Regarding memory usage, just make sure you initialize the collection size.

Therefore, a dictionary or hash table is the fastest to search.

Speed ​​classification from best to worst O (1) O (log n) On) O (n log n) O (N ^ 2) O (2 ^ n)

n is the number of elements.

-one
source

All Articles