Fast lookup of a string in a list of strings

I need a fast way to determine whether a given string is in a list of strings.

The list of strings is not known until runtime, but once it has been built it will not change.

I could just have List<String>, called strings, and then do:

if (strings.Contains(item))

However, this will perform poorly if there are many strings in the list.

I could also use a HashSet<String>, but that would require a call to GetHashCode on every incoming string, as well as Equals, which would be wasteful if there are, say, only 3 strings in the list. Did I mention it needs to be fast?

When setting up, I could decide whether to use a List or a HashSet depending on the number of strings (for example, a List for fewer than 10 strings and a HashSet otherwise), much like the logic in HybridDictionary.
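
The size-based choice described above could be sketched roughly as follows. The class name StringLookup and the cutoff of 10 are illustrative assumptions, not measured values; the right threshold would have to come from benchmarking on the real data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: picks a lookup strategy once, at setup time.
sealed class StringLookup
{
    private const int Threshold = 10;        // assumed cutoff, not a measured value
    private readonly string[] small;         // used when the list is short
    private readonly HashSet<string> large;  // used otherwise

    public StringLookup(IEnumerable<string> items)
    {
        var list = new List<string>(items);
        if (list.Count < Threshold)
            small = list.ToArray();
        else
            large = new HashSet<string>(list, StringComparer.Ordinal);
    }

    public bool Contains(string item)
    {
        if (large != null)
            return large.Contains(item);

        // Linear scan; the static string.Equals is null-safe.
        foreach (string s in small)
            if (string.Equals(s, item, StringComparison.Ordinal))
                return true;
        return false;
    }
}

static class StringLookupDemo
{
    static void Main()
    {
        var lookup = new StringLookup(new[] { "abc", "def", "ghi" });
        Console.WriteLine(lookup.Contains("abc"));   // True
        Console.WriteLine(lookup.Contains("abcde")); // False
    }
}
```

Because the list never changes after setup, the branch in Contains always goes the same way for a given instance, so its cost should be small compared with the lookup itself.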

Since the strings are Unicode, a standard trie structure will not work, although a radix tree / Patricia trie might. Are there any good C# implementations out there with benchmarks?

Some have mentioned bypassing String's GetHashCode and using a faster hash function. Are there any benchmarks for that?
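
In the absence of published numbers, a rough timing harness along these lines could be used to compare candidates on the real data. This is only a sketch: the class name LookupBenchmark, the probe strings, and the round count are made up for illustration, and going through a delegate adds overhead of its own, so the results are indicative at best.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class LookupBenchmark
{
    // Runs the lookup over every probe `rounds` times; returns elapsed milliseconds.
    public static long Time(Func<string, bool> contains, string[] probes, int rounds, out int hits)
    {
        hits = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int r = 0; r < rounds; r++)
            foreach (string p in probes)
                if (contains(p)) hits++;
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        string[] items = { "abc", "def", "ghi" };
        // Mostly misses, matching the workload described in the question.
        string[] probes = { "xyz", "abcde", "q", "abc", "nothing" };

        var list = new List<string>(items);
        var set = new HashSet<string>(items, StringComparer.Ordinal);

        int listHits, setHits;
        long listMs = Time(list.Contains, probes, 1000000, out listHits);
        long setMs = Time(set.Contains, probes, 1000000, out setHits);

        Console.WriteLine("List:    {0} ms ({1} hits)", listMs, listHits);
        Console.WriteLine("HashSet: {0} ms ({1} hits)", setMs, setHits);
    }
}
```

The hit counts are printed so that the two strategies can be checked against each other; if they differ, the comparison is not measuring equivalent lookups.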

Using LINQ expressions to create an optimized switch statement is a new approach that looks very interesting.

What else would work? Setup cost is not important, only lookup speed.

If it matters, the incoming string values will rarely appear in the list.


HashSet<T> (.NET 3.5)?


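Regarding the "small list" concern: if non-generic collections are acceptable, System.Collections.Specialized.HybridDictionary does something like this; it wraps a System.Collections.Specialized.ListDictionary while the collection is small and switches to a System.Collections.Hashtable once it grows larger (>10). Worth a look?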


Otherwise, perhaps HashSet<T> with a custom comparer? Then you can choose how expensive GetHashCode() is...

using System;
using System.Collections.Generic;

class CustomStringComparer : IEqualityComparer<string> {
    public bool Equals(string x, string y) {
        return string.Equals(x, y);
    }
    // Deliberately cheap hash: uses only the length and the first character,
    // trading more collisions for less work per lookup.
    public int GetHashCode(string s) {
        return string.IsNullOrEmpty(s) ? 0 :
            s.Length + 273133 * (int)s[0];
    }
    private CustomStringComparer() { }
    public static readonly CustomStringComparer Default
        = new CustomStringComparer();
}
static class Program {
    static void Main() {
        HashSet<string> set = new HashSet<string>(
            new string[] { "abc", "def", "ghi" }, CustomStringComparer.Default);
        Console.WriteLine(set.Contains("abc"));   // True
        Console.WriteLine(set.Contains("abcde")); // False
    }
}

A case-insensitive search over the list:

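// Requires: using System; using System.Collections.Generic;
private static bool Contains(List<string> list, string value)
{
    // Case-insensitive match without allocating a lowered copy of every string.
    // Exists also avoids Find's ambiguity when the list itself contains null.
    return list.Exists(str => string.Equals(str, value, StringComparison.OrdinalIgnoreCase));
}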

This keeps the existing List<string>, but it is still a linear scan.


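As an aside, if I remember correctly, when a String is constructed its hash value is precomputed and stored with the String. If you build the value with a StringBuilder or similar, that would not apply.

EDIT: I was wrong... Java caches the String hash code, C# does not.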


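As an aside, for small lists: if the strings are interned (see string.Intern()), you can compare them with object.ReferenceEquals, which is just a reference comparison.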

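List<string> BuildList() {
    List<string> result = new List<string>();
    foreach (string str in StringSource()) // StringSource() is assumed to yield the raw strings
        result.Add(string.Intern(str));
    return result;
}

bool CheckList(List<string> list, string stringToFind) { // list must be interned for this to work!
    // The incoming string must be the pooled instance too;
    // IsInterned returns null if it is not in the intern pool at all.
    string pooled = string.IsInterned(stringToFind);
    return pooled != null && list.Exists(str => object.ReferenceEquals(str, pooled));
}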

This results in a four-byte (reference) comparison for each entry in the list, plus one pass over the incoming string. The intern pool is built specifically for fast string comparison and for finding whether a string already exists, so the intern operation should be quite fast.
