More recently, I had the same problem. Here is what I did:
I created a class to store only the identifier and text for each object (in my case, I called it sku (position number) and description). This creates a smaller object that uses less memory because it is used only for search. I still get full-blown objects from the database after I find matches.
public class SmallItem { private int _sku; public int Sku { get { return _sku; } set { _sku = value; } }
After creating this class, you can create an array (I actually used a list in my case) of these objects and use it to search in your application. Initializing this list takes a little time, but you only need to worry about it at startup. Basically, just run a query in your database and grab the data needed to create this list.
Once you have a list, you can quickly go through it to search for any words you want. Since it contains, it should also find words in words (for example, a drill will return a drill, drill, drill, etc.). To do this, we wrote a home, unmanaged C # function. It takes a string array of words (so you can search for more than one word ... we use it to search for "AND" ... the description should contain all the words passed in ... "OR" is not currently supported in this example ) As you search through the list of words, it builds a list of identifiers, which are then passed back to the calling function. Once you have a list of identifiers, you can easily run a quick query in your database to return full-blown objects based on a fast indexed identification number. It should be noted that we also limit the maximum number of returned results. This could be removed. It is just convenient if someone enters something like "e" as a search term. This will produce many results.
Here's an example of a custom function Contains:
public static int[] Contains(string[] descriptionTerms, int maxResults, List<SmallItem> itemList) { // Don't allow more than the maximum allowable results constant. int[] matchingSkus = new int[maxResults]; // Indexes and counters. int matchNumber = 0; int currentWord = 0; int totalWords = descriptionTerms.Count() - 1; // - 1 because it will be used with 0 based array indexes bool matchedWord; try { /* Character array of character arrays. Each array is a word we want to match. * We need the + 1 because totalWords had - 1 (We are setting a size/length here, * so it is not 0 based... we used - 1 on totalWords because it is used for 0 * based index referencing.) * */ char[][] allWordsToMatch = new char[totalWords + 1][]; // Character array to hold the current word to match. char[] wordToMatch = new char[36]; // Max allowable word size + null terminator... I just picked 36 to be consistent with max description size. // Loop through the original string array or words to match and create the character arrays. for (currentWord = 0; currentWord <= totalWords; currentWord++) { char[] desc = new char[descriptionTerms[currentWord].Length + 1]; Array.Copy(descriptionTerms[currentWord].ToUpper().ToCharArray(), desc, descriptionTerms[currentWord].Length); allWordsToMatch[currentWord] = desc; } // Offsets for description and filter(word to match) pointers. int descriptionOffset = 0, filterOffset = 0; // Loop through the list of items trying to find matching words. foreach (SmallItem i in itemList) { // If we have reached our maximum allowable matches, we should stop searching and just return the results. if (matchNumber == maxResults) break; // Loop through the "words to match" filter list. for (currentWord = 0; currentWord <= totalWords; currentWord++) { // Reset our match flag and current word to match. matchedWord = false; wordToMatch = allWordsToMatch[currentWord]; // Delving into unmanaged code for SCREAMING performance ;) unsafe { // Pointer to the description of the current item on the list (starting at first char). fixed (char* pdesc = &i.Description[0]) { // Pointer to the current word we are trying to match (starting at first char). fixed (char* pfilter = &wordToMatch[0]) { // Reset the description offset. descriptionOffset = 0; // Continue our search on the current word until we hit a null terminator for the char array. while (*(pdesc + descriptionOffset) != '\0') { // We've matched the first character of the word we're trying to match. if (*(pdesc + descriptionOffset) == *pfilter) { // Reset the filter offset. filterOffset = 0; /* Keep moving the offsets together while we have consecutive character matches. Once we hit a non-match * or a null terminator, we need to jump out of this loop. * */ while (*(pfilter + filterOffset) != '\0' && *(pfilter + filterOffset) == *(pdesc + descriptionOffset)) { // Increase the offsets together to the next character. ++filterOffset; ++descriptionOffset; } // We hit matches all the way to the null terminator. The entire word was a match. if (*(pfilter + filterOffset) == '\0') { // If our current word matched is the last word on the match list, we have matched all words. if (currentWord == totalWords) { // Add the sku as a match. matchingSkus[matchNumber] = i.Sku.ToString(); matchNumber++; /* Break out of this item description. We have matched all needed words and can move to * the next item. * */ break; } /* We've matched a word, but still have more words left in our list of words to match. * Set our match flag to true, which will mean we continue continue to search for the * next word on the list. * */ matchedWord = true; } } // No match on the current character. Move to next one. descriptionOffset++; } /* The current word had no match, so no sense in looking for the rest of the words. Break to the * next item description. * */ if (!matchedWord) break; } } } } }; // We have our list of matching skus. We'll resize the array and pass it back. Array.Resize(ref matchingSkus, matchNumber); return matchingSkus; } catch (Exception ex) { // Handle the exception } }
Once you have a list of matching skus, you can iterate through the array and build a query command that returns only the corresponding skus.
To get an idea of performance, here's what we found (by following these steps):
- Search ~ 171,000 items
- Create a list of all matching items
- Query database by returning only matching items
- Build full-blown elements (similar to SmallItem class, but much more fields)
- Fills a datagrid with full hit objects.
On our mobile devices, the whole process takes 2-4 seconds (it takes 2 if we press the match limit before we check all the items ... takes 4 seconds if we need to scan each item).
I also tried doing this without unmanaged code and using String.IndexOf (and tried String.Contains ... had the same performance as IndexOf). This path was much slower ... about 25 seconds.
I also tried using StreamReader and a file containing strings [Sku Number] | [Description]. The code was like an unmanaged code example. This path took about 15 seconds to complete a scan. Not so bad for speed, but not really. The file and StreamReader method have one advantage over what I showed you. A file can be created in advance. As I showed you, you need memory and the initial loading time of the list when the application starts. For our 171,000 items, this takes about 2 minutes. If you can afford to wait for this bootstrap every time the application starts (which can be done on a separate thread, of course), searching this path is the fastest way (which I found at least).
Hope this helps.
PS - Thanks to Dolch for helping with some unmanaged code.