Full-text search on a mobile device?

Question

Full-text search on a mobile device?

We will soon begin to develop a new mobile application. This particular application will be used for intensive search of text fields. Any suggestions from the group as a whole about which database engine is best for resolving these types of searches on a mobile platform?

Features include Windows Mobile 6, and we will use .Net CF. In addition, some text fields will contain between 35 and 500 characters. The device will work in two different modes: burst and WiFi. Of course, for Wi-Fi, we can simply send requests to the full-blown database engine and simply return the results back. This question is centered around the "batch" version, which will host a database loaded with information about flash memory / removable memory card devices.

In any case, I know that SQLCE has some basic indexing, but you do not get into these fantastic full-text style indexes until you get the full version, which, of course, is not available on the mobile platform.

An example of how the data will look:

"apron carpenter adjustable leather container pocket waist hardware belt" etc. etc.

I have not yet evaluated any other specific parameters, as I believe that I would use the experience of this group to first show me some specific possibilities.

Any suggestions / tips?

+4

sqlite search windows-mobile sql-server-ce full-text-search

Mat nadrofsky Nov 09 '08 at 21:20

source share

2 answers

You can try Lucene.Net. I'm not sure how well it works for mobile devices, but it is declared as a "high-performance, full-featured text search library."

http://incubator.apache.org/lucene.net/ http://lucene.apache.org/java/docs/

+2

Wayne bloss Nov 09 '08 at 21:59

source share

Jason down · Accepted Answer · 2009-01-16T19:41:03+0000

More recently, I had the same problem. Here is what I did:

I created a class to store only the identifier and text for each object (in my case, I called it sku (position number) and description). This creates a smaller object that uses less memory because it is used only for search. I still get full-blown objects from the database after I find matches.

public class SmallItem { private int _sku; public int Sku { get { return _sku; } set { _sku = value; } } // Size of max description size + 1 for null terminator. private char[] _description = new char[36]; public char[] Description { get { return _description; } set { _description = value; } } public SmallItem() { } }

After creating this class, you can create an array (I actually used a list in my case) of these objects and use it to search in your application. Initializing this list takes a little time, but you only need to worry about it at startup. Basically, just run a query in your database and grab the data needed to create this list.

Once you have a list, you can quickly go through it to search for any words you want. Since it contains, it should also find words in words (for example, a drill will return a drill, drill, drill, etc.). To do this, we wrote a home, unmanaged C # function. It takes a string array of words (so you can search for more than one word ... we use it to search for "AND" ... the description should contain all the words passed in ... "OR" is not currently supported in this example ) As you search through the list of words, it builds a list of identifiers, which are then passed back to the calling function. Once you have a list of identifiers, you can easily run a quick query in your database to return full-blown objects based on a fast indexed identification number. It should be noted that we also limit the maximum number of returned results. This could be removed. It is just convenient if someone enters something like "e" as a search term. This will produce many results.

Here's an example of a custom function Contains:

 public static int[] Contains(string[] descriptionTerms, int maxResults, List<SmallItem> itemList) { // Don't allow more than the maximum allowable results constant. int[] matchingSkus = new int[maxResults]; // Indexes and counters. int matchNumber = 0; int currentWord = 0; int totalWords = descriptionTerms.Count() - 1; // - 1 because it will be used with 0 based array indexes bool matchedWord; try { /* Character array of character arrays. Each array is a word we want to match. * We need the + 1 because totalWords had - 1 (We are setting a size/length here, * so it is not 0 based... we used - 1 on totalWords because it is used for 0 * based index referencing.) * */ char[][] allWordsToMatch = new char[totalWords + 1][]; // Character array to hold the current word to match. char[] wordToMatch = new char[36]; // Max allowable word size + null terminator... I just picked 36 to be consistent with max description size. // Loop through the original string array or words to match and create the character arrays. for (currentWord = 0; currentWord <= totalWords; currentWord++) { char[] desc = new char[descriptionTerms[currentWord].Length + 1]; Array.Copy(descriptionTerms[currentWord].ToUpper().ToCharArray(), desc, descriptionTerms[currentWord].Length); allWordsToMatch[currentWord] = desc; } // Offsets for description and filter(word to match) pointers. int descriptionOffset = 0, filterOffset = 0; // Loop through the list of items trying to find matching words. foreach (SmallItem i in itemList) { // If we have reached our maximum allowable matches, we should stop searching and just return the results. if (matchNumber == maxResults) break; // Loop through the "words to match" filter list. for (currentWord = 0; currentWord <= totalWords; currentWord++) { // Reset our match flag and current word to match. matchedWord = false; wordToMatch = allWordsToMatch[currentWord]; // Delving into unmanaged code for SCREAMING performance ;) unsafe { // Pointer to the description of the current item on the list (starting at first char). fixed (char* pdesc = &i.Description[0]) { // Pointer to the current word we are trying to match (starting at first char). fixed (char* pfilter = &wordToMatch[0]) { // Reset the description offset. descriptionOffset = 0; // Continue our search on the current word until we hit a null terminator for the char array. while (*(pdesc + descriptionOffset) != '\0') { // We've matched the first character of the word we're trying to match. if (*(pdesc + descriptionOffset) == *pfilter) { // Reset the filter offset. filterOffset = 0; /* Keep moving the offsets together while we have consecutive character matches. Once we hit a non-match * or a null terminator, we need to jump out of this loop. * */ while (*(pfilter + filterOffset) != '\0' && *(pfilter + filterOffset) == *(pdesc + descriptionOffset)) { // Increase the offsets together to the next character. ++filterOffset; ++descriptionOffset; } // We hit matches all the way to the null terminator. The entire word was a match. if (*(pfilter + filterOffset) == '\0') { // If our current word matched is the last word on the match list, we have matched all words. if (currentWord == totalWords) { // Add the sku as a match. matchingSkus[matchNumber] = i.Sku.ToString(); matchNumber++; /* Break out of this item description. We have matched all needed words and can move to * the next item. * */ break; } /* We've matched a word, but still have more words left in our list of words to match. * Set our match flag to true, which will mean we continue continue to search for the * next word on the list. * */ matchedWord = true; } } // No match on the current character. Move to next one. descriptionOffset++; } /* The current word had no match, so no sense in looking for the rest of the words. Break to the * next item description. * */ if (!matchedWord) break; } } } } }; // We have our list of matching skus. We'll resize the array and pass it back. Array.Resize(ref matchingSkus, matchNumber); return matchingSkus; } catch (Exception ex) { // Handle the exception } }

Once you have a list of matching skus, you can iterate through the array and build a query command that returns only the corresponding skus.

To get an idea of performance, here's what we found (by following these steps):

Search ~ 171,000 items
Create a list of all matching items
Query database by returning only matching items
Build full-blown elements (similar to SmallItem class, but much more fields)
Fills a datagrid with full hit objects.

On our mobile devices, the whole process takes 2-4 seconds (it takes 2 if we press the match limit before we check all the items ... takes 4 seconds if we need to scan each item).

I also tried doing this without unmanaged code and using String.IndexOf (and tried String.Contains ... had the same performance as IndexOf). This path was much slower ... about 25 seconds.

I also tried using StreamReader and a file containing strings [Sku Number] | [Description]. The code was like an unmanaged code example. This path took about 15 seconds to complete a scan. Not so bad for speed, but not really. The file and StreamReader method have one advantage over what I showed you. A file can be created in advance. As I showed you, you need memory and the initial loading time of the list when the application starts. For our 171,000 items, this takes about 2 minutes. If you can afford to wait for this bootstrap every time the application starts (which can be done on a separate thread, of course), searching this path is the fastest way (which I found at least).

Hope this helps.

PS - Thanks to Dolch for helping with some unmanaged code.

Full-text search on a mobile device?

More articles: