DataTable.Select vs DataTable.Rows.Find vs foreach vs Find(Predicate<T>) / Lambda

I have a DataTable / collection that is cached in memory, and I want to use it as the source for autocomplete suggestions in a text field (via AJAX, of course). I'd appreciate thoughts on the various options for retrieving the data quickly. The number of items in the collection / rows in the DataTable can vary from 10,000 to 2,000,000. (So that we aren't distracted: assume for the moment that the decision has already been made, I have enough RAM, and I will serve this from the cache rather than from database queries.)

There is additional business logic in this processing: I have to rank the autocomplete list according to a priority (int) column in the collection. So if someone searches for Micro and I get 20 results for words/phrases starting with Micro, I would pick the 10 results with the highest priority. (This is why a priority property must stay associated with each string value.)
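In code, the rule I'm after is roughly this (a minimal LINQ sketch; cachedItems and the property names are placeholders, not my real types):

    // Take the 10 highest-priority items among all prefix matches.
    var top10 = cachedItems
        .Where(s => s.Text.StartsWith("Micro", StringComparison.OrdinalIgnoreCase))
        .OrderByDescending(s => s.Priority)
        .Take(10)
        .ToList();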

Items in the collection are already sorted alphabetically.

What would be the best solution in this case?
1. Using DataTable.Select().
2. Using DataTable.Rows.Find().
3. Using a custom collection and iterating over the values with foreach.
4. Using a generic collection with anonymous delegates or lambdas (do both give the same performance or not?).

+19
optimization lambda anonymous-methods
Mar 09 '09 at 15:25
5 answers

The charts aren't reproduced in this post; they're on my blog, and more information can be found at http://msdn.microsoft.com/en-us/library/dd364983.aspx

Another thing I have since discovered is that for large datasets, a nested generic Dictionary performs incredibly well. It also helps avoid many of the problems caused by the sort operations needed for aggregation operations such as min and max (whether via DataTable.Compute or LINQ).

In "encoded generic dictionary" I mean Dictionary(Of String, Dictionary(Of String, Dictionary(Of Integer, List(Of DataRow)))) or a similar method, where the key for each dictionary is a search term.

Of course, this won't be useful in every circumstance, but I have at least one scenario in which this approach yielded a 500x performance improvement.

In your case, I would use a simple dictionary keyed on the first 1-5 characters of each word, with a List(Of String) as the value. You need to build this dictionary once, adding each word to the lists for its 1-5 character prefixes, but after that you can get incredibly fast results.

I usually wrap this kind of thing in a class, which makes operations like adding words easy. You could also use a SortedList(Of String) to keep the results sorted automatically. That way you can very quickly find the list of words matching the first N characters that have been typed.
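A minimal C# sketch of that wrapper class (all names here are mine; the description above only gives the shape of the index, not an implementation):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Suggestion
    {
        public string Text { get; set; }
        public int Priority { get; set; }
    }

    public class PrefixIndex
    {
        private const int MaxPrefixLength = 5;
        private readonly Dictionary<string, List<Suggestion>> _index =
            new Dictionary<string, List<Suggestion>>(StringComparer.OrdinalIgnoreCase);

        // Register the word under each of its 1-5 character prefixes.
        public void Add(string word, int priority)
        {
            Suggestion item = new Suggestion { Text = word, Priority = priority };
            for (int len = 1; len <= Math.Min(MaxPrefixLength, word.Length); len++)
            {
                string prefix = word.Substring(0, len);
                List<Suggestion> bucket;
                if (!_index.TryGetValue(prefix, out bucket))
                {
                    bucket = new List<Suggestion>();
                    _index[prefix] = bucket;
                }
                bucket.Add(item);
            }
        }

        // Return the highest-priority matches for what the user has typed so far.
        public IEnumerable<Suggestion> Lookup(string typed, int maxResults)
        {
            string key = typed.Length <= MaxPrefixLength
                ? typed
                : typed.Substring(0, MaxPrefixLength);

            List<Suggestion> bucket;
            if (!_index.TryGetValue(key, out bucket))
                return Enumerable.Empty<Suggestion>();

            // If the typed text is longer than the indexed prefix, narrow the
            // bucket further, then apply the questioner's top-N-by-priority rule.
            return bucket
                .Where(s => s.Text.StartsWith(typed, StringComparison.OrdinalIgnoreCase))
                .OrderByDescending(s => s.Priority)
                .Take(maxResults);
        }
    }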

+8
Jun 11 '09 at 15:57

For my autocomplete, I first tried the LINQ/lambda approach, and performance was a bit slow. DataTable.Select was faster than LINQ, so that's what I use. I haven't yet compared the performance of DataTable.Select against DataTable.Rows.Find.
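The two approaches compared look roughly like this (the column names are illustrative; the LINQ version needs a reference to System.Data.DataSetExtensions):

    // 1) DataTable.Select with a filter expression and sort
    DataRow[] viaSelect = table.Select("Word LIKE 'Micro%'", "Priority DESC");

    // 2) LINQ over the rows
    var viaLinq = table.AsEnumerable()
        .Where(r => r.Field<string>("Word").StartsWith("Micro"))
        .OrderByDescending(r => r.Field<int>("Priority"))
        .ToArray();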

+4
Mar 16 '09 at 15:33

We could theorize about this all day, but since none of these is a huge piece of code, why not write each of them and benchmark them against each other?

    using System;
    using System.Collections.Generic;
    using System.Linq; // needed for results.Min()

    public delegate void TestProcedure();

    public TimeSpan Benchmark(TestProcedure tp)
    {
        int testBatchSize = 5;
        List<TimeSpan> results = new List<TimeSpan>();

        for (int i = 0; i < testBatchSize; i++)
        {
            DateTime start = DateTime.Now; // Stopwatch would be more precise
            tp();
            results.Add(DateTime.Now - start);
        }

        return results.Min(); // best of the five runs
    }
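You could then time each candidate like this (assuming a populated DataTable named table with a Word column; those names are mine):

    TimeSpan selectTime = Benchmark(() => table.Select("Word LIKE 'Micro%'"));
    Console.WriteLine("DataTable.Select: {0}", selectTime);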
+2
Mar 09 '09 at 15:33

According to the following blog post:

http://blog.dotnetspeech.net/archive/2008/08/26/performance----datatable.select-vs-dictionary.aspx

DataTable.Rows.Find is much, much faster than DataTable.Select.
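Worth noting: DataTable.Rows.Find only works when the table has a primary key defined (the index backing that key is where the speed comes from), and it performs an exact key lookup rather than a prefix search. A minimal sketch, assuming a Word column:

    // Rows.Find requires a primary key; the lookup is then index-based.
    table.PrimaryKey = new DataColumn[] { table.Columns["Word"] };
    DataRow row = table.Rows.Find("Microsoft"); // exact match, or null if absent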

+1
Jun 09 '09 at 20:54

How about a DataView? You can apply a filter condition, sort by priority, and easily iterate over the results to build your suggestion list.
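A rough sketch, assuming Word and Priority columns (the names are illustrative):

    // Filter on the typed prefix, sort by priority, and take the top 10.
    DataView view = new DataView(table);
    view.RowFilter = "Word LIKE 'Micro%'";
    view.Sort = "Priority DESC";

    for (int i = 0; i < Math.Min(10, view.Count); i++)
    {
        Console.WriteLine(view[i]["Word"]);
    }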

0
Mar 09 '09 at 15:40


