DataSet row/column lookup speed?

I recently had to do some heavy processing on data stored in a DataSet. It was heavy enough that I ended up using a tool to help identify bottlenecks in my code. While analyzing them, I noticed that although the DataSet lookups were not terribly slow (they were not a bottleneck), they were slower than I expected. I had always assumed that DataSets used some kind of Hashtable-style implementation that would give O(1) lookups (or at least that is how I understand Hashtables to behave). My lookups seemed much slower than that.

I was wondering if anyone who knows about the implementation of the .NET DataSet class would be willing to share what they know.

If I do something like this:

    DataTable dt = new DataTable();
    if (dt.Columns.Contains("SomeColumn"))
    {
        object o = dt.Rows[0]["SomeColumn"];
    }

How fast is the lookup for the Contains(...) call, and for retrieving the value that gets stored in object o? I would have thought it would be very fast, like a Hashtable (assuming my understanding of Hashtables is correct), but it doesn't seem to be...

I wrote this code from memory, so it may not be syntactically correct.

+6
optimization c# datatable
4 answers

Stepping through DataRow["ColumnName"] in Reflector:

  • Get the DataColumn from the column name. This uses the string indexer DataColumnCollection["ColumnName"]. Internally, the DataColumnCollection stores its DataColumns in a Hashtable. O(1)
  • Get the DataRow's record index. The index is stored in an internal member. O(1)
  • Get the value from the DataColumn at that index using DataColumn[index]. The DataColumn stores its data in a System.Data.Common.DataStorage member (internal, abstract):

    return dataColumnInstance._storage.Get (recordIndex);

    An example of a concrete implementation is System.Data.Common.StringStorage (internal, sealed). StringStorage (and the other concrete DataStorages that I checked) stores its values in an array. Get(recordIndex) just grabs the object at position recordIndex in the values array. O(1)

So overall you are still O(1), but that doesn't mean the hashing and the function calls made during the operation are free. It just means the cost does not grow as the number of DataRows or DataColumns increases.
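The three steps above can be pictured with a toy model. This is only a sketch: ToyTable and ToyColumn are hypothetical names invented for illustration, a Dictionary stands in for the Hashtable, and the fixed capacity is an assumption. It is not the actual System.Data code.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: hypothetical types, NOT the real System.Data internals.
public sealed class ToyColumn
{
    // Values for this column, indexed by record number (like the storage array).
    private readonly object[] _storage;

    public ToyColumn(int capacity)
    {
        _storage = new object[capacity];
    }

    // Step 3: a plain array read, independent of how many rows exist.
    public object Get(int recordIndex) => _storage[recordIndex];

    public void Set(int recordIndex, object value) => _storage[recordIndex] = value;
}

public sealed class ToyTable
{
    private readonly int _capacity;

    // Step 1: column name -> column through a hash-based map.
    private readonly Dictionary<string, ToyColumn> _columnsByName =
        new Dictionary<string, ToyColumn>();

    public ToyTable(int capacity)
    {
        _capacity = capacity;
    }

    public ToyColumn AddColumn(string name)
    {
        var column = new ToyColumn(_capacity);
        _columnsByName.Add(name, column);
        return column;
    }

    public bool ContainsColumn(string name) => _columnsByName.ContainsKey(name);

    // Step 2 is modelled by the caller already holding the record index;
    // the lookup is then one hash probe plus one array read.
    public object GetValue(int recordIndex, string columnName) =>
        _columnsByName[columnName].Get(recordIndex);
}
```

Each call to GetValue still pays for hashing the string and for a couple of method calls, which is the constant overhead mentioned above, but none of that work depends on the number of rows or columns.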

Interestingly, DataStorage uses an array for its values. I can't imagine that it is cheap to rebuild when rows are added or removed.

+2

It's actually advisable to use an integer index when referring to a column, which can noticeably improve performance. To keep the code maintainable, you can declare a constant integer. So instead of what you did, you could do:

    const int SomeTable_SomeColumn = 0;
    DataTable dt = new DataTable();
    // Columns.Contains only accepts a column name, so compare the index against Count instead.
    if (dt.Columns.Count > SomeTable_SomeColumn)
    {
        object o = dt.Rows[0][SomeTable_SomeColumn];
    }
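A variation on the hard-coded constant, offered only as a sketch: resolve the column name to its ordinal once, then use the integer indexer inside the loop. DataColumn.Ordinal and the DataRow integer indexer are standard System.Data members; the class name, the method names, and the sample data are made up for illustration.

```csharp
using System;
using System.Data;

public static class OrdinalLookupSketch
{
    // Sums an int column, resolving the column name to its ordinal only once.
    public static int SumColumn(DataTable table, string columnName)
    {
        if (!table.Columns.Contains(columnName))
        {
            return 0; // hypothetical policy: a missing column contributes nothing
        }

        int ordinal = table.Columns[columnName].Ordinal; // single name lookup

        int total = 0;
        foreach (DataRow row in table.Rows)
        {
            // Integer indexer: no per-row name lookup.
            // Assumes the column holds no DBNull values.
            total += (int)row[ordinal];
        }
        return total;
    }

    // Builds a small table so the sketch can be exercised on its own.
    public static DataTable BuildSample()
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("SomeColumn", typeof(int));
        table.Rows.Add(1, 10);
        table.Rows.Add(2, 20);
        table.Rows.Add(3, 30);
        return table;
    }
}
```

With the sample table, OrdinalLookupSketch.SumColumn(OrdinalLookupSketch.BuildSample(), "SomeColumn") returns 60. Compared with a constant, this keeps working if the column order changes, at the cost of one name lookup per call.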
+3

I would guess that any lookup is O(n), since I don't think they use any kind of hash table; more likely they just search through a large array of rows and columns.

0

In fact, I believe that column names are stored in a Hashtable. Lookups should be O(1), that is, constant time, for case-sensitive queries. If it had to look through each column, then of course it would be O(n).
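One way to check this on your own machine is a rough timing sketch like the one below. It reads the last column of a one-row table by name and by ordinal, so the timings can be compared across tables with different column counts. The class and method names are hypothetical, Stopwatch numbers from a loop like this are noisy, and no particular result is claimed here.

```csharp
using System;
using System.Data;
using System.Diagnostics;

public static class LookupTimingSketch
{
    // Builds a one-row table with columns named Col0 .. Col{columnCount - 1}.
    public static DataTable BuildTable(int columnCount)
    {
        var table = new DataTable();
        var values = new object[columnCount];
        for (int i = 0; i < columnCount; i++)
        {
            table.Columns.Add("Col" + i, typeof(int));
            values[i] = i;
        }
        table.Rows.Add(values);
        return table;
    }

    // Reads the last column by name `iterations` times; returns elapsed milliseconds.
    public static long TimeByName(DataTable table, int iterations)
    {
        string name = "Col" + (table.Columns.Count - 1);
        DataRow row = table.Rows[0];
        object sink = null;
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink = row[name];
        }
        watch.Stop();
        GC.KeepAlive(sink); // keep the reads observable
        return watch.ElapsedMilliseconds;
    }

    // Same reads through the integer indexer, for comparison.
    public static long TimeByOrdinal(DataTable table, int iterations)
    {
        int ordinal = table.Columns.Count - 1;
        DataRow row = table.Rows[0];
        object sink = null;
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink = row[ordinal];
        }
        watch.Stop();
        GC.KeepAlive(sink);
        return watch.ElapsedMilliseconds;
    }
}
```

If TimeByName stays roughly flat between, say, BuildTable(10) and BuildTable(1000), that is consistent with a hash-based name lookup; if it grows with the column count, that points toward a linear scan.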

0
