I am trying to optimize a concurrent collection that minimizes lock contention for reads. In the first pass I used a linked list, which allowed me to lock only on writes while many simultaneous reads could continue unblocked. It used a custom IEnumerator to yield the next link's value. Once I started comparing iteration over the collection to a plain List<T>, I noticed my implementation was about half as fast (for `from x in c select x` over a collection of 1M elements, I got 24 ms for List<T> and 49 ms for my collection).
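For context, here is a minimal sketch of the kind of linked-list store and hand-rolled enumerator I mean (the names and the write-side locking are illustrative, not my actual implementation):

```csharp
using System.Collections;
using System.Collections.Generic;

// Illustrative only: a singly linked store whose enumerator just follows
// Next references, so readers never take a collection-wide lock.
public class LinkedStore<T> : IEnumerable<T>
{
    private sealed class Node
    {
        public T Value;
        public Node Next;
    }

    private Node _head;
    private readonly object _writeLock = new object();

    public void Add(T value)
    {
        // Only writers lock; readers walk the Next pointers unblocked (simplified).
        lock (_writeLock)
        {
            _head = new Node { Value = value, Next = _head };
        }
    }

    public IEnumerator<T> GetEnumerator() => new Enumerator(_head);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<T>
    {
        private Node _next;
        public T Current { get; private set; }
        object IEnumerator.Current => Current;

        public Enumerator(Node head) => _next = head;

        public bool MoveNext()
        {
            if (_next == null) return false;
            Current = _next.Value;
            _next = _next.Next;
            return true;
        }

        public void Reset() => throw new System.NotSupportedException();
        public void Dispose() { }
    }
}
```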
So I thought I would use a ReaderWriterLockSlim and sacrifice a little read contention so I could use a List<T> as my internal storage. Since I have to grab the read lock on iteration start and release it upon completion, I first wrote a yield-based iterator for my IEnumerable that walks the internal List<T>. With that I was only getting 66 ms.
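The yield-based iterator follows the usual pattern of entering the lock lazily when iteration actually starts and releasing it in a finally, which runs when the foreach completes or the enumerator is disposed. Roughly (field names are assumptions, not my actual code):

```csharp
using System.Collections.Generic;
using System.Threading;

public class ConcurrentListSketch<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    // The read lock is taken on the first MoveNext and released by the
    // finally block when iteration finishes or the enumerator is disposed.
    public IEnumerable<T> Enumerate()
    {
        _lock.EnterReadLock();
        try
        {
            for (int i = 0; i < _items.Count; i++)
                yield return _items[i];
        }
        finally
        {
            _lock.ExitReadLock();
        }
    }
}
```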
I peeked at List<T>'s implementation: it uses an internal T[] store and a custom IEnumerator that just advances an index and returns the value at the current index. Now, using T[] as storage means a lot more maintenance work, but wth, I'm chasing microseconds.
However, even mimicking that IEnumerator and just moving an index over the array, the best I could do was ~38 ms. So what gives List<T> its secret sauce, or alternatively, what's a faster implementation for an iterator?
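For reference, the index-walking enumerator I'm imitating looks roughly like this (a simplified sketch; List<T>'s real enumerator is a nested struct, mine here is a class for brevity):

```csharp
using System.Collections;
using System.Collections.Generic;

// Simplified sketch of an index-walking enumerator over a T[] backing store,
// in the spirit of what List<T> does internally.
public sealed class ArrayEnumerator<T> : IEnumerator<T>
{
    private readonly T[] _items;
    private readonly int _count;
    private int _index;

    public ArrayEnumerator(T[] items, int count)
    {
        _items = items;
        _count = count;
        _index = -1;
    }

    public T Current => _items[_index];
    object IEnumerator.Current => Current;

    // Advance the index and report whether we are still inside the used range.
    public bool MoveNext() => ++_index < _count;

    public void Reset() => _index = -1;
    public void Dispose() { }
}
```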
UPDATE: It turns out my main speed culprit was running a Debug build, while List<T> is obviously a Release compile. In Release my implementation is still slower than List<T>, though it's much closer now.
Another suggestion I got from a friend is that the BCL is faster because it sits in the GAC and can therefore be precompiled by the system (e.g. with NGen). I'll have to put a test in the GAC to check that theory.