Performance of a cached implementation of IEnumerable<T>
[EDIT]
The new Reactive Framework solves the problem described below using the System.Linq.EnumerableEx.MemoizeAll() extension method.
Internally, MemoizeAll() uses a System.Linq.EnumerableEx.MemoizeAllEnumerable<T> (found in the System.Interactive assembly), which is similar to my ThreadSafeCachedEnumerable<T> (sorta).
Here's a contrived example that prints the contents of an Enumerable (numbers 1-10) very slowly, then prints the contents a second time, quickly (because it cached the values):
// Create an Enumerable<int> containing numbers 1-10, using Thread.Sleep() to simulate work
var slowEnum = EnumerableEx.Generate(
    1,
    currentNum => (currentNum <= 10),
    currentNum => currentNum,
    previousNum => { Thread.Sleep(250); return previousNum + 1; });

// This decorates the slow enumerable with one that will cache each value.
var cachedEnum = slowEnum.MemoizeAll();

// Print the numbers
foreach (var num in cachedEnum.Repeat(2))
{
    Console.WriteLine(num);
}

[/EDIT]
Hi multithreaded gurus
I created the ThreadSafeCachedEnumerable<T> class, intending to increase performance where long-running queries are used repeatedly. The idea is to get an enumerator from the underlying IEnumerable<T> and add its items to a cache on each call to MoveNext(). Below is my current implementation:
/// <summary>
/// Wraps an IEnumerable<T> and provides a thread-safe means of caching the values.
/// </summary>
/// <typeparam name="T"></typeparam>
class ThreadSafeCachedEnumerable<T> : IEnumerable<T>
{
    // An enumerator from the original IEnumerable<T>
    private IEnumerator<T> enumerator;

    // The items we have already cached (from this.enumerator)
    private IList<T> cachedItems = new List<T>();

    public ThreadSafeCachedEnumerable(IEnumerable<T> enumerable)
    {
        this.enumerator = enumerable.GetEnumerator();
    }

    #region IEnumerable<T> Members

    public IEnumerator<T> GetEnumerator()
    {
        // The index into the sequence
        int currentIndex = 0;

        // We will break with yield break
        while (true)
        {
            // The currentIndex will never be decremented,
            // so we can check without locking first
            if (currentIndex < this.cachedItems.Count)
            {
                var current = this.cachedItems[currentIndex];
                currentIndex += 1;
                yield return current;
            }
            else
            {
                // If !(currentIndex < this.cachedItems.Count),
                // we need to synchronize access to this.enumerator
                lock (enumerator)
                {
                    // See if we have more cached items ...
                    if (currentIndex < this.cachedItems.Count)
                    {
                        var current = this.cachedItems[currentIndex];
                        currentIndex += 1;
                        yield return current;
                    }
                    else
                    {
                        // ... otherwise, we'll need to get the next item from this.enumerator.MoveNext()
                        if (this.enumerator.MoveNext())
                        {
                            // Capture the current item and cache it, then increment currentIndex
                            var current = this.enumerator.Current;
                            this.cachedItems.Add(current);
                            currentIndex += 1;
                            yield return current;
                        }
                        else
                        {
                            // We reached the end of the enumerator - we're done
                            yield break;
                        }
                    }
                }
            }
        }
    }

    #endregion

    #region IEnumerable Members

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    #endregion
}

I just lock (this.enumerator) when there are no more elements in the cache, in case another thread is just about to add another element (I assume that calling MoveNext() on this.enumerator from two threads is a bad idea).
Performance is excellent when retrieving previously cached items, but it starts to suffer when getting many items for the first time (due to the constant locking). Any suggestions for improving performance?
Thanks!
A few recommendations:
- It is now generally accepted practice not to make container classes responsible for locking. Someone calling your cached enumerator, for instance, might also want to prevent new entries from being added to the container while enumerating, which would mean locking twice. Therefore it's best to defer that responsibility to the caller.
- Your caching depends on the enumerator always returning items in order, which is not guaranteed. A Dictionary or HashSet would be a better fit (a rough sketch follows this list). Similarly, items could be removed between calls, invalidating the cache.
- It is generally not recommended to lock on publicly accessible objects. That includes the wrapped enumerator. Exceptions are conceivable, for example when you are absolutely certain you are the only one holding a reference to the container class you are enumerating. That would also largely void my objection under #2.
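As a rough sketch of the second point (all names here are placeholders, not from the original code), an index-keyed cache makes each lookup independent of the order in which items were produced:

using System.Collections.Generic;

class IndexedCache<T>
{
    // Keying by index means a hit or miss does not depend on
    // items having been added in positional order.
    private readonly Dictionary<int, T> items = new Dictionary<int, T>();

    public void Put(int index, T value)
    {
        this.items[index] = value;
    }

    public bool TryGet(int index, out T value)
    {
        return this.items.TryGetValue(index, out value);
    }
}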
Locking in .NET is normally very fast (when there is no contention). Has profiling identified locking as the source of the performance problem? How long does a call to MoveNext on the underlying enumerator take?
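Before restructuring anything, it may be worth timing the underlying enumerator directly. A throwaway sketch along these lines could answer that question (GetSource() here is a stand-in for the real slow query, not anything from the original code):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

static class LockProbe
{
    static void Main()
    {
        var enumerator = GetSource().GetEnumerator();

        // Time the raw MoveNext() calls, with no caching or locking involved.
        var stopwatch = Stopwatch.StartNew();
        int count = 0;
        while (enumerator.MoveNext())
        {
            count += 1;
        }
        stopwatch.Stop();

        Console.WriteLine("{0} items, {1} ms total, {2} ms per MoveNext()",
            count,
            stopwatch.ElapsedMilliseconds,
            (double)stopwatch.ElapsedMilliseconds / count);
    }

    // Stand-in for the real long-running query.
    static IEnumerable<int> GetSource()
    {
        for (int i = 1; i <= 10; i++)
        {
            Thread.Sleep(250); // simulate work, as in the question
            yield return i;
        }
    }
}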
In addition, the code in its current form is not thread-safe. You cannot safely call this.cachedItems[currentIndex] on one thread (in if (currentIndex < this.cachedItems.Count)) while calling this.cachedItems.Add(current) on another. From the List<T> documentation: "A List<T> can support multiple readers concurrently, as long as the collection is not modified." To be thread-safe, you need to protect all access to this.cachedItems with a lock (if there is any chance that one or more threads could modify it).
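For illustration, here's a minimal sketch of one way to apply that advice: hold a single lock across the Count check, the indexer read, and any Add, and (per the other answer) lock on a private object instead of the enumerator. The syncRoot field is my addition, not part of the original class, and this is one possible fix rather than the only one:

using System.Collections.Generic;

class ThreadSafeCachedEnumerable<T> : IEnumerable<T>
{
    // Private lock target: callers cannot see it, so they cannot lock on it.
    // (This field is an addition; the original code locked on the enumerator.)
    private readonly object syncRoot = new object();

    private readonly IEnumerator<T> enumerator;
    private readonly IList<T> cachedItems = new List<T>();

    public ThreadSafeCachedEnumerable(IEnumerable<T> enumerable)
    {
        this.enumerator = enumerable.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        int currentIndex = 0;
        while (true)
        {
            T current;

            // One lock covers the Count check, the indexer read and the Add,
            // so no other thread can modify cachedItems between those steps.
            lock (this.syncRoot)
            {
                if (currentIndex < this.cachedItems.Count)
                {
                    current = this.cachedItems[currentIndex];
                }
                else if (this.enumerator.MoveNext())
                {
                    current = this.enumerator.Current;
                    this.cachedItems.Add(current);
                }
                else
                {
                    yield break; // end of the source sequence
                }
            }

            currentIndex += 1;

            // Yield outside the lock so it is not held while the caller
            // processes the item (the original code yielded inside the lock).
            yield return current;
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}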