Whose responsibility is it to cache / memoize function results?

I am working on software that allows the user to expand the system by implementing a set of interfaces.

To test the viability of what we do, my company “eats its own dog food”, implementing all of our business logic in these classes the same way a user would.

We have some utility classes / methods that bind everything together and use the logic defined in extensible classes.


I want to cache the results of these user-defined functions. Where should I do this?

  • Should it be the classes themselves? It seems like this could lead to a lot of code duplication.

  • Should it be the utility / engine that uses these classes? If so, an uninformed user could call a class's function directly and not receive any caching benefit.


Code example

public interface ILetter
{
    string[] GetAnimalsThatStartWithMe();
}

public class A : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new[] { "Aardvark", "Ant" };
    }
}

public class B : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new[] { "Baboon", "Banshee" };
    }
}

/* ...Left to user to define... */

public class Z : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new[] { "Zebra" };
    }
}

public static class LetterUtility
{
    public static string[] GetAnimalsThatStartWithLetter(char letter)
    {
        if (letter == 'A') return (new A()).GetAnimalsThatStartWithMe();
        if (letter == 'B') return (new B()).GetAnimalsThatStartWithMe();
        /* ... */
        if (letter == 'Z') return (new Z()).GetAnimalsThatStartWithMe();
        throw new ApplicationException("Letter " + letter + " not found");
    }
}

Should LetterUtility be responsible for caching? Should each individual instance of ILetter? Is there anything else that can be done?

I am trying to keep this example short, so these example functions do not need caching. But imagine that I add this class, which makes (new C()).GetAnimalsThatStartWithMe() take 10 seconds every time it is called:

public class C : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        Thread.Sleep(10000);
        return new[] { "Cat", "Capybara", "Clam" };
    }
}

I am torn between making our software as fast as possible while maintaining less code (in this example: caching the result in LetterUtility) and doing the same work over and over (in this example: waiting 10 seconds every time C is used).
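
For concreteness, the second option (caching the result in LetterUtility) might look roughly like the sketch below. It is only a sketch: it is shown as a separate CachedLetterUtility variant to keep it apart from the class above, the CreateLetter helper is purely illustrative, and there is no invalidation or size limit.

using System;
using System.Collections.Generic;

public static class CachedLetterUtility
{
    private static readonly Dictionary<char, string[]> Cache = new Dictionary<char, string[]>();

    public static string[] GetAnimalsThatStartWithLetter(char letter)
    {
        string[] cached;
        if (Cache.TryGetValue(letter, out cached))
            return cached;                        // later calls skip the slow work

        string[] result = CreateLetter(letter).GetAnimalsThatStartWithMe();
        Cache[letter] = result;                   // the first call still pays the full cost
        return result;
    }

    private static ILetter CreateLetter(char letter)
    {
        if (letter == 'A') return new A();
        if (letter == 'B') return new B();
        /* ... */
        if (letter == 'Z') return new Z();
        throw new ApplicationException("Letter " + letter + " not found");
    }
}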

+7
5 answers

Which layer should be responsible for caching the results of these user-defined functions?

The answer is pretty obvious: the layer that can correctly implement the desired cache policy is the right layer.

A proper caching policy should have two characteristics:

  • It should never serve stale data; it must know whether the method being cached is going to produce a different result, and invalidate the cached data at some point before the caller would receive stale data.

  • It must manage cache resources efficiently on behalf of the user. A cache without an expiration policy that grows without bound has another name: we usually call those "memory leaks".

Which layer in your system knows the answers to the questions "is the cached data stale?" and "is the cache too big?" That is the layer that should implement the cache.
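
Purely as an illustration of those two characteristics (not a claim about which layer is right for the question's example), a layer that does know the answers could express both decisions with something like System.Runtime.Caching.MemoryCache. The CachedLetterLookup name and the ten-minute expiration here are arbitrary assumptions:

using System;
using System.Runtime.Caching;

public static class CachedLetterLookup
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static string[] GetAnimalsThatStartWithLetter(char letter)
    {
        string key = "letter:" + letter;
        var cached = (string[])Cache.Get(key);
        if (cached != null)
            return cached;

        string[] result = LetterUtility.GetAnimalsThatStartWithLetter(letter);

        // The expiration answers "when does this become stale?"; MemoryCache's own
        // memory limits answer "how big may the cache grow?".
        Cache.Set(key, result, new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(10) });
        return result;
    }
}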

+10

Something like caching can be seen as a “cross-cutting concern” (http://en.wikipedia.org/wiki/Cross-cutting_concern):

In computer science, cross-cutting concerns are aspects of a program that affect other concerns. These concerns often cannot be cleanly decomposed from the rest of the system in either the design or the implementation, and can result in scattering (code duplication), tangling (significant dependencies between systems), or both. For instance, in an application for handling medical records, the indexing of such records is a core concern, while logging a history of changes to the record database or user database, or an authentication system, would be cross-cutting concerns since they touch more parts of the program.

Cross-cutting concerns can often be implemented using aspect-oriented programming (http://en.wikipedia.org/wiki/Aspect-oriented_programming):

In computing, aspect-oriented programming (AOP) is a programming paradigm which aims to increase modularity by allowing the separation of cross-cutting concerns. AOP forms a basis for aspect-oriented software development.

There are many tools in .NET that facilitate aspect-oriented programming. I like best the ones that provide a fully transparent implementation. In the caching example:

public class Foo
{
    [Cache(10)] // cache for 10 minutes
    public virtual string Bar() { ... }
}

That's all you have to do... everything else happens automatically once you define the behavior:

public class CachingBehavior
{
    // This method intercepts any invocation of a method attributed with [Cache].
    // In the case of caching, it would check whether some cache store already
    // contains the data and return it if so; otherwise it performs the normal
    // method operation and stores the result.
    public void Intercept(IInvocation invocation) { ... }
}
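
Fleshed out, that behavior might look roughly like the following. This is only a sketch: it assumes Castle DynamicProxy's IInvocation/IInterceptor, uses a naive cache key, and ignores the attribute's expiration value and thread safety.

using System.Collections.Generic;
using Castle.DynamicProxy;

public class CachingBehavior : IInterceptor
{
    private readonly Dictionary<string, object> _cache = new Dictionary<string, object>();

    public void Intercept(IInvocation invocation)
    {
        // Only cache methods marked with the [Cache] attribute from the example above.
        if (!invocation.Method.IsDefined(typeof(CacheAttribute), true))
        {
            invocation.Proceed();
            return;
        }

        // Naive key: method name plus its arguments (for illustration only).
        string key = invocation.Method.Name + ":" + string.Join(",", invocation.Arguments);

        object cached;
        if (_cache.TryGetValue(key, out cached))
        {
            invocation.ReturnValue = cached;    // serve the cached result
            return;
        }

        invocation.Proceed();                   // run the real method
        _cache[key] = invocation.ReturnValue;   // remember the result for next time
    }
}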

There are two general schools of thought on how this happens:

  • Post-build IL weaving of the assembly. Tools such as PostSharp, Microsoft CCI, and Mono Cecil can be configured to rewrite the compiled assembly so that methods decorated with these attributes automatically delegate to your behavior.

  • Runtime proxies. Tools such as Castle DynamicProxy and Microsoft Unity can automatically generate proxy types (a type derived from Foo that overrides Bar in the example above) that delegate to your behavior; a rough wiring sketch follows below.
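
For the runtime-proxy route, the wiring might look roughly like this, assuming the CachingBehavior above implements Castle DynamicProxy's IInterceptor:

using Castle.DynamicProxy;

public static class ProxyFactory
{
    public static Foo CreateCachedFoo()
    {
        var generator = new ProxyGenerator();

        // The generated proxy derives from Foo and overrides its virtual members,
        // routing every call through CachingBehavior.Intercept first.
        return generator.CreateClassProxy<Foo>(new CachingBehavior());
    }
}

With IL weaving there is no proxy at all: the class stays exactly as written and the rewriting happens at build time instead.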

+4

Although I don't know C#, this sounds like a case for AOP (aspect-oriented programming). The idea is that you can “inject” code that must be executed at certain points in the execution stack.

The caching code can be added as follows:

IF( InCache( object, method, method_arguments ) )
    RETURN Cache( object, method, method_arguments );
ELSE
    ExecuteMethod();
    StoreResultsInCache();

Then you declare that this code must be executed before every call to your interface functions (and to all subclasses that implement those functions).

Can some .NET expert tell us how you would do this in .NET?

0

In general, caching and memoization make sense when:

  • Obtaining the result is (or at least can be) high-latency or otherwise more expensive than the cost introduced by the caching itself.
  • The results will be looked up in a pattern where there are frequent calls with the same inputs (that is, not only the same arguments, but also the same instance, static, and other data that affect the result).
  • The calling code does not have its own caching mechanism that makes this one unnecessary (which is why memoizing GetHashCode() inside that method almost never makes sense, even though people are often tempted to when the implementation is relatively expensive).
  • Staleness is either impossible, unlikely to occur while the value is held in the cache, unimportant, or easily detected.

There are cases where every use of a component will meet all of these criteria. There are many more where it will not. For example, if a component caches results but is never called twice with the same inputs by a particular client component, then that caching is pure waste that hurts performance (maybe slightly, maybe severely).

More often than not, it makes sense for the client code to define the caching policy that suits it. It is also often easier to tune a cache for a particular use in the face of real data there than inside the component (since the real-world data the component encounters can differ considerably from one use to another).

It is even harder for a component to know what degree of staleness is acceptable. Typically a component has to assume that 100% freshness is required of it, while the client code may know that a certain amount of staleness is fine.

On the other hand, it can be easier for a component to obtain the information needed to decide whether a cached value is still usable. Component and client can work hand in hand in these cases, though it is considerably more involved (an example is the If-Modified-Since mechanism used by RESTful web services, where the server can indicate that the client can safely use information it has cached).

Additionally, a component can offer a caching policy that the caller configures. Connection pooling is such a caching policy: consider how it is configured.
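
A sketch of that shape (all names here are hypothetical, and the eviction is deliberately crude): the component does the caching, but the calling code supplies the policy.

using System;
using System.Collections.Generic;

public class CachePolicy
{
    public TimeSpan MaxAge { get; set; }   // how stale a result may be for this caller
    public int MaxEntries { get; set; }    // how large the cache may grow
}

public class AnimalLookup
{
    private readonly CachePolicy _policy;
    private readonly Dictionary<char, Tuple<DateTime, string[]>> _cache =
        new Dictionary<char, Tuple<DateTime, string[]>>();

    public AnimalLookup(CachePolicy policy)
    {
        _policy = policy;
    }

    public string[] GetAnimalsThatStartWithLetter(char letter)
    {
        Tuple<DateTime, string[]> entry;
        if (_cache.TryGetValue(letter, out entry) &&
            DateTime.UtcNow - entry.Item1 < _policy.MaxAge)
        {
            return entry.Item2;                // fresh enough for this caller's policy
        }

        string[] result = LetterUtility.GetAnimalsThatStartWithLetter(letter);

        if (_cache.Count >= _policy.MaxEntries)
            _cache.Clear();                    // crude eviction, for illustration only
        _cache[letter] = Tuple.Create(DateTime.UtcNow, result);
        return result;
    }
}

// The caller decides the policy:
// var lookup = new AnimalLookup(new CachePolicy { MaxAge = TimeSpan.FromMinutes(5), MaxEntries = 100 });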

So in short:

Whichever component can determine what caching is both possible and useful should be responsible for it.

Most often that is the client code, though details of the likely latency and staleness documented by the component's authors will help here.

Less often it can be the component itself, though you have to expose details of the caching to enable this.

And sometimes it can be the component, with a caching policy configured by the calling code.

Rarely can it be the component alone, because it is rare for the same caching policy to suit all possible uses. One important exception is when the same instance of the component serves several clients, because then the factors above apply to those several clients as a whole.

0

All the previous posts raised some good points; here is a very rough outline of how you might do it. I wrote this on the fly, so some tweaking may be required:

using System;

interface IMemoizer<T, R>
{
    bool IsValid(T args);                  // is the cached value still valid, or stale, etc.
    bool TryLookup(T args, out R result);
    void StoreResult(T args, R result);
}

static class IMemoizerExtensions
{
    public static Func<T, R> Memoizing<T, R>(this IMemoizer<T, R> src, Func<T, R> method)
    {
        return new Func<T, R>(args =>
        {
            R result;
            if (src.TryLookup(args, out result) && src.IsValid(args))
            {
                return result;
            }
            else
            {
                result = method.Invoke(args);
                src.StoreResult(args, result);
                return result;
            }
        });
    }
}
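
To make the sketch concrete, here is one hypothetical way to implement and use it against the ILetter example from the question; the DictionaryMemoizer name and the wiring are illustrative assumptions, not part of the outline above.

using System;
using System.Collections.Generic;

class DictionaryMemoizer<T, R> : IMemoizer<T, R>
{
    private readonly Dictionary<T, R> _cache = new Dictionary<T, R>();

    public bool IsValid(T args) { return true; }   // never considered stale, for simplicity

    public bool TryLookup(T args, out R result)
    {
        return _cache.TryGetValue(args, out result);
    }

    public void StoreResult(T args, R result)
    {
        _cache[args] = result;
    }
}

static class MemoizerUsage
{
    public static void Run()
    {
        var memoizer = new DictionaryMemoizer<char, string[]>();
        Func<char, string[]> lookup =
            memoizer.Memoizing<char, string[]>(LetterUtility.GetAnimalsThatStartWithLetter);

        lookup('C');   // slow the first time (10 seconds with class C)
        lookup('C');   // fast: the result comes from the memoizer
    }
}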
0
