Practical Django caching strategy and implementation? Cache long, invalidate cache when data changes

I have a Django app that receives near-real-time data (tweets and votes), although updates only happen every minute or two on average. However, we want to show the data by updating the site and API results as soon as it arrives.

We could see a lot of load on this site, so my initial thought, of course, is caching!

How practical is it to have some kind of Memcached cache that is manually invalidated by another process or event? In other words, I would cache views for a long time, and then new tweets and votes would invalidate the whole view.

  • Is it plausible that the modest performance improvements justify the added complexity?
  • Is there a practical implementation I could build (I work with other developers, so hacking lots of code around every response is not a good option)?

I am not worried about invalidating only specific objects, and I considered subclassing the MemcachedCache backend to add some functionality following this strategy. But of course, Django's sessions also use Memcached as a write-through cache, and I don't want to invalidate those.

2 answers

Thanks to @rdegges's suggestions, I was able to find a great way to do this.

I follow this paradigm:

  • Cache template fragments and API calls for five minutes (or longer).
  • Invalidate the cache every time new data is added.
    • Simply invalidating the cache is better than re-caching on save, because new cached data is generated automatically and organically when no cached data is found.
  • Manually invalidate the cache after I have done a full update (say, from a tweet search), and not on every object save.
    • This invalidates the cache fewer times, but on the other hand it is not as automatic.

Here is all the code you need to do this:

```python
from time import time

from django.conf import settings
from django.core.cache import get_cache
from django.core.cache.backends.memcached import MemcachedCache
from django.utils.encoding import smart_str


class NamespacedMemcachedCache(MemcachedCache):
    def __init__(self, *args, **kwargs):
        super(NamespacedMemcachedCache, self).__init__(*args, **kwargs)
        # The namespace token lives in a separate, non-namespaced cache.
        self.cache = get_cache(getattr(settings, 'REGULAR_CACHE', 'regular'))
        self.reset()

    def reset(self):
        namespace = str(time()).replace('.', '')
        # Note (very important): we store this in the non-namespaced cache,
        # not in our own cache; otherwise make_key() would recurse.
        # Timeout semantics differ between Django versions; newer versions
        # use None (not 0) to mean "never expire".
        self.cache.set('namespaced_cache_namespace', namespace, 0)
        return namespace

    def clear(self):
        # Invalidate everything by switching to a new namespace instead of
        # the default flush_all(), which would wipe the whole memcached
        # server (including the regular cache and anything stored there).
        self.reset()

    def make_key(self, key, version=None):
        """Build the key used by all other cache methods.

        Overrides Django's default (key_prefix:version:key) to insert the
        current namespace: key_prefix:version:namespace:key.
        """
        if version is None:
            version = self.version
        namespace = self.cache.get('namespaced_cache_namespace')
        if not namespace:
            namespace = self.reset()
        return ':'.join([self.key_prefix, str(version), namespace, smart_str(key)])
```

This works by adding a version, or namespace, to every key in the cache and storing that namespace in a separate cache. The namespace is simply the current epoch time when reset() is called.
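The namespacing trick can be seen without Django or memcached. This is a simplified sketch, assuming plain dicts for both stores; unlike the answer, which derives the token from the current time, it uses a random UUID so that two resets within the same clock tick can never produce the same namespace (`NamespacedCache` is a hypothetical name):

```python
import uuid


class NamespacedCache:
    """Dict-backed sketch of a namespaced cache.

    Every key is prefixed with a namespace token that is stored in a
    separate, non-namespaced store. Changing the token makes all old keys
    unreachable at once.
    """

    NS_KEY = "namespaced_cache_namespace"

    def __init__(self, data, regular):
        self.data = data        # namespaced entries
        self.regular = regular  # non-namespaced store that holds the token

    def reset(self):
        # Store the token in the *regular* store; looking it up through
        # make_key() would recurse forever.
        namespace = uuid.uuid4().hex
        self.regular[self.NS_KEY] = namespace
        return namespace

    def make_key(self, key):
        namespace = self.regular.get(self.NS_KEY) or self.reset()
        return f"{namespace}:{key}"

    def get(self, key):
        return self.data.get(self.make_key(key))

    def set(self, key, value):
        self.data[self.make_key(key)] = value

    def clear(self):
        # O(1) invalidation: switch namespaces, touch no entries.
        self.reset()
```

`clear()` never deletes anything: the old entries just become unreachable and, in memcached, are eventually evicted or expire.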

You must name your alternative, non-namespaced cache in settings.REGULAR_CACHE, so that the namespace value can be stored in a cache that is not itself namespaced (otherwise the key lookup would be recursive!).
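For illustration, a settings sketch that wires the two caches together. The module path `myproject.cache` and the memcached address are assumptions; the `'regular'` alias matches the backend's fallback when `REGULAR_CACHE` is not set:

```python
# settings.py (sketch). "myproject.cache" is a hypothetical module path for
# wherever NamespacedMemcachedCache lives in your project.
CACHES = {
    # Namespaced cache used by views, template fragments and API responses.
    "default": {
        "BACKEND": "myproject.cache.NamespacedMemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    },
    # Plain, non-namespaced cache that stores the namespace token itself.
    "regular": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    },
}

# Alias of the non-namespaced cache, read by NamespacedMemcachedCache.
REGULAR_CACHE = "regular"
```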

Whenever you add a batch of data and want to clear the cache (assuming you have set this as the default cache), just do:

```python
from django.core.cache import cache

cache.clear()
```

You can access any cache with:

```python
from django.core.cache import get_cache

# Look up a cache by the alias it has in settings.CACHES.
some_cache = get_cache('some_cache_key')
```

Finally, I recommend that you do not put sessions in this cache. You can point sessions at a different cache alias with settings.SESSION_CACHE_ALIAS.
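A sketch of the matching session settings, assuming the non-namespaced `'regular'` alias from `CACHES`. `SESSION_ENGINE` and `SESSION_CACHE_ALIAS` are real Django settings, and `cached_db` is Django's write-through session backend:

```python
# settings.py (sketch): keep sessions out of the namespaced cache so that
# cache.clear() on the default cache never logs users out.
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
SESSION_CACHE_ALIAS = "regular"  # assumes the "regular" alias exists in CACHES
```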


Cache invalidation is probably the best way to handle what you are trying to do. Based on the wording of your question, I'm going to assume the following about your application:

  • You have some kind of API in place that receives new updates pushed to it and DOES NOT poll for them. E.g. every minute or two you get an API request and you store some information in your database.
  • You are already using Memcached for read caching, perhaps through a cron job or similar process that periodically scans your database and updates your cache.

Assuming the above two things are true, cache invalidation is definitely the way to go. Here is the best way to do this in Django:

  • Your server receives a new API request containing new data to store. You save it to the database and use a post_save signal on your model class (e.g. Tweet, Poll) to update the data in memcached.
  • A user visits your site and requests their latest tweets, polls, etc.
  • You pull the tweet, poll, etc. data from memcached and display it.

The key ingredient here is Django signals. They fire automatically after your object is saved or updated, which is a great time to update your caches with the latest information.
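The save-then-signal flow can be sketched without a Django project. The `Signal`, `Tweet` and `post_save` objects below are hypothetical stand-ins that mimic Django's dispatch pattern; in a real app you would connect the handler with `@receiver(post_save, sender=Tweet)` from `django.dispatch`, using `post_save` from `django.db.models.signals`:

```python
class Signal:
    """Minimal stand-in for django.dispatch.Signal (connect/send only)."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)
        return receiver  # lets connect() double as a decorator

    def send(self, sender, **kwargs):
        for receiver in self._receivers:
            receiver(sender=sender, **kwargs)


post_save = Signal()  # plays the role of django.db.models.signals.post_save
database = []         # stand-in for the Tweet table
cache = {}            # stand-in for memcached


class Tweet:
    def __init__(self, text):
        self.text = text

    def save(self):
        created = self not in database
        if created:
            database.append(self)
        # Like Django, fire the signal after the row has been written.
        post_save.send(sender=Tweet, instance=self, created=created)


@post_save.connect
def refresh_latest_tweets(sender, instance, created, **kwargs):
    """Write-through: rebuild the cached list as soon as data changes."""
    cache["latest_tweets"] = [t.text for t in database[-20:]]


def latest_tweets_view():
    """Read path: serve straight from the cache, never from the database."""
    return cache.get("latest_tweets", [])
```

Here the receiver rebuilds the cached list eagerly. Deleting the key instead and letting the next read rebuild it, as the first answer does, works just as well and keeps the signal handler cheap.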

Doing it this way means you never have to run a background task that periodically checks your database and updates your cache; your cache will always be up to date with the latest data.
