Python file cache

I create some objects from files (validators from xsd file templates to merge other xsd files, as it happens), and I would like to recreate objects when the file on disk changes.

I could create something like:

import os

def getobj(fname, cache={}):
    try:
        obj, cached_mtime = cache[fname]
        if cached_mtime < os.path.getmtime(fname):
            raise KeyError  # stale entry: treat it like a cache miss
    except KeyError:
        obj = create_from_file(fname)
        cache[fname] = (obj, os.path.getmtime(fname))

    return obj

However, I would prefer to use other verified code if it exists. Is there an existing library that does something like this?

Update: I am using Python 2.7.1.

3 answers

Your code (including the cache logic) looks fine.

Consider moving the cache variable outside the function definition. That makes it possible to add other operations, such as clearing or inspecting the cache.

For an example of similar logic in the standard library, take a look at filecmp: http://hg.python.org/cpython/file/2.7/Lib/filecmp.py. It decides whether a file has changed by comparing a small signature built from the file's stat information:

def _sig(st):
    return (stat.S_IFMT(st.st_mode),
            st.st_size,
            st.st_mtime)
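A sketch of how a signature like this could drive cache invalidation (the getobj and loader names are illustrative, not from filecmp itself): the object is rebuilt only when the stat signature changes.

```python
import os
import stat

def _sig(st):
    # Same signature filecmp uses: file type, size, and modification time.
    return (stat.S_IFMT(st.st_mode),
            st.st_size,
            st.st_mtime)

_cache = {}  # fname -> (obj, signature observed at load time)

def getobj(fname, loader):
    sig = _sig(os.stat(fname))
    entry = _cache.get(fname)
    if entry is not None and entry[1] == sig:
        return entry[0]  # signature unchanged: reuse the cached object
    obj = loader(fname)
    _cache[fname] = (obj, sig)
    return obj
```

Including the size in the signature catches some changes that an mtime comparison alone can miss, e.g. a rewrite within the filesystem's timestamp granularity that alters the file's length.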


Three thoughts.

  • Use try... except... else for more precise control flow.

  • File modification times are known to be unreliable; in particular, they do not necessarily correspond to the last moment the file was modified!

  • Python 3 includes a caching decorator: functools.lru_cache. Here is the source.

    def lru_cache(maxsize=100):
        """Least-recently-used cache decorator.
    
        If *maxsize* is set to None, the LRU features are disabled and the cache
        can grow without bound.
    
        Arguments to the cached function must be hashable.
    
        View the cache statistics named tuple (hits, misses, maxsize, currsize) with
        f.cache_info().  Clear the cache and statistics with f.cache_clear().
        Access the underlying function with f.__wrapped__.
    
        See:  http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used
    
        """
        # Users should only access the lru_cache through its public API:
        #       cache_info, cache_clear, and f.__wrapped__
        # The internals of the lru_cache are encapsulated for thread safety and
        # to allow the implementation to change (including a possible C version).
    
        def decorating_function(user_function,
                    tuple=tuple, sorted=sorted, len=len, KeyError=KeyError):
    
            hits = misses = 0
            kwd_mark = (object(),)          # separates positional and keyword args
            lock = Lock()                   # needed because ordereddicts aren't threadsafe
    
            if maxsize is None:
                cache = dict()              # simple cache without ordering or size limit
    
                @wraps(user_function)
                def wrapper(*args, **kwds):
                    nonlocal hits, misses
                    key = args
                    if kwds:
                        key += kwd_mark + tuple(sorted(kwds.items()))
                    try:
                        result = cache[key]
                        hits += 1
                    except KeyError:
                        result = user_function(*args, **kwds)
                        cache[key] = result
                        misses += 1
                    return result
            else:
                cache = OrderedDict()       # ordered least recent to most recent
                cache_popitem = cache.popitem
                cache_renew = cache.move_to_end
    
                @wraps(user_function)
                def wrapper(*args, **kwds):
                    nonlocal hits, misses
                    key = args
                    if kwds:
                        key += kwd_mark + tuple(sorted(kwds.items()))
                    try:
                        with lock:
                            result = cache[key]
                            cache_renew(key)        # record recent use of this key
                            hits += 1
                    except KeyError:
                        result = user_function(*args, **kwds)
                        with lock:
                            cache[key] = result     # record recent use of this key
                            misses += 1
                            if len(cache) > maxsize:
                                cache_popitem(0)    # purge least recently used cache entry
                    return result
    
            def cache_info():
                """Report cache statistics"""
                with lock:
                    return _CacheInfo(hits, misses, maxsize, len(cache))
    
            def cache_clear():
                """Clear the cache and cache statistics"""
                nonlocal hits, misses
                with lock:
                    cache.clear()
                    hits = misses = 0
    
            wrapper.cache_info = cache_info
            wrapper.cache_clear = cache_clear
            return wrapper
    
        return decorating_function
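The first two bullets can be combined into a Python 2.7-compatible version of the asker's function (create_from_file is a stand-in for the real loader); the else clause runs only when the cache lookup raised no exception:

```python
import os

def create_from_file(fname):
    # Stand-in for the asker's loader.
    with open(fname) as f:
        return f.read()

def getobj(fname, cache={}):
    mtime = os.path.getmtime(fname)
    try:
        obj, cached_mtime = cache[fname]
    except KeyError:
        pass  # not cached yet: fall through to the load below
    else:
        # Reached only when the lookup succeeded.
        if cached_mtime >= mtime:
            return obj
    obj = create_from_file(fname)
    cache[fname] = (obj, mtime)
    return obj
```

Keeping the load logic in one place (after the try/except/else) avoids duplicating the "same stuff as in except clause" step from the question.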
    