DISQUS

Eric Florenzano's Blog: Tagging cache keys for O(1) batch invalidation

  • Atamert Ölçgen · 10 months ago
    I wonder where `get_key()` comes from? (in `favorite_list_hash()`)

    Other than that great method. IMO strong cache invalidation improves caching a lot. Especially for more web-applicationy projects than CMSs.
  • Eric Florenzano · 10 months ago
    Atamert: Whoops! That is a function that I use in my code to do various other caching related things. I removed it for clarity. Thanks for the feedback :)
  • zgoda · 10 months ago
    You have race condition in your code - 2 trips to cache mean the cache state may be different in second trip and you may clear too much.

    Anyway, this does not seem problematic unless you use this technique to store session data in cache.
  • Eric Florenzano · 10 months ago
    Zgoda: I have the philosophy that nothing should ever depend on something existing in cache and that any data should be able to be evicted and your site should be fine. You're right though, it's possible for this to clear too much. I just don't think that's a problem.
  • Karan · 10 months ago
    Nice article, especially nicer to read with the new design :)
  • David Zhou · 10 months ago
    I've done something similar, but instead used a database field to store the cache hash. So basically rather than storing and fetching cache keyed to user id, it's keyed to a user_cache_key field that was "<user_id>_<counter>".

    Then, when I want to invalidate all cache associated with a user, I just increment the counter.

    In terms of database hits, it's about the same, since you just pass in the cache key rather than the user id when fetching cache.
  • Brett · 10 months ago
    A race that isn't just about temporary waste of memory,

    A -> gets hash 0x01
    B -> deletes hash 0x01 because favorites changed
    A -> gets favorites-37-None-None-0x01, which is invalid
  • cibyr · 10 months ago
    So A gets some slightly stale data... isn't this exactly what would happen is A's actions happened atomically before B's?
  • Brett · 10 months ago
    Yeah, I agree, after looking at it - it really isn't much of a problem.
  • Mike P · 10 months ago
    Is this what you were ramblin about at Kate O'Brien's? I vaguely remember hearing something about using a cache key to invalidate cache while I was too busy rewatching the same basketball highlights on the TV in the corner...

    Good tip, and yeah, the "race condition" is not really a race condition. It's more like a fact of life when using a cache. Sometimes, things just won't be that fresh. This approach goes a long way towards minimizing the number of times that would actually happen.
  • Mike Malone · 10 months ago
    Eric, nice post. I know of lots of large-scale sites that use this technique (or some slight variation on it) in practice, so it's definitely valid and scales well.

    Re: race conditions, what you're generally after with this sort of caching is read-your-write consistency (if I write data then immediately read it, the read should be consistent) and eventual consistency (eventually everybody should get a consistent view of the data). So the only type of race condition that matters, IMO, is the type where you risk _storing_ stale data in a location that makes it look like _new_ data. Your naive solution was actually susceptible to this (the delete could happen between the get and the set), but as far as I can tell your cache tag solution is not - the worst that could happen is that stale data is stored after invalidation under an old cache key.
  • Jauder Ho · 10 months ago
    You might want to see this article about keeping memcache consistent.

    http://terrychay.com/blog/article/keeping-memca...

    Useful concepts that you can consider.
  • Ian Lewis · 10 months ago
    Django's own cache framework does something similar when caching pages by caching page headers with a key generated from the request path and then using the headers to generate a second key to the actual cached page.

    That way you can invalidate a particular pages content by simply deleting the headers from the cache by request.path (and settings.CACHE_MIDDLEWARE_KEY_PREFIX). Something like

    views.decorators.cache.cache_header.%s.%s' % (settings.CACHE_MIDDLEWARE_KEY_PREFIX, iri_to_uri(request.path))

    Unrelated to what you're doing it also has the added benefit of invalidating the cached page if the headers change (such as Vary,Cookies etc.). It should probably act like an upstream cache and look at he Vary headers and act accordingly but it works well as is.
  • kellan · 5 months ago
    I realize this comment thread has already gone to hell and spam.

    But assuming anyone is still reading, you might want to consider a technique by which you invalidate and re-build out of band or you're subject to a thundering herd problem.
  • phiggy · 5 months ago
    There is a similar technique described in the memcached faq.
  • junla · 5 months ago
    Nice article,You're right though, it's possible for this to clear too much.