Why do I need cached_property?

A big addition to Python classes was the property decorator for methods. [1] The property decorator lets a method be accessed like an instance attribute, so a calculated value can be read as if it were a plain variable while the calculation happens under the hood. This is great for writing clean, concise code, but the method runs again on every access. If there's a long-running function behind the attribute, such as a database query, it can take some time to return what appears to be a small variable. cached_property ensures that once the function has run, the result is cached on the instance and can be reused quickly later.

Note: cached_property was added to functools in Python 3.8. You can't use it in earlier versions unless you install the third-party cached-property package.
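If the same code has to run on both old and new interpreters, one common pattern is a guarded import. This is only a sketch: it assumes the cached-property package is installed wherever the functools import fails, and the Example class is a hypothetical stand-in used to show that the decorator works either way.

```python
try:
    # Standard library version, available from Python 3.8
    from functools import cached_property
except ImportError:
    # Assumption: the third-party cached-property package is installed
    from cached_property import cached_property


class Example:
    """Hypothetical class, used only to exercise the decorator."""

    @cached_property
    def answer(self):
        return 6 * 7


print(Example().answer)
```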

Examples

The following example is taken from the functools documentation for cached_property.

Using property

import statistics

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @property
    def stdev(self):
        return statistics.stdev(self._data)

    @property
    def variance(self):
        return statistics.variance(self._data)
>>> data = DataSet(range(20))
>>> data.stdev
5.916079783099616
>>> data.variance
35.0

Using cached_property

from functools import cached_property
import statistics

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

    @cached_property
    def variance(self):
        return statistics.variance(self._data)
>>> data = DataSet(range(20))
>>> data.stdev
5.916079783099616
>>> data.variance
35.0

OK, they return the same information. What's the big deal? The above example is trivial, but if calculating the variance or standard deviation were a long-running process, you wouldn't want to wait a few seconds every time you read data.stdev or data.variance.
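To make the difference visible without a genuinely slow computation, the sketch below counts how many times each method body runs. The Counted class and its calls counter are hypothetical names used only for illustration.

```python
from functools import cached_property


class Counted:
    """Hypothetical class that counts how often each method body runs."""

    def __init__(self):
        self.calls = {"plain": 0, "cached": 0}

    @property
    def plain(self):
        self.calls["plain"] += 1
        return 42

    @cached_property
    def cached(self):
        self.calls["cached"] += 1
        return 42


obj = Counted()
for _ in range(3):
    obj.plain   # runs the method body on every access
    obj.cached  # runs the method body on the first access only

print(obj.calls)  # {'plain': 3, 'cached': 1}
```

With a database query or a large dataset behind the attribute, those two extra runs are where the waiting time would go.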

Pitfalls of cached_property

Defining slots

Since cached_property "requires that the __dict__ attribute on each instance be a mutable mapping" [2] (i.e. each instance needs a writable __dict__), cached_property won't work in some cases.

For example, if a class defines __slots__ without including __dict__, its instances have no __dict__ attribute. The class below goes one step further and lists 'stdev' in __slots__, which raises a ValueError as soon as the class is defined.

class DataSet:
    __slots__ = ['stdev']

    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

    @cached_property
    def variance(self):
        return statistics.variance(self._data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'stdev' in __slots__ conflicts with class variable

As you can see, the class can't even be defined: the ValueError is raised because the slot name 'stdev' clashes with the class attribute of the same name. This exception isn't specific to cached_property though; the same error is raised if one uses the property decorator instead.
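The missing __dict__ itself shows up in a different way. In the sketch below, the slots only name '_data', so the class definition succeeds, but reading the cached attribute fails because there is nowhere to store the result. SlottedDataSet is a hypothetical name; the behaviour shown is what I'd expect from functools.cached_property on CPython 3.8 and later.

```python
from functools import cached_property
import statistics


class SlottedDataSet:
    # Only '_data' gets a slot, so instances have no __dict__
    __slots__ = ("_data",)

    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)


data = SlottedDataSet(range(20))
error = None
try:
    data.stdev
except TypeError as exc:
    # cached_property cannot find an instance __dict__ to cache into
    error = exc
    print(type(exc).__name__, exc)
```

Adding '__dict__' to __slots__ gives instances a dictionary again, at the cost of some of the memory savings that slots are usually chosen for.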

Changing the Data

If I go back to using just the property decorator like so:

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @property
    def stdev(self):
        return statistics.stdev(self._data)

    @property
    def variance(self):
        return statistics.variance(self._data)
>>> data = DataSet(range(20))
>>> data.stdev
5.916079783099616
>>> data.variance
35.0

If I decide to change the data, the other variables update accordingly.

>>> data._data = range(30)
>>> data.stdev
8.803408430829505
>>> data.variance
77.5

Comparing this to the cached_property version:

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

    @cached_property
    def variance(self):
        return statistics.variance(self._data)
>>> data = DataSet(range(20))
>>> data.stdev
5.916079783099616
>>> data.variance
35.0
>>> data._data = range(30)
>>> data.stdev
5.916079783099616
>>> data.variance
35.0

data.stdev and data.variance didn't update because they're cached! What do we do now if the underlying data changes? The solution is to invalidate the cache.

The cached values live in the instance dictionary, data.__dict__. However, a value only gets there once the corresponding method, such as data.stdev, has run. Right now, the dictionary looks like this:

>>> data.__dict__
{'_data': range(0, 30), 'stdev': 5.916079783099616, 'variance': 35.0}

But before running data.stdev and data.variance it looks like this:

>>> data = DataSet(range(20))
>>> data.__dict__
{'_data': range(0, 20)}

Therefore to invalidate (i.e. "fix") the cache, you merely need to delete the values from the dictionary.

>>> data.stdev
5.916079783099616
>>> data.variance
35.0
>>> data.__dict__
{'_data': range(0, 20), 'stdev': 5.916079783099616, 'variance': 35.0}
>>> data._data = range(30)
>>> data.__dict__
{'_data': range(0, 30), 'stdev': 5.916079783099616, 'variance': 35.0}
>>> del data.__dict__['stdev']
>>> data.__dict__
{'_data': range(0, 30), 'variance': 35.0}
>>> data.stdev
8.803408430829505
>>> del data.__dict__['variance']
>>> data.__dict__
{'_data': range(0, 30), 'stdev': 8.803408430829505}
>>> data.variance
77.5
>>> data.__dict__
{'_data': range(0, 30), 'stdev': 8.803408430829505, 'variance': 77.5}

After deleting the values from data.__dict__ we were able to regenerate the correct values.
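Deleting keys by hand is easy to forget, so one option is to route changes to the data through a setter that clears the cache. This is a sketch of that idea, not something the functools documentation prescribes; the public data property and its setter are additions made for illustration.

```python
from functools import cached_property
import statistics


class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, sequence_of_numbers):
        self._data = sequence_of_numbers
        # Drop cached values; pop() with a default is safe
        # even if nothing has been cached yet
        for name in ("stdev", "variance"):
            self.__dict__.pop(name, None)

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

    @cached_property
    def variance(self):
        return statistics.variance(self._data)


data = DataSet(range(20))
first = data.variance   # computed and cached
data.data = range(30)   # the setter clears the cache
second = data.variance  # recomputed from the new data
print(first, second)
```

Anything that assigns to data._data directly still bypasses the setter, so this only helps if callers go through data.data.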

Conclusion

property is a very useful decorator for Python objects. cached_property is useful as well, but one should understand the tradeoffs when using it. It can save time on expensive computations, but it can also introduce bugs when the underlying data changes and the cached value goes stale.


  1. Raymond Hettinger in this Python talk

  2. https://docs.python.org/3/library/functools.html#functools.cached_property 
