Custom Datastore Properties 1: DerivedProperty
Posted by Nick Johnson | Filed under coding, app-engine, cookbook, tech
This is the eighth in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
A common requirement when using the Datastore is storing some form of calculated property - for example, the lower-cased version of a text string, so it can be filtered on, or the length of a list, or the sum of some elements. One can do this manually, but it's easy to forget to update the computed property in some places. Another option is to override the model's put() method, but the override is bypassed, and the property never gets updated, if you store your entity using db.put(). Given the post this is appearing in, I'm sure you can figure out what the solution is going to be: a custom property class!
What we want here is a DerivedProperty. In order to be as flexible as possible, we should allow the user to supply a function to be called to generate the value to store in the property; by passing the Model object to the function, we can permit users to write derived properties that depend on multiple properties of the Model object. The basic code for a DerivedProperty class is succinct:
    from google.appengine.ext import db

    class DerivedProperty(db.Property):
        def __init__(self, derive_func, *args, **kwargs):
            super(DerivedProperty, self).__init__(*args, **kwargs)
            self.derive_func = derive_func

        def __get__(self, model_instance, model_class):
            # Accessed on the class rather than an instance: return the property.
            if model_instance is None:
                return self
            return self.derive_func(model_instance)

        def __set__(self, model_instance, value):
            raise db.DerivedPropertyError("Cannot assign to a DerivedProperty")
Because the DerivedProperty exists only as a calculated value - we don't cache it anywhere - we need to override the __get__ method, to ensure that users can retrieve the value of the DerivedProperty and get the expected value back. We don't have to override get_value_for_datastore as described in the articles linked above, though - the default implementation of get_value_for_datastore is to call __get__ and return its value.
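To see the descriptor mechanics in isolation, here is a minimal plain-Python sketch that runs without the App Engine SDK. The Property base class below is a hypothetical stand-in, not db.Property; its get_value_for_datastore simply defers to __get__, as described above, and Person is an illustrative name of my own.

```python
class Property(object):
    """Hypothetical stand-in for db.Property; not the SDK class."""

    def get_value_for_datastore(self, model_instance):
        # Assumed default behaviour, as described in the text: defer to __get__.
        return self.__get__(model_instance, type(model_instance))


class DerivedProperty(Property):
    def __init__(self, derive_func):
        self.derive_func = derive_func

    def __get__(self, model_instance, model_class):
        if model_instance is None:
            return self  # accessed on the class, not an instance
        # Nothing is cached: the value is recomputed on every access.
        return self.derive_func(model_instance)


class Person(object):
    name_lower = DerivedProperty(lambda self: self.name.lower())

    def __init__(self, name):
        self.name = name


person = Person("Alice")
live_value = person.name_lower            # computed from the current name
person.name = "BOB"
recomputed_value = person.name_lower      # follows the change automatically
stored_value = Person.name_lower.get_value_for_datastore(person)
```

Because the value is never cached, changing name is enough to change what would be written to the datastore.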
We also override __set__ to raise an exception if a user tries to modify the property directly. Note that we're raising a 'DerivedPropertyError' from the db module. db treats this exception specially to avoid errors when constructing a new object with default values. Almost makes you wonder if there were plans for such a property in the SDK, doesn't it?
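Why the special treatment matters can be illustrated with a toy constructor. This is a sketch under assumptions, not the SDK's code: DerivedPropertyError, ReadOnlyLength and ToyModel are all hypothetical stand-ins, and the constructor logic is my guess at the kind of handling the paragraph above describes, namely assigning a default to every property and ignoring the error unless the caller supplied a value explicitly.

```python
class DerivedPropertyError(Exception):
    """Hypothetical stand-in for db.DerivedPropertyError."""


class ReadOnlyLength(object):
    """Toy derived descriptor: computes len(items) and rejects assignment."""

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return len(instance.items)

    def __set__(self, instance, value):
        raise DerivedPropertyError("Cannot assign to a derived property")


class ToyModel(object):
    item_count = ReadOnlyLength()

    def __init__(self, **kwargs):
        self.items = kwargs.pop("items", [])
        # Assumed behaviour: the constructor assigns a value (or a default)
        # to every declared property, derived ones included.
        for name in ("item_count",):
            try:
                setattr(self, name, kwargs.get(name))
            except DerivedPropertyError:
                if name in kwargs:
                    raise  # the caller explicitly tried to set it
                # Otherwise it was only the default assignment: ignore it.


model = ToyModel(items=[1, 2, 3])
count = model.item_count
```

Without the try/except, every ToyModel() call would fail, because the default assignment alone trips the __set__ guard.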
Now that we have a DerivedProperty, we can use it in the expected manner:
    class MyModel(db.Model):
        name = db.StringProperty()
        name_lower = DerivedProperty(lambda self: self.name.lower())
We can also use it as a decorator, which is convenient for longer methods:
    import hashlib

    class File(db.Model):
        name = db.StringProperty()
        data = db.BlobProperty()

        @DerivedProperty
        def hash(self):
            return hashlib.sha1(self.data).hexdigest()
There's one shortcoming here, though: If we use DerivedProperty as a decorator, we can't pass any additional arguments to it, such as 'required' or 'name'. We could work around this by defining our method separately and passing it to the DerivedProperty constructor, but that's rather kludgy. We can have the best of both worlds, though. First, we rename our DerivedProperty class to '_DerivedProperty'. Then, we define a function called DerivedProperty that looks like the following:
    def DerivedProperty(derive_func=None, *args, **kwargs):
        if derive_func:
            # Regular invocation
            return _DerivedProperty(derive_func, *args, **kwargs)
        else:
            # Decorator function
            def decorate(decorated_func):
                return _DerivedProperty(decorated_func, *args, **kwargs)
            return decorate
This function performs a bit of trickery: If a function is passed in the first argument (as is the case in either of the examples we've already shown), it acts just like the original DerivedProperty constructor, and returns an instance of _DerivedProperty. If it's called without the function argument, however, it returns a decorator function, which when used to decorate the original function, returns a _DerivedProperty that wraps it. Here's our previous example, refactored to use it:
    class File(db.Model):
        name = db.StringProperty()
        data = db.BlobProperty()

        @DerivedProperty(name='sha1')
        def hash(self):
            return hashlib.sha1(self.data).hexdigest()
What happens here can be a bit mind-bending at first. Python first evaluates the decorator expression, DerivedProperty(name='sha1'), which returns a function ('decorate' in the above code). It then builds the function object for 'hash' and calls the returned function (decorate) on it, which results in a _DerivedProperty object. Finally, it assigns that object to the name 'hash'. Clear as mud?
With a DerivedProperty in hand, we can easily define subclasses for automating common operations. For example, a LowerCaseProperty:
    class LowerCaseProperty(_DerivedProperty):
        def __init__(self, prop, *args, **kwargs):
            super(LowerCaseProperty, self).__init__(
                lambda self: prop.__get__(self, type(self)).lower(),
                *args, **kwargs)
and a LengthProperty:
    class LengthProperty(_DerivedProperty):
        def __init__(self, prop, *args, **kwargs):
            super(LengthProperty, self).__init__(
                lambda self: len(prop.__get__(self, type(self))),
                *args, **kwargs)
More generally, a TransformProperty applies an arbitrary function to the value of another property:
    class TransformProperty(_DerivedProperty):
        def __init__(self, prop, transform_func, *args, **kwargs):
            super(TransformProperty, self).__init__(
                lambda self: transform_func(prop.__get__(self, type(self))),
                *args, **kwargs)
TransformProperty can be used in place of LengthProperty like this:
    class File(db.Model):
        name = db.StringProperty()
        data = db.BlobProperty()
        length = TransformProperty(data, len)
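Note that 'data' here is the property object itself, which is still in scope inside the class body, and that the transform reads its value through prop.__get__. A rough plain-Python sketch of that pattern, runnable without the SDK, might look like this; Stored and Transform are hypothetical stand-ins for a stored db property and for TransformProperty, and the underscore-prefixed storage attribute is my own choice.

```python
class Stored(object):
    """Hypothetical stand-in for a stored property such as db.StringProperty."""

    def __init__(self, attr_name):
        self.attr_name = '_' + attr_name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return getattr(instance, self.attr_name, None)

    def __set__(self, instance, value):
        setattr(instance, self.attr_name, value)


class Transform(object):
    """Hypothetical stand-in for TransformProperty."""

    def __init__(self, prop, transform_func):
        # Read the source property through its descriptor protocol, then
        # apply the transform to whatever value it currently holds.
        self.derive_func = lambda inst: transform_func(
            prop.__get__(inst, type(inst)))

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.derive_func(instance)


class File(object):
    name = Stored('name')
    data = Stored('data')
    # 'data' and 'name' are the descriptor objects defined just above.
    length = Transform(data, len)
    name_lower = Transform(name, lambda s: s.lower())


f = File()
f.name = "README.TXT"
f.data = b"hello"
length = f.length
name_lower = f.name_lower
```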
I've started implementing useful Property subclasses like this in a library I'm tentatively calling AETycoon (because it has all the Property you could ever need). It's intended to be more robust, and more reusable, than the proliferation of cookbook recipes we currently have. Contributions are, of course, welcome.
Have an idea for another use of DerivedProperty? Leave a comment!