Efficient model memcaching
Posted by Nick Johnson | Filed under coding, app-engine, cookbook, tech
This is the first in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
One common pattern in App Engine is using memcache to store entities retrieved from the datastore - either individual ones or a list of them - to avoid re-executing common queries. The natural pattern is something like this:
from google.appengine.api import memcache
from google.appengine.ext import db
entities = memcache.get("somekey")
if not entities:
    entities = MyModel.all().fetch(10)
    memcache.set("somekey", entities)
This has several problems:
- Pickling in general is slow. App Engine doesn't have the more efficient cPickle module, so pickling on App Engine is sloooooow.
- Changes in the Model class between versions risk causing errors when unpickling entities that were created before the update.
- The Model class often contains a cached copy of the internal data representation, so often you're storing everything twice in the pickled data.
Previously, there wasn't really a way around this. You could store some processed form of your data, but this is awkward and creates extra work. Fortunately, a minor change in 1.2.5 makes efficient memcaching now easy.
Two new functions were added to the db module in release 1.2.5: model_to_protobuf and model_from_protobuf. What these functions do is convert between the Model instances we all know and love, and the internal representation used for entities, Protocol Buffers. Protocol Buffers have a natural and efficient binary representation, so they're a lot more efficient to encode and decode than pickling. They also won't change incompatibly between App Engine versions, because they're what's used to store data in the datastore.
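To get a feel for why a compact binary encoding beats pickling, here's a toy comparison. The record and the hand-rolled layout below are purely illustrative (real Protocol Buffers use a tag/varint wire format, not this fixed layout), but they show where pickle's overhead comes from:

```python
import pickle
import struct

# A toy record standing in for an entity's field data.
record = {"name": "widget", "count": 42}

# pickle has to encode type information and dict structure
# alongside the actual values...
pickled = pickle.dumps(record)

# ...while a purpose-built binary format stores little more than
# the values themselves: here, a 4-byte big-endian count followed
# by the utf-8 bytes of the name.
binary = struct.pack(">I", record["count"]) + record["name"].encode("utf-8")

assert len(binary) < len(pickled)
```

The binary form also decodes with a couple of cheap `struct` calls, where unpickling has to reconstruct Python objects.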
Here's an example of using memcache to cache a single value:
from google.appengine.api import memcache
from google.appengine.ext import db
from google.appengine.datastore import entity_pb
entity = memcache.get("somekey")
if entity:
    entity = db.model_from_protobuf(entity_pb.EntityProto(entity))
else:
    entity = MyModel.all().get()
    memcache.set("somekey", db.model_to_protobuf(entity).Encode())
Examining it in reverse order, you can see that when we set the value in memcache, we call db.model_to_protobuf, then call Encode on the returned protocol buffer. This returns a compact binary representation of the entity that we can store in memcache. When we fetch the entity from memcache, we first reconstruct the protocol buffer (by passing the data to the entity_pb.EntityProto constructor), then call db.model_from_protobuf on that to get the model instance back.
While efficient, this looks a little awkward. There's also no obvious way to store a list of entities in a single memcache entry, which is pretty essential for most uses. Fortunately, it's straightforward to write some utility functions to make this easier:
def serialize_entities(models):
    if models is None:
        return None
    elif isinstance(models, db.Model):
        # Just one instance
        return db.model_to_protobuf(models).Encode()
    else:
        # A list of instances
        return [db.model_to_protobuf(x).Encode() for x in models]
def deserialize_entities(data):
    if data is None:
        return None
    elif isinstance(data, str):
        # Just one instance
        return db.model_from_protobuf(entity_pb.EntityProto(data))
    else:
        # A list of encoded instances
        return [db.model_from_protobuf(entity_pb.EntityProto(x)) for x in data]
As you can see, these functions accept either an individual Model instance or a list of instances - in keeping with the rest of the datastore API. Sharp-eyed readers will realise that when we store a list, memcache will still pickle it, which seems a little hypocritical after going on about the evils of pickling. However, pickling a short list of strings is much more efficient, in both space and time, than pickling a list of entities! The pickling overhead here is an acceptable compromise - though a more complicated solution using the struct module could avoid the need to pickle entirely.
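For the curious, here's a sketch of that struct-based approach: pack the encoded entities into a single blob by prefixing each with its length, so no pickling is needed at all. The pack_blobs/unpack_blobs helpers are hypothetical names, shown here with plain byte strings standing in for the output of Encode():

```python
import struct

def pack_blobs(blobs):
    # Concatenate the blobs, prefixing each with its length as a
    # 4-byte big-endian unsigned int.
    parts = []
    for blob in blobs:
        parts.append(struct.pack(">I", len(blob)))
        parts.append(blob)
    return b"".join(parts)

def unpack_blobs(data):
    # Inverse of pack_blobs: walk the blob, reading each length
    # prefix and slicing out the payload that follows it.
    blobs = []
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        blobs.append(data[offset:offset + length])
        offset += length
    return blobs

packed = pack_blobs([b"first entity", b"second entity"])
assert unpack_blobs(packed) == [b"first entity", b"second entity"]
```

The result is a single byte string you can hand straight to memcache.set, at the cost of a little extra bookkeeping.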
Finally, here's our initial example, reworked to use our new methods:
from google.appengine.api import memcache
from google.appengine.ext import db
entities = deserialize_entities(memcache.get("somekey"))
if not entities:
    entities = MyModel.all().fetch(10)
    memcache.set("somekey", serialize_entities(entities))
Congratulations! You're now efficiently memcaching your models!
Edit: James Levy has posted a nice recipe for a memoizing decorator that uses model_to_protobuf and model_from_protobuf here.
Keep an eye out for the next post, coming soon: Advanced bulk loading!