Efficient model memcaching

Posted by Nick Johnson | Filed under coding, app-engine, cookbook, tech

This is the first in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

One common pattern in App Engine is using memcache to store entities retrieved from the datastore - either individual ones or a list of them - to avoid re-executing common queries. The natural pattern is something like this:

from google.appengine.api import memcache
from google.appengine.ext import db

entities = memcache.get("somekey")
if not entities:
 entities = MyModel.all().fetch(10)
 memcache.set("somekey", entities)

This has several problems:

Pickling in general is slow. App Engine doesn't have the more efficient cPickle module, so pickling on App Engine is sloooooow.
Changes in the Model class between versions risk causing errors when unpickling entities that were created before the update.
The Model class often contains a cached copy of the internal data representation, so often you're storing everything twice in the pickled data.
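To make the third problem a bit more concrete, here's a toy sketch using only the standard pickle module. The dict below is a made-up stand-in for a model's instance state, not the real db.Model internals: it just holds a value alongside a cached encoded copy of the same data, which is the shape of the problem described above.

```python
import pickle

# Hypothetical stand-in for a model's state: the property value plus a
# cached, already-encoded copy of the same data.
payload = u"x" * 1000
state = {
    "text": payload,                      # the "property" value
    "_encoded": payload.encode("utf-8"),  # cached copy of the same data
}

pickled = pickle.dumps(state, pickle.HIGHEST_PROTOCOL)
encoded_only = state["_encoded"]

# The pickle carries both copies, so it comes out at more than twice the
# size of the encoded data on its own.
print(len(encoded_only), len(pickled))
```

Storing only the encoded form sidesteps the duplication, which is the idea the rest of this post builds on.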

Previously, there wasn't really a way around this. You could store some processed form of your data, but this is awkward and creates extra work. Fortunately, a minor change in release 1.2.5 makes efficient memcaching easy.

Two new functions were added to the db module in release 1.2.5: model_to_protobuf and model_from_protobuf. What these functions do is convert between the Model instances we all know and love, and the internal representation used for entities, Protocol Buffers. Protocol Buffers have a natural and efficient binary representation, so they're a lot more efficient to encode and decode than pickling. They also won't change incompatibly between App Engine versions, because they're what's used to store data in the datastore.

Here's an example of using memcache to cache a single value:

from google.appengine.api import memcache
from google.appengine.ext import db
from google.appengine.datastore import entity_pb

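data = memcache.get("somekey")
if data:
 # Cache hit: rebuild the model instance from the encoded protocol buffer.
 entity = db.model_from_protobuf(entity_pb.EntityProto(data))
else:
 # Cache miss: fetch a single entity and cache its encoded form.
 entity = MyModel.all().get()
 if entity is not None:
  memcache.set("somekey", db.model_to_protobuf(entity).Encode())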

Examining it in reverse order, you can see that when we set the value in memcache, we call db.model_to_protobuf, then call Encode on the returned protocol buffer. This returns a compact binary representation of the entity that we can store in memcache. When we fetch the value from memcache, we first recreate the protocol buffer (by passing the data to the entity_pb.EntityProto constructor), then call db.model_from_protobuf on it to retrieve the model instance.

While efficient, this looks a little awkward. There's also no obvious way to store a list of entities in a single memcache entry, which is pretty essential for most uses. Fortunately, it's straightforward to write some utility functions to make this easier:

from google.appengine.ext import db
from google.appengine.datastore import entity_pb

def serialize_entities(models):
 if models is None:
  return None
 elif isinstance(models, db.Model):
  # Just one instance
  return db.model_to_protobuf(models).Encode()
 else:
  # A list
  return [db.model_to_protobuf(x).Encode() for x in models]

def deserialize_entities(data):
 if data is None:
  return None
 elif isinstance(data, str):
  # Just one instance
  return db.model_from_protobuf(entity_pb.EntityProto(data))
 else:
  # A list
  return [db.model_from_protobuf(entity_pb.EntityProto(x)) for x in data]

As you can see, these functions accept either an individual Model instance or a list of instances - in keeping with the rest of the datastore API. Sharp-eyed readers will realise that our results will still be pickled, which seems a little hypocritical after going on about the evils of pickling. However, pickling a short list of strings is much more efficient, both in space and time, than pickling a list of entities! The pickling overhead here is an acceptable compromise - though a more complicated solution using the struct module could avoid the need to pickle entirely.
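For the curious, here's a rough idea of what that struct-based alternative might look like. This is only a sketch: pack_blobs and unpack_blobs are names invented for this example, and the length-prefixed layout is just one reasonable choice, not anything App Engine prescribes.

```python
import struct

def pack_blobs(blobs):
    """Join a list of byte strings into a single byte string.

    Layout (an assumption of this sketch): a 4-byte big-endian count,
    then for each blob a 4-byte big-endian length followed by its bytes.
    """
    parts = [struct.pack(">I", len(blobs))]
    for blob in blobs:
        parts.append(struct.pack(">I", len(blob)))
        parts.append(blob)
    return b"".join(parts)

def unpack_blobs(data):
    """Reverse pack_blobs, returning the original list of byte strings."""
    (count,) = struct.unpack_from(">I", data, 0)
    offset = 4
    blobs = []
    for _ in range(count):
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        blobs.append(data[offset:offset + length])
        offset += length
    return blobs

packed = pack_blobs([b"first entity", b"", b"\x00\x01\x02"])
print(unpack_blobs(packed))
```

The idea would be to pass the list of Encode() results through pack_blobs and store the single resulting string. Note that a packed list and a single encoded entity are then both plain strings, so you'd need to always store a list, or add a marker of your own, to tell the two apart.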

Finally, here's our initial example, reworked to use our new functions:

from google.appengine.api import memcache
from google.appengine.ext import db

entities = deserialize_entities(memcache.get("somekey"))
if not entities:
 entities = MyModel.all().fetch(10)
 memcache.set("somekey", serialize_entities(entities))
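If you find yourself repeating that get-or-fetch dance, it can be wrapped up in a small helper. The sketch below is one way it might look; cached_fetch and DictCache are hypothetical names made up for this example, and DictCache is just an in-memory stand-in with memcache-style get and set so the code runs outside App Engine.

```python
class DictCache(object):
    """Hypothetical in-memory stand-in exposing memcache-style get/set."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Like memcache.get, return None when the key is missing.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

def cached_fetch(cache, key, fetch, serialize, deserialize):
    """Return the cached value for key, calling fetch() only on a miss."""
    data = cache.get(key)
    if data is not None:
        return deserialize(data)
    result = fetch()
    cache.set(key, serialize(result))
    return result

# Toy usage with strings standing in for entities.
cache = DictCache()
calls = []

def fetch_names():
    calls.append(1)  # record each trip to the "datastore"
    return ["alice", "bob"]

join = lambda items: ",".join(items)
split = lambda text: text.split(",")

first = cached_fetch(cache, "names", fetch_names, join, split)
second = cached_fetch(cache, "names", fetch_names, join, split)
print(first, second, len(calls))
```

On App Engine you'd presumably pass the memcache module itself as the cache, along with serialize_entities and deserialize_entities from above. One deliberate difference from the snippet above: this helper checks for None rather than truthiness, so an empty result list is served from the cache instead of triggering the query again.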

Congratulations! You're now efficiently memcaching your models!

Edit: James Levy has posted a nice recipe for a memoizing decorator that uses model_to_protobuf and model_from_protobuf.

Keep an eye out for the next post, coming soon: Advanced bulk loading!

12 September, 2009
