Advanced Bulk Loading, part 2: Customization

This is the third in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

Customizing Created Entities

Sometimes it can be useful to modify an entity after the bulkloader creates it, but before it's uploaded to App Engine. In the file upload example from part 1, for instance, we might want to set the filesize field to the length of the uploaded file. We don't want to add a new column to the input data just for that: it's extra work, and the values could rapidly become inaccurate as files change.

Fortunately, the bulkloader provides an easy mechanism for this: the handle_entity method. By overriding it in our Loader subclass, we can perform whatever postprocessing we wish on each entity before it's uploaded to the datastore. Here's an example that sets the filesize field:

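```python
from google.appengine.tools import bulkloader

# file_loader is the converter from the file upload example in part 1.

class ImageLoader(bulkloader.Loader):
    def __init__(self):
        bulkloader.Loader.__init__(self, 'DatastoreImage',
                                   [('filename', str),
                                    ('data', file_loader)
                                   ])

    def handle_entity(self, entity):
        # Compute filesize from the loaded data instead of a separate column.
        entity.filesize = len(entity.data)
        return entity
```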
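The bulkloader needs the App Engine SDK to run, so the following is only a plain-Python sketch of where handle_entity sits in the pipeline: each raw column value goes through its converter, the assembled entity is passed to handle_entity, and whatever comes back is what gets uploaded. MiniLoader, fake_file_loader, the FAKE_FILES dictionary and the dict-based entities are hypothetical stand-ins, not the bulkloader's actual classes or internals.

```python
# Hypothetical in-memory "files" so the sketch runs without touching disk.
FAKE_FILES = {'cat.png': b'0123456789'}


def fake_file_loader(path):
    """Hypothetical stand-in for part 1's file_loader: returns the file's bytes."""
    return FAKE_FILES[path]


class MiniLoader(object):
    """Hypothetical stand-in for bulkloader.Loader; NOT the App Engine class."""

    def __init__(self, kind, properties):
        self.kind = kind
        self.properties = properties  # list of (property name, converter) pairs

    def handle_entity(self, entity):
        # Default hook: upload the entity unchanged.
        return entity

    def create_entities(self, values):
        # Build the entity by running each raw column value through its converter.
        entity = {'kind': self.kind}
        for (name, converter), raw in zip(self.properties, values):
            entity[name] = converter(raw)
        # Post-process, then normalize to a list: the hook may return one
        # entity or several.
        result = self.handle_entity(entity)
        return result if isinstance(result, list) else [result]


class ImageLoader(MiniLoader):
    def __init__(self):
        MiniLoader.__init__(self, 'DatastoreImage',
                            [('filename', str), ('data', fake_file_loader)])

    def handle_entity(self, entity):
        # Computed property: derived from the data, not read from the input.
        entity['filesize'] = len(entity['data'])
        return entity


# One input row: the filename column and the column naming the file to load.
uploaded = ImageLoader().create_entities(['cat.png', 'cat.png'])
print(uploaded[0]['filesize'])  # 10
```

Normalizing the hook's return value to a list is what lets one pipeline accept either a single entity or several from handle_entity.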

As an added bonus, handle_entity isn't restricted to returning a single entity: it may return a list of them. You can use this to generate additional 'computed' entities from the one being uploaded, which is useful if you're implementing the 'relation index' pattern, for example, or if the data you're uploading is denormalized.
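As a rough illustration, here is a plain-Python sketch of a handle_entity that turns one uploaded image into two entities, in the spirit of the relation index pattern: the image itself, plus a child index entity carrying its (potentially long) list of tags. The tags property, the DatastoreImageIndex kind, the tuple used as a parent key and the dict-based entities are all hypothetical; with the real bulkloader, the entities would be model instances and the parent a real datastore key.

```python
def handle_entity(entity):
    """Hypothetical handle_entity body: split one input entity into two."""
    # Move the tag list off the main entity, so reading an image never
    # has to deserialize its tags...
    tags = entity.pop('tags')
    # ...and onto a child entity whose parent is the image. A keys-only
    # query on this kind (e.g. tags == 'cat') yields index keys whose
    # parents are the matching images.
    index = {
        'kind': 'DatastoreImageIndex',
        'parent': ('DatastoreImage', entity['filename']),  # stand-in for a real key
        'tags': tags,
    }
    # Returning a list uploads both entities.
    return [entity, index]


image = {'kind': 'DatastoreImage', 'filename': 'cat.png', 'tags': ['cat', 'pet']}
entities = handle_entity(image)
for e in entities:
    print(e['kind'])
# DatastoreImage
# DatastoreImageIndex
```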

Providing Key Names

By default, uploaded entities are assigned a numeric ID rather than a key name. This is frequently fine, but sometimes you want to bulk load data that requires key names instead. Here, again, the bulkloader comes to the rescue with the generate_key method. This method takes two parameters, the index of the entry being uploaded and a list of its values, and returns the key name to use. Suppose we wanted to name our DatastoreImage entities after their filenames for easy lookup; we can accomplish this like so:

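```python
from google.appengine.tools import bulkloader

class ImageLoader(bulkloader.Loader):
    def __init__(self):
        bulkloader.Loader.__init__(self, 'DatastoreImage',
                                   [('filename', str),
                                    ('data', file_loader)  # converter from part 1
                                   ])

    def generate_key(self, i, values):
        # i is the entry's index; values holds its column values.
        # Use the filename (first column) as the key name.
        return values[0]

    def handle_entity(self, entity):
        entity.filesize = len(entity.data)
        return entity
```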

If you're going to generate your own key names, take care: if you generate an entity with the same key as an existing one, your new entity will overwrite the one already in the datastore! This may be what you want, but then again, it may not be.
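One cheap safeguard is to check the input for rows that would generate the same key name before uploading anything. The plain-Python sketch below models the datastore as a dict keyed by key name to show the overwrite, then uses a hypothetical find_duplicate_keys helper (not part of the bulkloader) to flag the collision up front. It assumes the same rule as the generate_key above (key name = first column), and it only catches duplicates within the input, not clashes with entities already stored.

```python
from collections import Counter


def generate_key(i, values):
    # Same rule as ImageLoader.generate_key above: key name = filename column.
    return values[0]


def find_duplicate_keys(rows):
    """Hypothetical helper: key names that more than one input row would use."""
    counts = Counter(generate_key(i, values) for i, values in enumerate(rows))
    return sorted(key for key, n in counts.items() if n > 1)


rows = [
    ['cat.png', 'cat.png'],
    ['dog.png', 'dog.png'],
    ['cat.png', 'cat2.png'],  # same filename, different file: a collision
]

# Modelling the datastore as a dict keyed by key name: a repeated key name
# replaces the earlier entity instead of adding a new one.
datastore = {}
for i, values in enumerate(rows):
    datastore[generate_key(i, values)] = values

print(len(datastore))             # 2, not 3
print(datastore['cat.png'][1])    # cat2.png: the first cat.png row is gone
print(find_duplicate_keys(rows))  # ['cat.png']
```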

In the next post, we'll be taking a brief interlude from Bulk Loading to discuss distributed transactions on App Engine, and the canonical 'bank account' example.

When we resume our series on Advanced Bulk Loading, we'll discuss how to load directly from an SQL database or nearly any other data source.
