Announcing a robust datastore bulk update utility for App Engine
Posted by Nick Johnson | Filed under app-engine, python, coding, tech, bulkupdate, datastore
Note: This library is deprecated in favor of appengine-mapreduce, which is now bundled with the SDK.
I'm pleased to announce the release of bulkupdate, an unoriginally-named library for the App Engine Python runtime that facilitates doing bulk operations on datastore data. With bulkupdate, simple operations like bulk re-puts and bulk deletes are trivial, while more complex operations like schema transitions or even emailing all your users become much simpler.
The basic operation of bulkupdate is very similar to the 'map' phase of the well-known 'mapreduce' pattern. To use it, you create a subclass of the 'BulkUpdater' class and define two methods: get_query(), which returns the query to execute, and handle_entity(), which is called once for each entity returned by the query. For example, suppose you want to write a daily task that sends an XMPP message to everyone with new activity on their accounts. The updater class would look something like this:
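from google.appengine.api import xmpp

import bulkupdate

# UserAccount is assumed to be a db.Model defined elsewhere in your app.

class ActivityNotifier(bulkupdate.BulkUpdater):
    def __init__(self, date_threshold):
        self.date_threshold = date_threshold

    def get_query(self):
        # Select only the accounts updated since the threshold.
        return UserAccount.all().filter('last_update >', self.date_threshold)

    def handle_entity(self, user):
        # Called once per matching account.
        if user.unread_messages > 0:
            xmpp.send_message(user.jid, "You have %s unread messages!" % user.unread_messages)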
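To make the 'map' analogy concrete, here is a rough sketch in plain Python of the control flow the two methods imply. It is not bulkupdate's actual implementation: MiniUpdater, UnreadCounter, run() and the dictionary-based accounts are all hypothetical stand-ins invented for illustration, and a list comprehension takes the place of a datastore query.

```python
# Hypothetical stand-in for illustration only; this is NOT bulkupdate's code.
class MiniUpdater(object):
    """Applies handle_entity() to every result of get_query()."""

    def get_query(self):
        raise NotImplementedError

    def handle_entity(self, entity):
        raise NotImplementedError

    def run(self):
        # The 'map' step: visit each returned entity exactly once.
        count = 0
        for entity in self.get_query():
            self.handle_entity(entity)
            count += 1
        return count


class UnreadCounter(MiniUpdater):
    def __init__(self, accounts, threshold):
        self.accounts = accounts
        self.threshold = threshold
        self.notified = []

    def get_query(self):
        # A list comprehension stands in for a filtered datastore query.
        return [a for a in self.accounts if a["last_update"] > self.threshold]

    def handle_entity(self, account):
        # Record the address instead of sending a real XMPP message.
        if account["unread_messages"] > 0:
            self.notified.append(account["jid"])


accounts = [
    {"jid": "a@example.com", "last_update": 5, "unread_messages": 2},
    {"jid": "b@example.com", "last_update": 1, "unread_messages": 4},
    {"jid": "c@example.com", "last_update": 9, "unread_messages": 0},
]
job = UnreadCounter(accounts, threshold=3)
processed = job.run()
```

The point of the split is that get_query() decides which entities are visited, while handle_entity() decides what happens to each one, so the iteration itself never has to be written in the subclass.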
Running the job is even simpler ...