App Engine Cookbook: On-demand Cron Jobs

Today's post is, by necessity, a brief one. I'm travelling to San Francisco for I/O at the moment, and my flight was delayed so much I missed my connection in Atlanta and had to stay the night; in fact, I'm writing and posting this from the plane, using the onboard WiFi!

In a previous post, I introduced a recipe for high concurrency counters, which used a technique that I believe deserves its own post, since it's a useful pattern on its own. That technique is what I'm calling "On-demand Cron Jobs".

It's not at all uncommon for apps to need to do periodic updates, where the individual updates are small, and may even shift in time. One example is deleting or modifying any entry that hasn't been modified in the last day. In apps that need to do this, it's not uncommon to see a cron job like the following:

- description: Clean up old data
  url: /tasks/cleanup
  schedule: every 1 minute

This works, but it potentially consumes a significant amount of resources checking repeatedly if there's anything to clean up. Using the task queue ...

Authenticating against App Engine from an Android app

Many an Android app requires a server backend of some sort, and what better choice than App Engine? It's free, reliable, and does everything you're likely to need in a backend. It has one other major advantage, too: It supports Google Account authentication, and nearly all Android users will already have a Google Account.

So given that we want a backend for our app, and given that we want to have user authentication, how do we go about this? We could prompt the user for their credentials, but that seems less than ideal: the Android device already has their credentials, and users may not trust us with them. Is there a way we can leverage an Android API to take care of authentication? It turns out there is.

Authentication with App Engine, regardless of where you're doing it, is a three-stage process:

  1. Obtain an authentication token. This can be done with ClientLogin for installed apps, for example, or with AuthSub for a webapp. When logging in directly to an application, this is the part of the login process where your user sees a Google sign-in screen.
  2. Take that authentication token, and use it to obtain an authentication ...

Busy preparing for Malaga

Regrettably, no time for a post today, as I'm busy preparing for the hackathon in Malaga tomorrow.

If you're going, see you there! If not, my apologies; regular posting will resume on Friday.

Using the new bulkloader

Recently, Matthew Blain, of the App Engine team, announced the prerelease of a new bulkloader. The new bulkloader uses yaml files for configuration, and takes a 'declarative' rather than procedural approach to configuration for downloading and uploading data. As a result, you don't have to understand Python in order to configure and use the new bulkloader, which is a significant advantage for users of the Java App Engine runtime.

There are, of course, many other significant improvements, including autogeneration of config files, a built-in library of converters for common data types, support for input and output types other than CSV, and more. Today, we'll walk through basic usage of the new bulkloader, and demonstrate some of its features.

Configuration autogeneration

One of the most significant new features of the bulkloader is its support for autogenerating config files. It works like this: You point it at your production app, and it downloads the datastore stats, and uses them to generate a configuration file for you. You edit the configuration file to fill in a few missing fields and tidy it up, and presto, you have a working bulkloader configuration. Let's see how that works out when we ...

Games on App Engine

My interview with Jay Kiburz, author of Neptune's Pride, is now up on the App Engine blog.

Implementing a dropbox service with the Blobstore API (part 3): Multiple upload support

In the last part of this series, we demonstrated how to use plupload, a Javascript library with multiple backends for handling file uploads. The solution we demonstrated there only supported uploading a single file at a time, however, and required us to improvise our own progress indicators - far from optimal.

So now, the post you've all been waiting for, where we demonstrate how to do multiple file upload!

The basic trick is simple: Hook the event that's triggered before a file is uploaded, and update the URL to upload to when it's called. That way, every uploaded file gets a new URL. Where do we get the URL from? We simply ask the server for one. Here's the Javascript for that:

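      uploader.bind('UploadFile', function(up, file) {
          // Assumes jQuery's $.ajax, which the options below suggest.
          $.ajax({
            url: '/generate_upload_url',
            async: false,  // block until the new upload URL has been set
            success: function(data) {
              up.settings.url = data;
            }
          });
      });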

Straightforward, right? The only subtlety here is that we have to make the request a synchronous one (note the async: false), so that the uploading doesn't start until we've updated the URL. Here's the server-side code that generates those URLs:

class GenerateUploadUrlHandler(BaseHandler):
  def get(self):
    self.response.headers['Content-Type'] = 'text/plain'
    self.response.out.write ...

High concurrency counters without sharding

Sharded Counters are a well known technique for keeping counters with high update rates on App Engine. Less well known, however, are some of the alternatives, particularly in areas where you want to keep a reasonably accurate counter, but absolute accuracy isn't required. I discussed one option in this cookbook post - be sure to check the comments for an improved version - and today we'll discuss another option, which also makes use of memcache and the task queue.

The basic assumption is this: We want to keep as accurate a count as possible, but we're willing to accept that it may, in some cases, under-count. A good example of where this is true is counting downloads, or hits, or other such metrics.

Our solution has three major components:

  1. A 'permanent' count, stored in the datastore.
  2. A 'current' count, stored in memcache.
  3. A task queue task that updates the datastore with the total from memcache.
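The three components above can be sketched roughly as follows. This is only a toy model, not the recipe's actual code: plain attributes stand in for the datastore and memcache, a flag stands in for "a task is already enqueued", and the names ToyCounter, increment, flush and evict are all made up for illustration.

```python
class ToyCounter(object):
    """Toy model of the counter, using in-memory stand-ins (hypothetical)."""

    def __init__(self):
        self.datastore_total = 0    # stand-in for the 'permanent' count
        self.memcache_current = 0   # stand-in for the 'current' count
        self.flush_pending = False  # stand-in for an enqueued task

    def increment(self, delta=1):
        # Cheap path: only the volatile count is touched on each hit.
        self.memcache_current += delta
        if not self.flush_pending:
            # Real code would enqueue a task queue task here.
            self.flush_pending = True

    def flush(self):
        # What the task would do: fold the volatile count into the
        # permanent one, then reset the volatile count.
        pending, self.memcache_current = self.memcache_current, 0
        self.datastore_total += pending
        self.flush_pending = False

    def evict(self):
        # Simulate memcache losing the value before it was flushed.
        self.memcache_current = 0

    def value(self):
        return self.datastore_total + self.memcache_current


counter = ToyCounter()
counter.increment()
counter.increment()
counter.flush()      # two hits are now safely in the 'datastore'
counter.increment()
counter.evict()      # this third hit is lost: the counter under-counts
```

The last two lines show the trade-off we accepted up front: anything still sitting in the volatile count when it disappears is simply never recorded.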

In order to implement this, we'll take advantage of the task queue's task name functionality, and 'tombstoned tasks' - the restriction that two tasks with the same name cannot be enqueued within a reasonable period (at least a week) of each other. Each ...

Pre- and post- put hooks for Datastore models

A number of people have asked about the possibility of pre- and post- put hooks for datastore models, to allow for changes or other processing before or after a model is stored to the datastore.

While such a feature isn't currently supported by App Engine, it's quite possible for us to implement it ourselves, using monkeypatching. This also gives us a good opportunity to show off how monkeypatching works, and how it can be used to make your own changes (at your own risk!) to the App Engine SDK.
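To give a feel for the general idea, here's a minimal sketch of monkeypatching in plain Python. It doesn't touch the App Engine SDK at all: FakeModel is a stand-in class, and the hook names before_put and after_put are made up for illustration, so treat it as a sketch of the technique rather than the implementation itself.

```python
class FakeModel(object):
    """Stand-in for a datastore model class; not the real SDK class."""
    saved = []

    def put(self):
        FakeModel.saved.append(self)
        return len(FakeModel.saved)  # pretend this is the entity's key


# Keep a reference to the original method so the patch can call through.
_original_put = FakeModel.put


def _hooked_put(self, *args, **kwargs):
    # Call the (hypothetical) hooks only if the instance defines them.
    before = getattr(self, 'before_put', None)
    if before is not None:
        before()
    result = _original_put(self, *args, **kwargs)
    after = getattr(self, 'after_put', None)
    if after is not None:
        after()
    return result


# The monkeypatch itself: replace the method on the class at runtime.
FakeModel.put = _hooked_put


class Note(FakeModel):
    def __init__(self):
        self.events = []

    def before_put(self):
        self.events.append('before')

    def after_put(self):
        self.events.append('after')


note = Note()
key = note.put()
```

Because the method is replaced on the class itself, every subclass and every existing instance picks up the new behaviour, but only once the module containing the patch has actually been imported.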

One caveat of monkeypatching is that you have to be very careful to make sure that your patch is installed at all times. If it's not, the changes you made will be unavailable and cause errors - or worse, simply behave differently. This is particularly noticeable in the case of app-engine-patch, which monkeypatches models to change their kind name, causing operations on them to fail if the patch hasn't been imported.

The functionality we want is about as simple as you could ask for: We want to be able to define a method on our Model that gets called just before it is written to the datastore, and ...

Using the Google Maps APIs from App Engine

In a previous post, we discussed how Mapvelopes uses the ReportLab toolkit to dynamically generate PDFs. The other major component of Mapvelopes is its interaction with the various Google Maps APIs, and that's what we'll cover now.

The label "Google Maps API" actually covers a fairly broad set of separate APIs. The best known of them are the in-browser APIs, for embedding maps in webpages, and manipulating them. You've doubtless seen them used extensively around the web. Only slightly less well known are the Static Maps API and the Geocoding Web Service.

Geocoding Web Service

The Geocoding Web Service is pretty straightforward: You supply it with an address, and it supplies you with its latitude and longitude. It also provides a great deal of additional information, such as authoritative names for the various parts of the address, and a viewport that encompasses the geocoded location. Here's an example geocoding API request:,+Mountain+View,+CA&sensor=false

The last part of the path specifies the format - we're using JSON because it's simpler ...

Implementing a dropbox service with the Blobstore API (part 2)

In part 1 of this series, we demonstrated what's necessary to build a very basic 'dropbox' type service for App Engine. Today, we're going to enhance that by adding support for 'rich' upload controls.

Various types of rich upload controls have sprung up in recent years in order to work around the weaknesses of the HTML standard file input element, which only allows selection of one file at a time, and doesn't support any form of progress notification. The most common widgets are written in Flash, but there are a variety of solutions available. With the ongoing browser adoption of HTML5, additional options are opening up, too!

Today we're going to use an excellent component called Plupload. Plupload consists of a Javascript component with a set of interchangeable backends. Backends include Flash, HTML5, Gears, old-fashioned HTML forms, and more. When you configure Plupload, you can specify which backends you want it to try, in which order, and it will stop when it finds one that works in the user's browser.

Different backends have different capabilities, and the ones you need will depend on your use-case. Check out the feature matrix on the Plupload homepage to ...