Handling downtime: The capabilities API and testing

After the unfortunate outage the other day, how to handle downtime with your App Engine app is a bit of a hot topic. So what better time to address proper error handling for situations where App Engine isn't performing at 100%?

There's three major topics to cover here: Handling timeouts from API calls, using the Capabilities API, and testing your app's support for handling failures. We'll go over them in order.

Handling timeouts

At the 'stub' level, timeouts and other exceptions are communicated by the stub throwing an google.appengine.runtime.apiproxy_errors.ApplicationError. ApplicationError instances have an 'application_error' field, which contains an ID, drawn from google.appengine.runtime.apiproxy_errors, which indicates the cause of the error. As you can see, DEADLINE_EXCEEDED is 4. Other errors of interest are OVER_QUOTA, which will occur if your app runs out of quota for a given API call or capability, and CAPABILITY_DISABLED, which is thrown if the API capability has been explicitly disabled (more on this later).

Each of the various APIs catches ApplicationErrors thrown by their stub, and wraps them in a higher level exception. The datastore, for example, has a function, _ToDatastoreError that maps different error codes to ...