Announcing a robust datastore bulk update utility for App Engine

Note: This library is deprecated in favor of appengine-mapreduce, which is now bundled with the SDK.

I'm pleased to announce the release of bulkupdate, an unoriginally-named library for the App Engine Python runtime that facilitates doing bulk operations on datastore data. With bulkupdate, simple operations like bulk re-puts and bulk deletes are trivial, while more complex operations like schema transitions or even emailing all your users become much simpler.

The basic operation of bulkupdate is very similar to the 'map' phase of the well-known 'mapreduce' pattern. To use it, you create a subclass of the 'BulkUpdater' class, and define two methods: get_query(), which returns the query to execute, and handle_entity(), which is called once for each entity returned by the query. For example, suppose you want to write a daily task that sends an XMPP message to everyone with new activity on their accounts - the updater class would look something like this:

class ActivityNotifier(bulkupdate.BulkUpdater):
  def __init__(self, date_threshold):
    self.date_threshold = date_threshold

  def get_query(self):
    return UserAccount.all().filter('last_update >', self.date_threshold)

  def handle_entity(self, user):
    if user.unread_messages > 0:
      xmpp.send_message(user.jid, "You have %s unread messages!" % user.unread_messages)

Running the job is even simpler ...

Taking advantage of the new Apps Marketplace

The recently unveiled Apps Marketplace has been getting a lot of attention lately, and a lot of people want to know how they can integrate their App Engine apps with it and make use of its integrated single sign-on (SSO) support. Today we'll go over what's required to get this working.

Apps Marketplace uses OpenID for SSO. Fortunately, we can use the aeoid library, which provides a Users-API-lookalike interface, to support this in App Engine. There are two additional requirements for getting SSO to work in an Apps Marketplace app:

Handling the first of these is easy: The aeoid library sets the realm of an OpenID request, by default, to the domain that the request was made over, so all we need to do is use that same domain name as the realm in our app's manifest file.

The second is a little trickier. The 'janrain' python-openid library which aeoid and other Python-based solutions are based on does not support host-meta as a discovery mechanism for OpenID URLs. Let's analyze what this discovery ...

Please stand by

Due to unforeseen technical difficulties, today's blog post has been delayed. Look for it next week, when I'll describe what you can do to get started writing an app for the new Apps Marketplace right now.

In other news, I'm spending most of next week travelling, so I won't be able to keep up my usual thrice-weekly updates. Regular blogging will resume the following week. Sorry!

Interactive tables for fun and, er, fun.

Recently, I've been pondering, with some workmates, the practicality of putting together our own interactive table, similar to the Microsoft Surface or the reactable.

There are a number of variations on how to build one, but the one we're planning on trying seems to be the simplest: Build a custom table with a frosted glass or perspex top, and place a projector in the base, projecting onto the bottom of the frosted surface. Additionally, have a camera under the table, pointing at the surface, to detect touches and objects.

There are a number of variations on this theme. trackmate is a system of 2D barcodes and open source software that allows you to tag and track objects. Their example configurations involve a frosted plexiglass surface, with even illumination and a camera placed underneath. None of them directly support surfaces with images projected onto them, though.

This instructable demonstrates the construction of a multitouch table that supports both touch detection and a projector, through a technique called frustrated total internal reflection. It relies on a strip of infra-red LEDs along the edge of the panel, and touching the panel disrupts the internal reflection, allowing an infra-red camera under the ...

Using the ereporter module for easy error reporting in App Engine

One little-known module in the google.appengine.ext package is ereporter. This module exists to make it easier to get summaries of errors generated by your Python App Engine app, and today we'll show you how.

Far too often, error reporting for new webapps is a catch-as-catch-can affair: reports come in from dedicated users, or you spot problems whenever you think to check the logs page of your app. A lot of bugs can slip through this way, with exceptions going unnoticed by everyone but the users who experience them and then walk away in disgust, never to return. With ereporter, we'll demonstrate how to set up a simple handler that captures all the exceptions that occur in your app and emails you a daily report summarizing what went wrong.

Installing ereporter consists of 3 stages: Modifying your handler script, modifying your app.yaml, and adding a cron job. Let's start by modifying your handler script(s). Add the following to the top of all your handler scripts (that is, scripts that are mentioned in app.yaml):

import logging
from google.appengine.ext import ereporter

ereporter.register_logger()

The ...

Announcing the SQLite datastore stub for the Python App Engine SDK

For the past couple of weeks, I've been working on one of those projects that seems to suck up every available moment (and some that technically aren't). Now, however, it's largely done, and as an extra bonus, I've been given permission to release it as an early preview for those that are interested.

The code in question is a new implementation of the local datastore for the Python App Engine SDK. While some of you are probably delighted at the news, I expect most of you are puzzled. Why do we need a new local datastore implementation? Let me explain.

The purpose of the local stubs in the App Engine SDK is to exactly replicate the behaviour of the production environment, and in general they do that very well. A specific non-goal is replicating the performance characteristics of the production environment, or being as scalable as the production environment - the stubs are designed for testing, not production use.

The Python SDK's datastore implementation operates by storing the entire contents of your development datastore in memory. It writes changes to disk so that it can reload your datastore when the dev_appserver is restarted, but the in-memory ...

Handling downtime: The capabilities API and testing

After the unfortunate outage the other day, how to handle downtime with your App Engine app is a bit of a hot topic. So what better time to address proper error handling for situations where App Engine isn't performing at 100%?

There are three major topics to cover here: Handling timeouts from API calls, using the Capabilities API, and testing your app's support for handling failures. We'll go over them in order.

Handling timeouts

At the 'stub' level, timeouts and other exceptions are communicated by the stub throwing a google.appengine.runtime.apiproxy_errors.ApplicationError. ApplicationError instances have an 'application_error' field, which contains an ID, drawn from google.appengine.runtime.apiproxy_errors, which indicates the cause of the error. As you can see, DEADLINE_EXCEEDED is 4. Other errors of interest are OVER_QUOTA, which will occur if your app runs out of quota for a given API call or capability, and CAPABILITY_DISABLED, which is thrown if the API capability has been explicitly disabled (more on this later).
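As a rough illustration of checking that field, the sketch below retries a call when the error code is DEADLINE_EXCEEDED and re-raises anything else. So that it runs without the SDK, the ApplicationError class here is only a stand-in for the one in google.appengine.runtime.apiproxy_errors; call_with_retry, flaky_call and the number of attempts are hypothetical names and values chosen for the example, and whether retrying is the right response depends on your app.

```python
DEADLINE_EXCEEDED = 4  # the value quoted in the text above


class ApplicationError(Exception):
  """Stand-in for google.appengine.runtime.apiproxy_errors.ApplicationError."""

  def __init__(self, application_error, error_detail=''):
    Exception.__init__(self, application_error, error_detail)
    self.application_error = application_error
    self.error_detail = error_detail


def call_with_retry(call, attempts=3):
  """Runs call(), retrying only when the stub reports a timeout."""
  for attempt in range(attempts):
    try:
      return call()
    except ApplicationError as e:
      last_attempt = attempt == attempts - 1
      # Anything other than a timeout, or a timeout on the final
      # attempt, is passed straight back to the caller.
      if e.application_error != DEADLINE_EXCEEDED or last_attempt:
        raise


# Hypothetical API call that times out twice and then succeeds.
calls = []

def flaky_call():
  calls.append(1)
  if len(calls) < 3:
    raise ApplicationError(DEADLINE_EXCEEDED, 'simulated timeout')
  return 'ok'

result = call_with_retry(flaky_call)
```

Only the timeout case is retried here; other error codes usually mean that trying again immediately won't help.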

Each of the various APIs catches ApplicationErrors thrown by its stub, and wraps them in a higher level exception. The datastore, for example, has a function, _ToDatastoreError, that maps different error codes to ...

Consuming RSS feeds with PubSubHubbub

Frequently, it's necessary or useful to consume an Atom or RSS feed provided by another application. Doing so, though, is rarely as simple as it seems: To do so robustly, you have to worry about polling frequency, downtime, badly formed feeds, multiple formats, timeouts, determining which items are new and other such issues, all of which distract from your original, seemingly simple goal of retrieving new updates from an Atom feed. You're not alone, either: Everyone ends up dealing with the same set of issues, and solving them in more or less the same manner. Wouldn't it be nice if there was a way to let someone else take care of all this hassle?

As you've no doubt guessed, I'm about to tell you that there is. I'm speaking, of course, of PubSubHubbub. I discussed publishing to PubSubHubbub as part of the Blogging on App Engine series, but I haven't previously discussed what's required to act as a subscriber. Today, we'll cover the basics of PubSubHubbub subscriptions, and how you can use them to outsource all the usual issues of consuming feeds.
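To give a feel for the subscriber side, here is a minimal sketch of the form-encoded body a subscriber POSTs to a hub to request a subscription, using the parameter names from the PubSubHubbub 0.3 spec. The URLs, the token and the build_subscription_request name are placeholders invented for this example, and the code only builds the request body; it doesn't send it or handle the hub's verification callback.

```python
try:
  from urllib.parse import urlencode  # Python 3
except ImportError:
  from urllib import urlencode  # Python 2


def build_subscription_request(topic_url, callback_url, verify_token):
  """Returns the form-encoded body for a hub subscription request."""
  params = [
      ('hub.mode', 'subscribe'),
      ('hub.topic', topic_url),        # the feed we want updates for
      ('hub.callback', callback_url),  # where the hub should deliver updates
      ('hub.verify', 'async'),         # hub may verify the callback later
      ('hub.verify_token', verify_token),
  ]
  return urlencode(params)


# Placeholder URLs and token, for illustration only.
body = build_subscription_request(
    'http://example.com/feed.atom',
    'http://myapp.example.com/subscriber/callback',
    'some-random-token')
```

Before it starts delivering updates, the hub calls the callback URL to confirm that the subscription was really requested.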

At this point, you may be wondering how this is ...

Implementing a non-relational database in Go

In a previous Damn Cool Algorithms post, I discussed log structured storage, and how it applies to databases. For a long time, I've wanted to implement a database based on log structured storage, and a few other nice mechanics from other database systems:

  • Tables are key:value mappings, with duplicate keys allowed (Bigtable, BDB).
  • Map-based views, also known as materialized views, for indexing (CouchDB).
  • Reducer support for views (CouchDB).

Since my previous posts about Go have been generally well received, and because I want to explore the language a bit more, I'll be implementing all this in Go. The approach I'd like to take is one of gradually building up abstractions. We'll tackle each of the components in its own post:

  1. An interface for writing records to an append-only file or set of files.
  2. A B-Tree implementation, built on the record interface.
  3. Map-based / materialized views, based on the B-Tree implementation.
  4. Reducers for views.

Unlike previous series, this one is likely to be fairly fragmented. There's a fair chunk of functionality to be implemented here, so I won't be able to get it out at my usual three-posts-a-week schedule. In the meantime ...

Writing a Twitter service on App Engine

Services that consume or produce Twitter updates are popular apps these days, and there are more than a few on App Engine, too. Twitter provide an extensive API, which exposes most of the features you might want to access.

Broadly, Twitter's API is divided into two distinct parts: The streaming API, and everything else. The streaming API is their recommended way to consume large volumes of updates in real-time; unfortunately, for a couple of reasons, using it on App Engine is not practical at the moment. The rest of their API, however, is well suited to use via App Engine, and covers things such as retrieving users' timelines, mentions, retweets, etc, sending new status updates (and deleting them, and retweeting them), and getting user information.

Authentication

Most of Twitter's API calls require authentication. Currently, Twitter support two different authentication methods: Basic, and OAuth. Basic authentication, as the name suggests, uses HTTP Basic authentication, which requires prompting the user for their username and password. We won't be using this, since it's deprecated, and asking users for their credentials is a bad idea. The OAuth API makes it possible to call Twitter APIs on behalf of a user ...