Consuming RSS feeds with PubSubHubbub

Frequently, it's necessary or useful to consume an Atom or RSS feed provided by another application. Doing so, though, is rarely as simple as it seems: To do so robustly, you have to worry about polling frequency, downtime, badly formed feeds, multiple formats, timeouts, determining which items are new and other such issues, all of which distract from your original, seemingly simple goal of retrieving new updates from an Atom feed. You're not alone, either: Everyone ends up dealing with the same set of issues, and solving them in more or less the same manner. Wouldn't it be nice if there was a way to let someone else take care of all this hassle?

As you've no doubt guessed, I'm about to tell you that there is. I'm speaking, of course, of PubSubHubbub. I discussed publishing to PubSubHubbub as part of the Blogging on App Engine series, but I haven't previously discussed what's required to act as a subscriber. Today, we'll cover the basics of PubSubHubbub subscriptions, and how you can use them to outsource all the usual issues consuming feeds.

At this point, you may be wondering how this is ...

Writing a twitter service on App Engine

Services that consume or produce Twitter updates are popular apps these days, and there are more than a few on App Engine, too. Twitter provide an extensive API, which provides most of the features you might want to access.

Broadly, Twitter's API is divided into two distinct parts: The streaming API, and everything else. The streaming API is their recommended way to consume large volumes of updates in real-time; unfortunately, for a couple of reasons, using it on App Engine is not practical at the moment. The rest of their API, however, is well suited to use via App Engine, and covers things such as retrieving users' timelines, mentions, retweets, etc, sending new status updates (and deleting them, and retweeting them), and getting user information.


Most of Twitter's API calls require authentication. Currently, Twitter support two different authentication methods: Basic, and OAuth. Basic authentication, as the name suggests uses HTTP Basic authentication, which requires prompting the user for their username and password. We won't be using this, since it's deprecated, and asking users for their credentials is a bad idea. The OAuth API makes it possible to call Twitter APIs on behalf of a user ...

Bulk updates with cursors

Last week, I blogged about cursors, a new feature in version 1.3.1 of the App Engine SDK. Today, I'm going to demonstrate a practical use for them: Bulk datastore updates.

In both the Remote API and deferred articles, I used a (perhaps poorly named) 'mapper' class as an example of ways to use these libraries. In neither case was the class intended to be anything other than a sample use case for the library, but nevertheless, people have used the examples in production. The introduction of cursors provides a prime opportunity to introduce a more robust, yet simpler, version of the bulk updater concept.

First, let's define a few requirements for our bulk updater:

  • Support for any query for which a cursor can be obtained
  • Handles failure of individual updates gracefully
  • Can fail the whole update process if enough errors are encountered
  • Handles timeout errors, service unavailability, etc, transparently
  • Can report completion to admins

As in the Remote API and Deferred articles, we'll implement the updater as an abstract class, which individual updater implementations should subclass. Here's the basic interface:

import logging
import time
from google.appengine.api import mail
from google.appengine.ext ...

Webapps on App Engine, part 6: Lazy loading

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

A major concern for many people developing for App Engine, particularly those building low-to-medium traffic sites, is instance load time. When App Engine serves the first request to a new instance of your app, it must import the request handler module you specified, which in turn imports all the other modules required to serve the request. In large apps, this can add up to quite a lot of additional overhead for loading requests, and substantially impact the experience for end users.

There are a number of things you can do to reduce loading times, including using lighter weight frameworks instead of all inclusive ones, and breaking seldom used components up into separate handlers - an approach taken by bloggart for the admin interface. One source of inefficiency stands out as a prime candidate for optimisation, though: unnecessary imports.

Many frameworks, including the built in webapp framework, require you to provide a list of handler classes that should be instantiated to serve requests, in a 'url map'. When a request comes in, the framework simply instantiates the relevant class and ...

New features in 1.3.1 prerelease: Cursors

Recently, the App Engine team announced that they'd be pre-releasing SDKs for testing and feedback, before they go live in production. With the first prerelease, 1.3.1, a number of new features are included in the SDK. Today we'll discuss cursors - how they work, and what they're useful for.

Cursors are a feature that many people have been waiting for with bated breath. As well as making pagination easier, they also provide a way around the "1000 result limit" that many people feel (in some cases correctly) makes it harder to achieve what they want to do on App Engine.

When it comes to investigating new features, there are two really useful tools: An interactive console - such as that on http://localhost:8080/_ah/admin/, or the remote_api console - and the source code. Many people forget that as an Open Source project, the App Engine SDK code is all available - and easily browseable on

Our first stop is google/appengine/ext/db/ Of interest here is the cursor() method, which starts on line 1600. As you can see, when called on a query that's already been ...

Webapps on App Engine, Part 5: Sessions

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

Sessions are another component that's regularly required by webapps, but isn't really a core part of a framework. In this post, we'll discuss the session mechanisms available for App Engine and how they work, and settle on a recommendation for our own lightweight framework.

The basic mechanism behind a session library is straightforward: A random session ID is generated for the user, which is embedded in an HTTP cookie and sent to the user. Meanwhile, a record is created on the server with the same ID, containing any data the webapp wants to store about this user. When the user makes a subsequent request, the session library decodes the session ID from the cookie header, and loads the corresponding session record from permanent storage.

There are three major advantages of handling sessions this way, rather than naively storing session data directly in the cookie:

  • We can store data that the client shouldn't be able to modify, such as the user's access flags.
  • We can store data the client shouldn't even be able ...

Webapps on App Engine, part 4: Templating

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

In the first three posts of this series, we covered all the components of a bare bones webapp framework: routing, request/response encoding, and request handlers. Most people expect a lot more out of their web framework, however. Individual frameworks take different approaches to this, from the minimalist webapp framework, which provides the bare minimum plus some integration with other tools, to the all-inclusive django, to the 'best of breed' Pylons, which focuses on including and integrating the best libraries for each task, rather than writing their own.

For our framework, we're going to take an approach somewhere between webapp's and Pylons': While keeping our framework minimal and modular, we'll look at the best options to use for other components - specifically, templating and session handling. In this post, we'll discuss templating.

To anyone new to webapps, templates may seem somewhat unnecessary. We can simply generate the output direct from our code, right? Many CGI scripting languages used this approach, and the results are often messy. Sometimes, which page to be generated isn't clear ...

ReferenceProperty prefetching in App Engine

This post is a brief interlude in the webapps on App Engine series. Fear not, it'll be back!

Frequently, we need to do a datastore query for a set of records, then do something with a property referenced by each of those records. For example, supposing we are writing a blogging system, and we want to display a list of posts, along with their authors. We might do something like this:

class Author(db.Model):
  name = db.StringProperty(required=True)

class Post(db.Model):
  title = db.TextProperty(required=True)
  body = db.TextProperty(required=True)
  author = db.ReferenceProperty(Author, required=True)

posts = Post.all().order("-timestamp").fetch(20)
for post in posts:
  print post.title

On the surface, this looks fine. If we look closer, however - perhaps by using Guido's excellent AppStats tool, we'll notice that each iteration of the loop, we're performing a get request for the referenced author entity. This happens the first time we dereference any ReferenceProperty, even if we've previously dereferenced a separate ReferenceProperty that points to the same entity!

Obviously, this is less than ideal. We're doing a lot of RPCs, and incurring a lot of ...

Webapps on App Engine part 3: Request handlers

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

Now that we've covered the background on request handling, it's time to tackle request handlers. Request handlers are the core and most obvious part of a web framework. They serve to simplify the writing of your app, and remove some of the boilerplate that you end up with if you write raw WSGI applications. Before we go any further, let's see what basic request handlers look like in a range of frameworks. Then we can discuss the pros and cons of each, and settle on one for ours.


def current_datetime(request):
    now =
    html = "It is now %s." % now
    return HttpResponse(html)

App Engine's webapp framework:

class MainPage(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('Hello, webapp World!')


def display(request, uid):
    url = URL.query.get(uid)
    if not url:
        raise NotFound()
    return render_template('display.html', url=url)


def hello1():
    return "Hello World"


class HelloController(BaseController):

    def index(self):
        # Return a rendered template
        #return render('/hello.mako')
        # or ...

Webapps on App Engine part 2: Request & Response handling

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

This post is mostly background on request and response encoding/decoding. If you're already fairly familiar with how this works in CGI and in higher level frameworks, you may want to skip this and wait for the next posting, on request handlers.

If you've ever written a CGI script, you'll be well aware of how much of a pain interpreting and decoding CGI environment variables and headers can be - so much so that the first action of many is to find or write a library to handle it for you! And if you've ever coded in a CGI-inspired language such as PHP, you're probably familiar with how much of a pain managing the combination of response headers and output content can likewise be.

As a result of this, one of the most basic tools a framework offers is some form of abstraction for request and response data. Typically, this takes care of collecting and parsing HTTP headers, parsing the query string (if any), decoding POSTed form data, and so forth. Better frameworks also ...