Bulk updates with cursors

Last week, I blogged about cursors, a new feature in version 1.3.1 of the App Engine SDK. Today, I'm going to demonstrate a practical use for them: Bulk datastore updates.

In both the Remote API and deferred articles, I used a (perhaps poorly named) 'mapper' class as an example of ways to use these libraries. In neither case was the class intended to be anything other than a sample use case for the library, but nevertheless, people have used the examples in production. The introduction of cursors provides a prime opportunity to introduce a more robust, yet simpler, version of the bulk updater concept.

First, let's define a few requirements for our bulk updater:

  • Support for any query for which a cursor can be obtained
  • Handles failure of individual updates gracefully
  • Can fail the whole update process if enough errors are encountered
  • Handles timeout errors, service unavailability, etc, transparently
  • Can report completion to admins

As in the Remote API and Deferred articles, we'll implement the updater as an abstract class, which individual updater implementations should subclass. Here's the basic interface:

import logging
import time
from google.appengine.api import mail
from google.appengine.ext ...

No post today

Sorry, but I'm totally mentally exhausted after a long week - including a talk at UCD on Wednesday - and I just don't have the energy to write up today's post. Look out for it on Monday, instead.

Webapps on App Engine, part 6: Lazy loading

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

A major concern for many people developing for App Engine, particularly those building low-to-medium traffic sites, is instance load time. When App Engine serves the first request to a new instance of your app, it must import the request handler module you specified, which in turn imports all the other modules required to serve the request. In large apps, this can add up to quite a lot of additional overhead for loading requests, and substantially impact the experience for end users.

There are a number of things you can do to reduce loading times, including using lighter weight frameworks instead of all inclusive ones, and breaking seldom used components up into separate handlers - an approach taken by bloggart for the admin interface. One source of inefficiency stands out as a prime candidate for optimisation, though: unnecessary imports.

Many frameworks, including the built in webapp framework, require you to provide a list of handler classes that should be instantiated to serve requests, in a 'url map'. When a request comes in, the framework simply instantiates the relevant class and ...

New features in 1.3.1 prerelease: Cursors

Recently, the App Engine team announced that they'd be pre-releasing SDKs for testing and feedback, before they go live in production. With the first prerelease, 1.3.1, a number of new features are included in the SDK. Today we'll discuss cursors - how they work, and what they're useful for.

Cursors are a feature that many people have been waiting for with bated breath. As well as making pagination easier, they also provide a way around the "1000 result limit" that many people feel (in some cases correctly) makes it harder to achieve what they want to do on App Engine.

When it comes to investigating new features, there are two really useful tools: An interactive console - such as that on http://localhost:8080/_ah/admin/, http://shell.appspot.com/ or the remote_api console - and the source code. Many people forget that as an Open Source project, the App Engine SDK code is all available - and easily browseable on code.google.com.

Our first stop is google/appengine/ext/db/__init__.py. Of interest here is the cursor() method, which starts on line 1600. As you can see, when called on a query that's already been ...

Webapps on App Engine, Part 5: Sessions

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

Sessions are another component that's regularly required by webapps, but isn't really a core part of a framework. In this post, we'll discuss the session mechanisms available for App Engine and how they work, and settle on a recommendation for our own lightweight framework.

The basic mechanism behind a session library is straightforward: A random session ID is generated for the user, which is embedded in an HTTP cookie and sent to the user. Meanwhile, a record is created on the server with the same ID, containing any data the webapp wants to store about this user. When the user makes a subsequent request, the session library decodes the session ID from the cookie header, and loads the corresponding session record from permanent storage.

There are three major advantages of handling sessions this way, rather than naively storing session data directly in the cookie:

  • We can store data that the client shouldn't be able to modify, such as the user's access flags.
  • We can store data the client shouldn't even be able ...

Webapps on App Engine, part 4: Templating

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

In the first three posts of this series, we covered all the components of a bare bones webapp framework: routing, request/response encoding, and request handlers. Most people expect a lot more out of their web framework, however. Individual frameworks take different approaches to this, from the minimalist webapp framework, which provides the bare minimum plus some integration with other tools, to the all-inclusive django, to the 'best of breed' Pylons, which focuses on including and integrating the best libraries for each task, rather than writing their own.

For our framework, we're going to take an approach somewhere between webapp's and Pylons': While keeping our framework minimal and modular, we'll look at the best options to use for other components - specifically, templating and session handling. In this post, we'll discuss templating.

To anyone new to webapps, templates may seem somewhat unnecessary. We can simply generate the output direct from our code, right? Many CGI scripting languages used this approach, and the results are often messy. Sometimes, which page to be generated isn't clear ...

Snow Sprint wrap-up, and introducing Tweet Engine

It's Friday evening, which means the Snow Sprint is wrapping up, and everyone's presenting their App Engine apps. There's some pretty impressive work been done in a mere 5 days...

Tweet Engine

First up is us! Myself, Jens Klein, and Sasha Vincic teamed up to write Tweet Engine, a twitter webapp for collaborative tweeting. Many organisations - both companies and open source groups - have shared twitter accounts. Using these shared accounts, however, can be a huge pain, especially if you have multiple accounts to manage. The goal of Tweet Engine is to make this more manageable.

Anyone can sign up by logging in with their Google account. Once signed up, you can add any number of Twitter accounts. We use the Twitter OAuth library, which allows us to obtain permission from a user without prompting you for your password.

Once you've added an account, you can give any number of other people permission to use it. Access is configurable, including full administrator access, just the ability to send and view tweets, or just the ability to suggest tweets for review and approval. Once a suggestion is submitted, anyone with sufficient permissions can approve or decline it. Scheduled ...

ReferenceProperty prefetching in App Engine

This post is a brief interlude in the webapps on App Engine series. Fear not, it'll be back!

Frequently, we need to do a datastore query for a set of records, then do something with a property referenced by each of those records. For example, supposing we are writing a blogging system, and we want to display a list of posts, along with their authors. We might do something like this:

class Author(db.Model):
  name = db.StringProperty(required=True)

class Post(db.Model):
  title = db.TextProperty(required=True)
  body = db.TextProperty(required=True)
  author = db.ReferenceProperty(Author, required=True)


posts = Post.all().order("-timestamp").fetch(20)
for post in posts:
  print post.title
  print post.author.name

On the surface, this looks fine. If we look closer, however - perhaps by using Guido's excellent AppStats tool, we'll notice that each iteration of the loop, we're performing a get request for the referenced author entity. This happens the first time we dereference any ReferenceProperty, even if we've previously dereferenced a separate ReferenceProperty that points to the same entity!

Obviously, this is less than ideal. We're doing a lot of RPCs, and incurring a lot of ...

Webapps on App Engine part 3: Request handlers

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

Now that we've covered the background on request handling, it's time to tackle request handlers. Request handlers are the core and most obvious part of a web framework. They serve to simplify the writing of your app, and remove some of the boilerplate that you end up with if you write raw WSGI applications. Before we go any further, let's see what basic request handlers look like in a range of frameworks. Then we can discuss the pros and cons of each, and settle on one for ours.

Django:

def current_datetime(request):
    now = datetime.datetime.now()
    html = "It is now %s." % now
    return HttpResponse(html)

App Engine's webapp framework:

class MainPage(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('Hello, webapp World!')

Werkzeug:

@expose('/display/')
def display(request, uid):
    url = URL.query.get(uid)
    if not url:
        raise NotFound()
    return render_template('display.html', url=url)

web2py:

def hello1():
    return "Hello World"

Pylons:

class HelloController(BaseController):

    def index(self):
        # Return a rendered template
        #return render('/hello.mako')
        # or ...

Webapps on App Engine part 2: Request & Response handling

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

This post is mostly background on request and response encoding/decoding. If you're already fairly familiar with how this works in CGI and in higher level frameworks, you may want to skip this and wait for the next posting, on request handlers.

If you've ever written a CGI script, you'll be well aware of how much of a pain interpreting and decoding CGI environment variables and headers can be - so much so that the first action of many is to find or write a library to handle it for you! And if you've ever coded in a CGI-inspired language such as PHP, you're probably familiar with how much of a pain managing the combination of response headers and output content can likewise be.

As a result of this, one of the most basic tools a framework offers is some form of abstraction for request and response data. Typically, this takes care of collecting and parsing HTTP headers, parsing the query string (if any), decoding POSTed form data, and so forth. Better frameworks also ...