Webapps on App Engine part 1: Routing

This is part of a series on writing a webapp framework for App Engine Python. For details, see the introductory post here.

The first part of a framework you encounter when using one is, more often than not, the routing code. With that in mind, it's what we'll be tackling first. There are several approaches to handling request routing, and we'll go on a quick tour of the libraries and approaches before we decide on one and implement it.

The built-in App Engine webapp framework takes an extremely straightforward approach: The incoming request's URL is compared to a list of regular expressions in order, and the first one that matches has the corresponding handler executed. As an enhancement, any captured groups in the regular expression are passed to the request handler as additional arguments. You can see the code that does all this here - it's extremely straightforward and easy to follow.

The webapp module does one thing that I'm not a huge fan of: It ties the request routing in with handling the requests. The same function that finds the appropriate handler for a request also takes care of parsing the request and calling ...

Writing your own webapp framework for App Engine

Welcome back! I trust you all had a good holiday period? Mine was spent back in sunny New Zealand, seeing friends and family, visiting favorite restaurants, enjoying the sunshine, and learning to paraglide.

I would have started blogging again last week, but my first week back was made both exciting and frantically busy preparing for, then attending the BT Young Scientist Exhibition, where I gave tutorials on App Engine at the Google booth. But now, back to your regularly scheduled blogging...

Sometimes it seems like everyone has written their own blogging system, and everyone has written their own framework for webapps. That's not all these two things have in common, though: They're both excellent learning projects. Since you've already done the first, why not do the second? This will be the first of a series of posts covering how to write your own Python webapp framework. The framework, while targeted at App Engine, isn't exclusive to it.

As with the blogging project, it helps to set some goals before we get started. Here's our goals for this project:

  • Lightweight. With cold startup time being a significant concern for many, it's essential to avoid creating ...

Merry Season!

As you've probably guessed by now, I'm not posting over the holiday period. Expect new content, including an all new series of posts, in the new year, however!

In the meantime, enjoy your time with your families, if that's what you're doing. As for myself and my wife, Hayley, we're back in New Zealand, enjoying some quality time with our families.

OpenID on App Engine made easy with AEoid

I'm pleased to present AEoid, a new App Engine library that aims to make user authentication with OpenID on App Engine simple.

AEoid is extremely easy to install, comprising a single piece of WSGI middleware, and its interface mirrors that of the App Engine Users API. Here's an example of it in action:

from aeoid import users

class TestData(db.Model):
  user = users.UserProperty()

class TestHandler(webapp.RequestHandler):
  def get(self):
    user = users.get_current_user()
    if not user:
      self.redirect(users.create_login_url(self.request.url))
      return
    logging.warn("Logged in as %s (%s)", user.nickname(), user.user_id())
    data = TestData(user=user)
    data.put()

As you can see, the interface to AEoid is almost exactly identical to the App Engine Users API. There a few differences of note:

  • Users are identified uniquely by their OpenID endpoint.
  • You can't construct a User object without specifying an OpenID URL.
  • Nicknames and email addresses are user-supplied, so they're not guaranteed unique or validated.
  • is_current_user_admin() is not yet implemented.
  • login: clauses in app.yaml are not affected by AEoid - they still authenticate using the regular Users API.

Installation

Installing AEoid is a simple matter of adding its WSGI middleware to your app ...

No post today

Unfortunately, I've been caught short today by a flood of important things at work, and didn't get a chance to write up my regular Wednesday blog post. In place of that, I offer you the winners of the App Engine USB drives:

  • Jeff
  • Rodrigo Moraes
  • Olivier Deckmyn

If you are reading this, and you're one of the winners, please send me an email (nick AT notdot DOT net) with your name, address, and favorite Google color (Yellow, Green, Red, or Blue), and I'll get them out to you pronto.

As further tribute, I offer you this adorable picture of a kitten:

(Yes, I'm aware of how paradoxical the subject of this post is)

'Naked' domains on App Engine

One topic that comes up frequently on the App Engine groups is that of 'naked' domains in App Engine, and how to handle them. A naked domain, for the uninitiated, is one without a service-specific subdomain. For example, "google.com" is naked, while "www.google.com" is not. This post provides an overview of why naked domains are a problem, and what you can do about them.

There are two separate factors that combine to make handling of naked domains a problem in App Engine. The first is the design of DNS, the system for resolving domain names to IP addresses. There are two different types of DNS record we're concerned about here: A records, which specify the IP address for a name, and CNAME records, which acts a "see also", specifying another name for a domain. For example, an A record might say "google.com has the IP 216.239.59.104", while a CNAME record might say "google.com is also known as www.l.google.com".

The problem arises with the way CNAME records work. An A record specifies the IP address only for a single record - for example, an A record on google.com specifies ...

'Most popular' metrics in App Engine

One useful and common statistic to provide to users is a metric akin to "7 day downloads" or "7 day popularity". This appears across many sites and types of webapp, yet the best way to do this is far from obvious.

The naive approach is to record each download individually, and use something akin to "SELECT count(*) FROM downloads WHERE item_id = 123 AND download_date > seven_days_ago", but this involves counting each download individually - O(n) work with the number of downloads! Caching the count is an option, but still leads to excessive amounts of work at read-time.

Another option is to maintain an array of daily download counts, keeping the last 7. This is an improvement from a workload point of view, but leads to either discontinuities at the start of a new day, or to all counts being updated only once per day.

There is a third option, however, which has the performance of the second option, with the responsiveness of the first. To use it, however, we have to reconsider slightly what we mean by '7 day popularity'. The solution in question is to use an exponential decay process. Each time an event happens, we increase the item's ...

.astronomy so far

The first 3 days of .astronomy have been busy. So busy, I haven't had the time or energy to write about them until now! Here's a quick summary of what's happened:

I arrived at the conference a bit before 11AM on Monday, having taken the earliest flight available that day (I didn't fly in the night before, as that would've meant missing Video Games Live). When I came in, Robert Hollow was giving a talk on pulse@parkes, a fascinating program he runs to get students into astronomy by giving them real observing time on the Parkes radio telescope in Australia. He gave an engaging presentation, and made me wish I could give it a go myself.

Next up, Arfon Smith and Chris Lintott gave a talk on Galaxy Zoo, which has come a long way since I last looked at it. They described the architecture (Ruby, running on AWS), some of the project's successes, and some of the new projects they're working on, including Galaxy Zoo Mergers, a project dedicated to determining and documenting the details of galaxy mergers.

After lunch, I gave my Python 101 tutorial. Due to a screw-up on ...

They're almost here!

The App Engine USB drives - in bright primary, Google(tm) colors - are finished! They're currently winging their way to the Dublin office (and a separate batch direct to the .astronomy venue. Can't wait to get my hands on them.

Want to get your hands on one of them too? Post a suggestion for what topic you'd like to see me write about - be it App Engine, Go, Damn Cool Algorithms, or something else - and I'll send a USB drive, loaded with App Engine goodies and a Wave invite, to the authors of the best few suggestions.

I'm going to be at .astronomy all next week, so I'm not going to be putting up new posts on my regular schedule. I will, however, be blogging about the conference, so look out for posts on some of the more interesting talks and breakout sessions/hackathons.

In one final, unrelated item, I'd like to draw your attention to an amazing bit geekery. Last week, I posted this code golf competition to Stack Overflow, for the shortest Fractran interpreter. As an extra challenge, I offered a bonus to anyone who could provide a Fractran interpreter in fractran ...

Enforcing data isolation with CurrentDomainProperty

In a previous post, we described how to implement API call hooks, and demonstrated a common use-case: Separating the datastore by domain, for multi-tenant apps.

It's not always the case that you want to partition your entire datastore along domain or user lines, however. Sometimes you may want to have only some models with restricted access per-domain, with others being common across all domains. You might also want a way to ensure that users can't read or modify each others' data. Fortunately, there's a way to implement all this at a higher level: Instead of defining API call hooks, we can define custom datastore properties to do the job for us.

Here's an implementation of a CurrentDomainProperty:

class InvalidDomainError(Exception):
  """Raised when something attempts to access data belonging to another domain."""


class CurrentDomainProperty(db.Property):
  """A property that restricts access to the current domain."""

  def __init__(self, allow_read=False, allow_write=False, *args, **kwargs):
    self.allow_read = allow_read
    self.allow_write = allow_write
    super(CurrentDomainProperty, self).__init__(*args, **kwargs)

  def __set__(self, model_instance, value):
    if not value:
      value = unicode(os.environ['HTTP_HOST'])
    elif (value != os.environ['HTTP_HOST'] and not self.allow_read
          and not users.is_current_user_admin()):
      raise InvalidDomainError(
          "Domain '%s' attempting ...