Writing your own webapp framework for App Engine

Posted by Nick Johnson | Filed under python, coding, app-engine, tech, framework

Welcome back! I trust you all had a good holiday period? Mine was spent back in sunny New Zealand, seeing friends and family, visiting favorite restaurants, enjoying the sunshine, and learning to paraglide.

I would have started blogging again last week, but my first week back was both exciting and frantically busy: I spent it preparing for, and then attending, the BT Young Scientist Exhibition, where I gave tutorials on App Engine at the Google booth. But now, back to your regularly scheduled blogging...

Sometimes it seems like everyone has written their own blogging system, and everyone has written their own framework for webapps. That's not all these two things have in common, though: They're both excellent learning projects. Since you've already done the first, why not do the second? This will be the first of a series of posts covering how to write your own Python webapp framework. The framework, while targeted at App Engine, isn't exclusive to it.

As with the blogging project, it helps to set some goals before we get started. Here are our goals for this project:

Lightweight. With cold startup time being a significant concern for many, it's essential to avoid creating a large, monolithic framework which requires importing a lot of extraneous code before it can serve even basic requests. Likewise, we don't want users of our framework to have to add a great deal of overhead to their app.
Loosely coupled. As with all good frameworks, it should be easy to swap out components, and nobody should be forced to use the entire framework if they only want part of it.
'Best-of-breed' components. As we'll see, there are already many good open source libraries that take care of individual tasks like routing, rendering templates, and session handling. Our framework should reuse these wherever possible.
Fast. This overlaps with the first two, but it's worth a bullet point of its own. There's a great deal of variability in the performance of some of the libraries available, and we should do our best to pick the fast ones.

One caveat, however. My main goal for this series is to introduce you to the inner workings of a web framework, and the things it interfaces with, such as WSGI and CGI. We're not writing something with the goal of being a serious competitor to all those other frameworks out there, so don't expect enterprise-level support or active development beyond the series. If, like me, you love to hack around with this sort of thing, feel free to pick it up: fork it, use it, improve it! If, however, you're just looking for a ready-made framework to use with your next webapp, you're probably better off choosing one of the many existing frameworks with App Engine support.

Before we begin, we need to cover the basics of what a framework is, and how it works. If you're already familiar with things such as HTTP, CGI, and WSGI, feel free to skip over this section. Otherwise, read on!

HTTP

At the bottom of the stack is HTTP, which you should already be somewhat familiar with. HTTP is fundamentally a request/response protocol: a client sends a request, consisting of a method, a URI path, a set of headers, and an optional body, and the HTTP server sends back a response, consisting of a status code, a set of headers, and an optional body.
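To make the parts of a request concrete, here's a small sketch that splits a raw HTTP request into its method, path, headers and body. The function name `parse_http_request` is hypothetical, and the sketch assumes a well-formed request with CRLF line endings; a real server also has to cope with malformed input, chunked bodies, repeated headers and case-insensitive header names.

```python
def parse_http_request(raw):
    """Split a raw HTTP request (bytes) into (method, path, headers, body).

    Minimal sketch: assumes a well-formed request using CRLF line endings.
    """
    # The header section ends at the first blank line; the rest is the body.
    head, _, body = raw.partition(b'\r\n\r\n')
    lines = head.decode('iso-8859-1').split('\r\n')
    # The request line looks like: "GET /path?query HTTP/1.1"
    method, path, _version = lines[0].split(' ', 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values like "host:8080" survive.
        name, _, value = line.partition(':')
        headers[name.strip()] = value.strip()
    return method, path, headers, body


method, path, headers, body = parse_http_request(
    b'GET /hello?name=nick HTTP/1.1\r\nHost: example.com\r\n\r\n')
# method == 'GET', path == '/hello?name=nick',
# headers == {'Host': 'example.com'}, body == b''
```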

CGI

In order to write an application that operates over HTTP, an interface of some sort is required, and that interface is CGI. CGI is a venerable standard, dating back to 1993, and it shows its age. Although it (or systems based on it, like WSGI) is still in use, many of its design decisions seem more than a little odd for modern webapps.

When a webserver receives a request that is to be handled by a CGI script, it transforms the request into a set of environment variables. The request method (e.g., GET, HEAD, POST) is passed as the REQUEST_METHOD variable, while the URI path is split into several components that are passed separately: SCRIPT_NAME, the path to the CGI script; PATH_INFO, the remainder of the URI path; and QUERY_STRING, the contents of the query string. Whether HTTPS is in use is indicated by the 'HTTPS' variable, which is set to 'on' for HTTPS requests and 'off' otherwise. Request headers are transformed by uppercasing their names, replacing hyphens with underscores, and adding an 'HTTP_' prefix; for example, 'User-Agent' becomes 'HTTP_USER_AGENT'. Content-Type and Content-Length are the exceptions: they are passed as CONTENT_TYPE and CONTENT_LENGTH, without the prefix. Since most webapps today aren't actually implemented as individual CGI scripts, SCRIPT_NAME is often left blank, or is set to the path to the application as a whole.
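Here's a rough sketch of that translation step, as a webserver might perform it. The helper `http_to_cgi_environ` and its parameters are hypothetical, chosen for illustration; it takes the request path including any query string, plus the prefix that maps to the script, and applies the header-naming rule, including the HTTP_ prefix for ordinary headers.

```python
def http_to_cgi_environ(method, path, headers, script_name='', https=False):
    """Build a CGI-style environment dict from the parts of an HTTP request.

    Hypothetical helper: `path` includes any query string, and `script_name`
    is the URI prefix that maps to the CGI script.
    """
    path_only, _, query = path.partition('?')
    # Whatever follows the script's own path is passed as PATH_INFO.
    if script_name and (path_only == script_name
                        or path_only.startswith(script_name + '/')):
        path_info = path_only[len(script_name):]
    else:
        path_info = path_only
    environ = {
        'REQUEST_METHOD': method,
        'SCRIPT_NAME': script_name,
        'PATH_INFO': path_info,
        'QUERY_STRING': query,
        'HTTPS': 'on' if https else 'off',
    }
    for name, value in headers.items():
        key = name.upper().replace('-', '_')
        # Content-Type and Content-Length keep their bare names; every other
        # header gets an HTTP_ prefix.
        if key not in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
            key = 'HTTP_' + key
        environ[key] = value
    return environ


env = http_to_cgi_environ(
    'GET', '/app/users/42?fields=name',
    {'User-Agent': 'test-client', 'Content-Type': 'text/plain'},
    script_name='/app')
# env['SCRIPT_NAME'] == '/app', env['PATH_INFO'] == '/users/42',
# env['QUERY_STRING'] == 'fields=name', env['HTTP_USER_AGENT'] == 'test-client'
```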

CGI was originally designed to formalize the handling of HTTP requests by standalone executables. The request would be transformed as we described above, then the script in question would be executed with those variables in its OS environment. The request body, if any, would be passed in on standard input. The CGI script is expected to process the request, then write the response headers to standard output, followed by a blank line and then the response body. This general pattern is preserved, with some modifications, on App Engine.
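A minimal CGI-style handler might look like the sketch below. The function `handle_cgi_request` is hypothetical; it takes the environment and the two streams as parameters so it can be exercised without a real webserver, and for simplicity it treats the request body as text rather than raw bytes.

```python
import os
import sys


def handle_cgi_request(environ, stdin, stdout):
    """Handle one request the CGI way: read the environment and stdin,
    then write headers, a blank line, and the body to stdout.
    """
    method = environ.get('REQUEST_METHOD', 'GET')
    # The body (if any) arrives on stdin; CONTENT_LENGTH says how much to read.
    # Simplification: the body is read as text, not bytes.
    length = int(environ.get('CONTENT_LENGTH') or 0)
    body = stdin.read(length) if length else ''
    stdout.write('Content-Type: text/plain\n')
    stdout.write('\n')  # the blank line separates the headers from the body
    stdout.write('Method: %s\n' % method)
    stdout.write('Path: %s\n' % environ.get('PATH_INFO', ''))
    stdout.write('Body: %s\n' % body)


if __name__ == '__main__':
    # Run as a real CGI script: the webserver supplies all three of these.
    handle_cgi_request(os.environ, sys.stdin, sys.stdout)
```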

The main difference in App Engine's handling of CGI is a performance optimisation: instead of starting a new Python interpreter for each request, App Engine reuses a single Python runtime across multiple requests. Each request simply re-imports your app's main handler script, which re-executes all the code in it, though modules imported from it are not re-imported. As a further optimisation, if your handler module defines a function called 'main', App Engine calls that function on the second and subsequent requests instead of re-importing the module.

WSGI

WSGI is a Python standard (PEP 333) for interactions between web servers and webapps. It reuses the CGI environment variables, but wraps them in an interface that is much more convenient to work with from Python. Although it's not necessary to use WSGI with App Engine, it is convenient, and most framework libraries expect it, so that's what we'll be using.

WSGI operates by calling an 'application' function with a specific signature. The function is expected to take two arguments, 'environ' and 'start_response', and return an iterable of strings. The 'environ' argument is a Python dict containing a standard CGI environment, along with certain extra values specific to WSGI, such as 'wsgi.input' (a file-like object for reading the request body) and 'wsgi.url_scheme'. The 'start_response' argument is a function, which itself takes two parameters, 'status' and 'response_headers', plus an optional third, 'exc_info'.
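It can help to see the same contract from the server's side. The following is a minimal sketch of how a server might invoke a WSGI application; `call_wsgi_app` and `echo_path` are hypothetical names, and the standard library's `wsgiref.util.setup_testing_defaults` fills in the CGI and `wsgi.*` keys. It's written for Python 3, where PEP 3333 requires response bodies to be bytes.

```python
from wsgiref.util import setup_testing_defaults


def call_wsgi_app(app, method='GET', path='/'):
    """Invoke a WSGI app the way a server would; return (status, headers, body)."""
    environ = {'REQUEST_METHOD': method, 'SCRIPT_NAME': '', 'PATH_INFO': path}
    # Fill in the remaining CGI variables (SERVER_NAME, HTTP_HOST, ...) and
    # the WSGI-specific keys (wsgi.version, wsgi.input, wsgi.url_scheme, ...).
    setup_testing_defaults(environ)
    captured = {}
    written = []

    def start_response(status, response_headers, exc_info=None):
        # Simplified: a real server must also handle exc_info properly.
        captured['status'] = status
        captured['headers'] = response_headers
        # The spec requires start_response to return a write() callable.
        return written.append

    result = app(environ, start_response)
    try:
        chunks = list(result)
    finally:
        # A server must call close() on the result if it has one.
        if hasattr(result, 'close'):
            result.close()
    return captured['status'], captured['headers'], b''.join(written + chunks)


def echo_path(environ, start_response):
    # A tiny app to exercise the harness: echo PATH_INFO back as the body.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [environ['PATH_INFO'].encode('utf-8')]


status, headers, body = call_wsgi_app(echo_path, path='/hello')
```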

When called, a WSGI application is expected to do its work, then call start_response with the HTTP status (a string such as '200 OK') and a list of (header_name, header_value) tuples to return to the client. It should then return or yield the body of the response as an iterable sequence of strings. Since App Engine doesn't support streaming responses, returning and yielding are equivalent, so we'll simply use return.
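As a slightly fuller sketch of that contract, here's an application that chooses its status based on the request path. The paths and messages are made up for illustration, and the code targets Python 3, where the body strings must be bytes (on App Engine's original Python 2 runtime, plain str was used).

```python
def application(environ, start_response):
    """Return a greeting for '/', and a 404 for any other path."""
    path = environ.get('PATH_INFO', '/')
    if path == '/':
        status = '200 OK'
        body = b'Welcome!'
    else:
        # The status is a string: a three-digit code, a space, and a reason.
        status = '404 Not Found'
        body = ('No page at %s' % path).encode('utf-8')
    # Headers are a list of (name, value) tuples rather than a dict, so the
    # same header name (e.g. Set-Cookie) can appear more than once.
    headers = [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```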

The way WSGI works permits a very modular design for apps and frameworks. WSGI middleware is code that takes a WSGI application and 'wraps' it, transparently adding functionality. One example of WSGI middleware is Beaker, a Python library that provides caching and session handling for WSGI applications.
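To show how wrapping works, here's a minimal sketch of a middleware that adds an extra header to every response by intercepting start_response. `add_header_middleware` is a hypothetical name, not part of Beaker or any other library, and the header name and value in the usage line are just examples.

```python
def add_header_middleware(app, name, value):
    """Wrap a WSGI app so every response carries an extra header.

    Hypothetical example middleware, not taken from any library.
    """
    def wrapped_app(environ, start_response):
        def custom_start_response(status, headers, exc_info=None):
            # Append our header, then hand off to the server's start_response.
            return start_response(status, list(headers) + [(name, value)], exc_info)
        # The wrapped app sees our start_response instead of the server's.
        return app(environ, custom_start_response)
    return wrapped_app


def hello(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, world!']


# To the server, the wrapped object is just another WSGI application.
application = add_header_middleware(hello, 'X-Powered-By', 'our-framework')
```

Because the wrapped object has the same signature as any other WSGI application, middleware can be stacked as many layers deep as needed.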

Here's a simple WSGI application that returns 'Hello, world!':

def application(environ, start_response):
  # The status is a string, and the headers are a list of (name, value) tuples.
  start_response('200 OK', [('Content-Type', 'text/plain')])
  return ['Hello, world!']
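Something has to translate between CGI (environment variables, stdin and stdout) and WSGI. The standard library's `wsgiref.handlers` module provides this bridge; in a plain CGI script you could call `CGIHandler().run(application)`. The sketch below uses `BaseCGIHandler` with in-memory streams instead, so the CGI output becomes visible. `run_as_cgi` is a hypothetical helper, and the code is written for Python 3, where WSGI bodies must be bytes.

```python
import io
from wsgiref.handlers import BaseCGIHandler


def application(environ, start_response):
    # Python 3 version of the app above: WSGI bodies must be bytes.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, world!']


def run_as_cgi(app, environ):
    """Run a WSGI app through wsgiref's CGI bridge and return the raw output.

    In-memory streams stand in for the stdin/stdout a CGI server provides.
    """
    stdin, stdout, stderr = io.BytesIO(), io.BytesIO(), io.StringIO()
    BaseCGIHandler(stdin, stdout, stderr, environ).run(app)
    return stdout.getvalue()


print(run_as_cgi(application, {'REQUEST_METHOD': 'GET'}).decode('ascii'))
```

The printed output should begin with a CGI 'Status: 200 OK' line, followed by the headers, a blank line and the body: the same headers-then-body format described in the CGI section.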

As you can see, this is still a bit low level for your average webapp, hence our quest to write our own framework. There are several major components to a webapp framework, and we'll be covering them over the next couple of weeks.

In each part of the series, we'll discuss the currently available options for that part of the framework, then decide on a solution that best suits our specific needs, and implement it. In some cases we'll go with pre-made solutions, while in others we'll choose to implement our own. For the most part, the decision of which to choose will rest on which teaches us the most about how it all works.

Look out for the first post, on request routing, on Wednesday!

18 January, 2010


Nick's Blog