Migrating to Python 2.7, part 1: Threadsafe

Posted by Nick Johnson | Filed under python, threading, python-27, google-app-engine

With the recent experimental release of Python 2.7 support, many people are starting to move their apps from the 2.5 runtime to 2.7. In this series of posts, I'll go over the various considerations for migrating your app in detail. I'll start with the most immediately obvious, and arguably the highest-impact, of them all: threadsafe, multithreaded Python apps.

As you're probably aware, the 2.7 runtime supports making your Python app multithreaded, meaning a single instance of the app may service multiple user requests at the same time. Because of the Global Interpreter Lock (GIL), only one thread can execute Python code at a time, so a multithreaded Python app still has limited concurrency. However, most of the wall-clock time of a typical App Engine app is spent waiting for RPCs, and the GIL is released during that wait. As a result, the gains in parallelism, and the corresponding improvements in utilization, can be substantial.
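To see why waiting on RPCs parallelizes well despite the GIL, here's a small sketch that uses only the standard library. It assumes `time.sleep` is a fair stand-in for a blocking RPC, since both release the GIL while waiting. The `fake_rpc` and `serve_concurrently` names and the 0.2 second delay are made up for illustration.

```python
import threading
import time


def fake_rpc(results, index, delay=0.2):
    # Stand-in for a blocking RPC: time.sleep releases the GIL while waiting,
    # just as a thread blocked on network I/O does.
    time.sleep(delay)
    results[index] = "response %d" % index


def serve_concurrently(num_requests=5):
    # Handle several "requests" at once, one thread per request.
    results = [None] * num_requests
    threads = [threading.Thread(target=fake_rpc, args=(results, i))
               for i in range(num_requests)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.time() - start


if __name__ == "__main__":
    results, elapsed = serve_concurrently()
    # Five 0.2s "RPCs" should finish in roughly 0.2s total rather than 1.0s,
    # because the threads spend their time waiting, not holding the GIL.
    print("%d responses in %.2fs" % (len(results), elapsed))
```

CPU-bound work would not overlap like this. Only the waiting does, which is exactly the pattern of a typical RPC-heavy App Engine request.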

Moving to Python 2.7 and enabling multithreading is pretty straightforward. First, you have to update your app.yaml. Suppose the start of your app.yaml looks something like this:

application: myapp
version: main
runtime: python
api_version: 1

To switch to Python 2.7, you change it to this (the only differences are the runtime line and the new threadsafe line):

application: myapp
version: main
runtime: python27
api_version: 1
threadsafe: true

All we've done here is change the runtime name from 'python' to 'python27' and add a new line, 'threadsafe: true'. That line tells App Engine that your app can safely handle multiple requests at the same time within a single instance.

That's not quite it, though. Apps on the Python 2.5 runtime use a CGI-inspired model, meaning every app has some boilerplate in its handler that goes something like this:

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

application = webapp.WSGIApplication(...) # Or something else if you're not using webapp

def main():
  run_wsgi_app(application)

if __name__ == '__main__':
  main()

First we define a WSGI application in the 'application' variable. So far, so straightforward, and the same as on any other platform. Then we define a main function, which runs the WSGI app. Finally comes the mysterious "if __name__ == '__main__'" stanza, the omission of which has caused many puzzling bugs for new App Engine programmers. The reason for this stanza tells us a little about App Engine's original design, so I hope you'll permit me a slight diversion. I'll explain why it worked this way, and why it's necessary to change it now. If you already know what CGI is, how it works, and how it's implemented in App Engine, feel free to skip ahead to the interesting stuff.

CGI: A historical perspective

Back when the web was first ramping up, there was no standard way to write a site that responded dynamically to user input, as opposed to just serving pre-written pages. In response, the Common Gateway Interface was defined as a standard way for web servers to call programs or scripts to process HTTP requests from users. In short, it pokes all the relevant information about the request into environment variables, then calls the CGI script that the user or the webserver administrator specified, feeding in the body of the HTTP request as input on the process's standard input stream. The CGI script is then expected to do whatever work it wants to, and write back the response to its standard output - first the headers, then the body of the response.
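As a rough illustration of that contract, here's a minimal CGI-style script. To keep it testable, `handle_cgi` (a hypothetical name) takes the environment and the streams as parameters. A real CGI script would use `os.environ`, `sys.stdin` and `sys.stdout` directly.

```python
import os
import sys


def handle_cgi(environ, stdin, stdout):
    # The webserver puts request metadata in environment variables...
    method = environ.get("REQUEST_METHOD", "GET")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # ...and feeds the request body on standard input
    # (this sketch assumes an ASCII body, so bytes and characters match).
    body = stdin.read(length) if length else ""
    # The script writes headers, then a blank line, then the response body.
    stdout.write("Content-Type: text/plain\r\n")
    stdout.write("\r\n")
    stdout.write("You sent a %s request with %d bytes of body: %s"
                 % (method, len(body), body))


if __name__ == "__main__":
    handle_cgi(os.environ, sys.stdin, sys.stdout)
```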

This was all well and good, and it's a nice, minimal, easy-to-implement interface, but it comes with a number of shortcomings. Not least of these is that the script has to be loaded afresh for each request. In the case of an interpreted script such as Python, the entire interpreter has to be loaded, and the script parsed again. Needless to say, this imposes a lot of unnecessary overhead on the webserver, and the more traffic the server needs to handle, the more significant this is.

Over the years, a lot of solutions to this problem have been proposed and implemented, from generic ones such as FastCGI to language-specific ones such as mod_php. WSGI is itself the Python community's response to this issue, and it provides easy 'glue' for running WSGI apps under CGI servers. None of these solutions, however, has ever been quite as universally supported as CGI.
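Python's standard library ships this kind of glue in `wsgiref.handlers`. Here's a minimal sketch, where `hello_app` and `run_as_cgi` are made-up names. `BaseCGIHandler` is used instead of `CGIHandler` only so that the streams and environment can be passed in explicitly, rather than taken from the current process.

```python
import io
from wsgiref.handlers import BaseCGIHandler, CGIHandler


def hello_app(environ, start_response):
    # A WSGI app: receives the request environment, returns an iterable of bytes.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def run_as_cgi(app, environ, body=b""):
    # Run a WSGI app through the CGI adapter, capturing what it writes.
    stdin, stdout, stderr = io.BytesIO(body), io.BytesIO(), io.StringIO()
    BaseCGIHandler(stdin, stdout, stderr, environ).run(app)
    return stdout.getvalue()


if __name__ == "__main__":
    # Under a real CGI server, this one line is all the glue you need.
    CGIHandler().run(hello_app)
```

The adapter emits a CGI-style `Status:` header followed by the app's headers and body, so the same WSGI app can run unchanged under CGI or under a WSGI-native server.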

When App Engine was first being written, then, the fine engineers at Google wanted to pick an option that was as widely supported as possible, but without paying the performance penalty of a true CGI app. When you know your app is going to be written in Python, you can get most of the way there pretty easily. Simply load the Python interpreter once, and for each request, re-interpret only the handler script itself, after resetting all the appropriate environment variables and so forth. Any modules the script loads can stay in memory between requests. Unless a CGI app does some pretty unreasonable stuff, you get to avoid all the extra work of reloading the interpreter and all the modules on every request.

Your handler script, the one invoked directly by the webserver, is still being parsed, interpreted, and executed on every request, however, and that's still less than ideal. And yet, the way the CGI standard is specified, that's exactly how things should be. App Engine's designers wanted a way around this without breaking the CGI model, so they came up with a compromise.

Here's how it goes. The first time a given instance runs a handler script, it runs it just like any standard CGI script. When that request completes, though, the runtime takes a peek at the state of the interpreter. If the handler script defined a function called 'main', things work differently. On subsequent requests, instead of re-running the handler from scratch, the runtime simply calls main(). This is much faster, since it avoids the overhead of re-parsing and interpreting the handler script on every request, and everyone's happy. Standard CGI scripts run normally, and code written with this in mind runs faster.

Incidentally, this is why many people writing App Engine apps for the first time run into strange, intermittently blank pages. The magic "if __name__ == '__main__':" stanza at the end takes care of the first page load. When the script is run like a CGI script, the stanza calls main(). On every later request, main() is called directly. If you write neither a main function nor that stanza, App Engine treats your handler as a standard CGI script, and all works as expected. If you write both, caching kicks in and everything is likewise fine. If you include a main function but _don't_ include that stanza, though, something goes wrong. On the first request your handler simply defines a bunch of stuff, then exits without running anything, resulting in a blank page!
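To make the three cases concrete, here's a toy simulation of the dispatch rule described above. `FakeRuntime` and the `respond` output hook are invented for illustration. This is a model of the behaviour, not App Engine's actual implementation.

```python
class FakeRuntime(object):
    """Toy model of the 2.5 runtime's handler caching (not App Engine code)."""

    def __init__(self, source):
        self._code = compile(source, "handler.py", "exec")
        self._main = None   # the cached main(), once the script defines one
        self._output = []

    def _respond(self, text):
        # Stand-in for writing the response to standard output.
        self._output.append(text)

    def handle_request(self):
        self._output = []
        if self._main is not None:
            # Later requests: skip re-running the script, just call main().
            self._main()
        else:
            # First request (or a plain CGI script): run the whole script.
            namespace = {"__name__": "__main__", "respond": self._respond}
            exec(self._code, namespace)
            # Peek at the interpreter state: did the script define main()?
            if callable(namespace.get("main")):
                self._main = namespace["main"]
        return "".join(self._output)


PLAIN_CGI = 'respond("hello")\n'
MAIN_WITH_STANZA = (
    'def main():\n'
    '    respond("hello")\n'
    '\n'
    'if __name__ == "__main__":\n'
    '    main()\n'
)
MAIN_WITHOUT_STANZA = 'def main():\n    respond("hello")\n'

if __name__ == "__main__":
    for source in (PLAIN_CGI, MAIN_WITH_STANZA, MAIN_WITHOUT_STANZA):
        runtime = FakeRuntime(source)
        print([runtime.handle_request() for _ in range(3)])
```

Running the three scripts three times each returns "hello" every time for the first two. The last one returns an empty string on its first request, which is the blank page, and "hello" afterwards.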

Goodbye CGI, hello threading

CGI and threading, unfortunately, don't go well together. CGI relies on a lot of global state - the operating system environment and standard input and output - and doesn't react well to multiple requests trying to use that environment at the same time. This wasn't an issue for 'real' CGI scripts, of course, since each request was handled by a brand new process, but once threading was introduced for App Engine Python apps, it became an issue for the faux-CGI implementation used there.

The solution was to use WSGI directly. Under the Python 2.7 runtime, if you set "threadsafe: true" in your app.yaml, App Engine abandons the old CGI-esque model. Instead, it expects you to point it at a WSGI app, which it calls directly. If you're already using a WSGI-based framework (and really, who isn't?), the changes for this are easy. First, delete your main function and the 'magic' "if __name__ == '__main__':" stanza, leaving just the application definition at the end of your module. Next, change the definition of your handler in app.yaml. What looked something like this:

- url: /.*
  script: main.py

Should now look something like this:

- url: /.*
  script: main.application

Simple! Now App Engine will execute your WSGI app directly, skipping the old CGI charade.
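For reference, here's roughly what a framework-free main.py could look like once main() and the stanza are gone. It's a sketch that assumes a plain WSGI callable; with webapp you'd assign your WSGIApplication instance to `application` instead. The one real requirement is that the module-level name matches the `main.application` entry in app.yaml.

```python
# main.py: referenced from app.yaml as "script: main.application"


def application(environ, start_response):
    # Request data comes from the per-request WSGI environ, not os.environ.
    path = environ.get("PATH_INFO", "/")
    body = ("You requested %s" % path).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# No main() and no "if __name__ == '__main__':" stanza: the runtime imports
# this module once per instance and calls `application` for each request.
```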

But what if my app makes use of 'os.environ' or other CGI-isms, you ask? All is not lost. The smart folks on the Python 2.7 runtime team foresaw that some apps will inevitably do this, and built in a workaround for them. 'os.environ' and a few other bits and pieces are now "thread local" on the 2.7 runtime. This means that each thread (and hence each request) sees a different copy of them, containing only the data relevant to the current request. Apps that expect to get request information from os.environ can thus continue to work fine. Bear in mind that this really is a workaround, though. It's definitely cleaner to rewrite your apps to rely only on the WSGI environment, if you have the opportunity. The WSGI environment can be accessed as self.request.environ if you're using webapp.
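The thread-local idea is easy to picture with `threading.local`. The sketch below uses hypothetical names (`request_env`, `handle`, `run_requests`) and is not the runtime's actual mechanism. Each thread sets its own copy of a request attribute, and reading it back never sees another thread's value.

```python
import threading
import time

# One object, but each thread sees its own independent set of attributes.
request_env = threading.local()


def handle(path, results):
    request_env.PATH_INFO = path            # "set up" this request's environment
    time.sleep(0.05)                        # let other threads run and set theirs
    results[path] = request_env.PATH_INFO   # still this thread's own value


def run_requests(paths):
    # Handle each path on its own thread, as a multithreaded instance would.
    results = {}
    threads = [threading.Thread(target=handle, args=(p, results)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

If `request_env` were an ordinary shared object instead, a thread could read back a path set by a different, concurrently running request.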

One quick note before you go any further: As of the time of writing, the App Engine SDK doesn't yet support the 2.7 runtime, so to test this you'll have to deploy to the production environment. Or, you could use the horrid-yet-ingenious hack documented here for limited support.

All done? Not so fast!

At this point, you may think you're done with migrating to a threaded app, and you're quite likely right. But first, a quick word about Global Mutable State. Now that your app may be serving multiple requests concurrently, you need to be careful the threads don't interfere with each other in unexpected ways. For many apps - most, even - this is a given, because they rely only on data local to the current request. Some apps, however, access global state, and if this state is mutable - that is, it can be changed by a request - you need to take care to synchronize access to it. Suppose, for example, that your app has some code like this:

from google.appengine.ext import db

class Configuration(db.Model):
  some_config_data = db.StringProperty()

  _config_cache = None

  @classmethod
  def get_config(cls):
    if not cls._config_cache:
      cls._config_cache = cls.get_by_key_name('config')
    return cls._config_cache

This is good design: there's only a single config object, we don't expect it to change often, and we don't want to waste time fetching it for every request, so we cache it against the class. Any time we need the config object, our code calls 'Configuration.get_config()', and the first request fetches it from the datastore, with subsequent calls fetching the cached object.

In a multithreaded environment, though, we have a race condition: two threads could call 'get_config()' at the same time, both conclude the config isn't cached yet, and independently fetch and store it. In this case, this isn't a big deal - it's just a little bit of wasted effort - and you could well just leave it as-is. There are plenty of other cases with shared state where it's important to get it right, though, so let's assume you want to fix it here.

The solution is simple, and provided by the threading module: Lock objects. A lock allows us to ensure that only one thread at a time can execute a particular chunk of code. It's used like this (the changes are the new _config_lock attribute and the with block in get_config):

import threading

from google.appengine.ext import db

class Configuration(db.Model):
  some_config_data = db.StringProperty()

  _config_cache = None
  _config_lock = threading.Lock()

  @classmethod
  def get_config(cls):
    with cls._config_lock:
      if not cls._config_cache:
        cls._config_cache = cls.get_by_key_name('config')
    return cls._config_cache

Here, the lock ensures that only one thread at a time can execute the code inside the with block, and thus we'll only fetch the config once per instance. A more efficient implementation, which avoids taking the lock at all once the config is already cached, is left as an exercise for the reader.
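One common answer to that exercise is double-checked locking. You check the cache without the lock first, and only take the lock (and re-check) on a miss. The sketch below drops the datastore so it runs on its own. `fetch_config` is a hypothetical stand-in for `cls.get_by_key_name('config')`, and `fetch_count` exists only to show that the fetch happens once. It assumes CPython, where reading and assigning a single attribute reference is atomic under the GIL.

```python
import threading
import time


class Configuration(object):
    _config_cache = None
    _config_lock = threading.Lock()
    fetch_count = 0  # for demonstration only

    @classmethod
    def fetch_config(cls):
        # Hypothetical stand-in for a slow datastore fetch.
        cls.fetch_count += 1
        time.sleep(0.05)
        return {"some_config_data": "value"}

    @classmethod
    def get_config(cls):
        config = cls._config_cache
        if config is None:                  # fast path: no lock once cached
            with cls._config_lock:
                config = cls._config_cache  # re-check: another thread may have
                if config is None:          # fetched it while we waited
                    config = cls.fetch_config()
                    cls._config_cache = config
        return config
```

The re-check inside the lock is what keeps this correct. Without it, every thread that saw an empty cache before the first fetch finished would fetch again once it got the lock.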

That's it for this week's post; check back soon for more news about the Python 2.7 runtime!

26 October, 2011


Nick's Blog