Using BlobReader, wildcard subdomains and webapp2

Posted by Nick Johnson | Filed under app-engine, python, coding, blobstore, blobreader, webapp2

Today we'll demonstrate a number of new features and libraries in App Engine, using a simple demo app. First and foremost, we'll be demonstrating BlobReader, which lets you read Blobstore blobs just as you would a local file, but we'll also be trying out two other shinies: Wildcard subdomains, which allow users to access your app as anything.yourapp.appspot.com (and now, anything.yourapp.com), and Moraes' excellent new webapp2 library, a drop-in replacement for the webapp framework.

Moraes has built webapp2 to be as compatible with the existing webapp framework as possible, while improving a number of things. Improvements include an enhanced response object (based on the one in webob), better routing support, support for 'status code exceptions', and URL generation support. While the app we're writing doesn't require any of these, per se, it's a good opportunity to give webapp2 a test drive and see how it performs.

But what are we writing, you ask? Well, to show off just how useful BlobReader is, I wanted something that demonstrates how you can use it practically anywhere you can use a 'real' file object - such as using it to read zip files from the Blobstore with the standard zipfile module. So, we'll build a simple app that lets users upload zips of static content, and serve them on a custom subdomain.
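To see why a file-like blob is all zipfile needs, here is a minimal sketch that runs outside App Engine. It assumes io.BytesIO as a stand-in for blobstore.BlobReader (both are seekable file-like objects), and the file names are made up for illustration:

```python
import io
import zipfile

# Build a small zip in memory; in the app this would be the uploaded blob.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('index.html', '<h1>Hello</h1>')
    zf.writestr('css/site.css', 'body { margin: 0 }')

# io.BytesIO stands in for blobstore.BlobReader here (an assumption for this
# sketch): ZipFile only needs a seekable file-like object, not a file on disk.
fake_blob = io.BytesIO(buf.getvalue())
site_zip = zipfile.ZipFile(fake_blob)

names = sorted(site_zip.namelist())
index_data = site_zip.read('index.html')
print(names)
print(index_data)
```

Nothing in the reading half of the sketch knows where the bytes come from, which is what lets us swap in a BlobReader later on.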

Let's get started. We'll begin by defining our data model:

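import mimetypes
import os
import re
import zipfile

from google.appengine.api import users
from google.appengine.ext import blobstore, db
from google.appengine.ext.webapp import template
from google.appengine.ext.webapp.util import run_wsgi_app
import webapp2

BASE_DOMAIN = "%s.appspot.com" % os.environ['APPLICATION_ID']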


class Site(db.Model):
  owner = db.UserProperty(required=True)
  last_updated = db.DateTimeProperty(required=True, auto_now=True)
  zipfile = blobstore.BlobReferenceProperty(required=True)

  @property
  def url(self):
    return '%s.%s' % (self.key().name(), BASE_DOMAIN)

Simple enough. Our Site entities will use the subdomain as the key name, and we store the user that owns the site, when it was last updated, and a reference to the zip they uploaded. Next, let's define a base request handler, and a main page handler, which will list the user's sites and let them update them or upload a new one:

class BaseHandler(webapp2.RequestHandler):
  def __call__(self, *args, **kwargs):
    self.user = users.get_current_user()
    if not self.user:
      self.redirect(users.create_login_url(self.request.url))
    else:
      return super(BaseHandler, self).__call__(*args, **kwargs)

  def render_template(self, file, template_vars):
    path = os.path.join(os.path.dirname(__file__), 'templates', file)
    self.response.out.write(template.render(path, template_vars))


class MainHandler(BaseHandler):
  def get(self):
    sites = Site.all().filter('owner =', self.user).fetch(20)
    self.render_template('index.html', {
        'sites': sites,
        'upload_url': self.url_for('upload'),
    })

This should look eerily familiar - as I've said, webapp2 imitates the webapp framework fairly closely. One item of note is our overriding of __call__: unlike webapp, webapp2 does its own dispatch, which means that we can do 'middleware' type things such as checking the current user by overriding that functionality in our subclass, rather than needing to wrap each method in a decorator. Otherwise, everything we do here is no different to how it would work in webapp. We won't include the templates here, since they're pretty straightforward - see the full code if you're interested.
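The pattern is easier to see stripped of the framework. The sketch below is plain Python, not webapp2's API: Handler, LoginRequired and Page are hypothetical names, and the 'dispatch' is reduced to looking up a method named after the HTTP verb.

```python
class Handler(object):
    """Hypothetical stand-in for a framework handler that dispatches itself."""

    def __call__(self, method, *args):
        # Look up the method named after the HTTP verb and call it.
        return getattr(self, method.lower())(*args)


class LoginRequired(Handler):
    """Runs a check before every verb, much as BaseHandler does above."""

    def __init__(self, user=None):
        self.user = user

    def __call__(self, method, *args):
        if not self.user:
            return 'redirect to login'
        return super(LoginRequired, self).__call__(method, *args)


class Page(LoginRequired):
    def get(self):
        return 'hello %s' % self.user


anonymous = Page()('GET')
signed_in = Page(user='nick')('GET')
```

Because the check lives in one overridden method, every subclass and every verb gets it for free, with no decorator on each handler method.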

Next, we should define an upload handler. The part to render the upload form is pretty simple:

class UploadHandler(BaseHandler):
  def get(self):
    self.render_template('upload.html', {
        'upload_url': blobstore.create_upload_url(self.url_for('upload')),
        'site_name': self.request.GET.get('site', None),
    })

The only item of note here is our use of webapp2's URL generation support: the 'url_for' method generates a URL given the name of a route (which we'll define later). The code to process the upload is a little more complicated, since we need to handle a few edge cases: both new sites and updates to existing ones, and making sure people can't update other people's sites. It's still fairly straightforward:

  def post(self):
    site = self.request.POST['site']
    blob_key = blobstore.parse_blob_info(self.request.POST['file'])
    db.run_in_transaction(self.upload_tx, site, blob_key)
    self.redirect_to('main')

  def upload_tx(self, site_name, blob_key):
    site = Site.get_by_key_name(site_name)
    if site:
      if site.owner != self.user: return
      site.zipfile.delete()
      site.zipfile = blob_key
    else:
      site = Site(key_name=site_name, owner=self.user, zipfile=blob_key)
    db.put(site)

Note the use of the redirect_to method - this is syntactic sugar for self.redirect(self.url_for('main')).

Finally, we should define the handler that actually serves the sites up. This is where the important stuff happens:

INDEX_FILES = ['index.html', 'index.htm']
SUBDOMAIN_RE = re.compile(r"^([^.]+)\.%s\.appspot\.com$"
                          % re.escape(os.environ['APPLICATION_ID']))


class SiteHandler(webapp2.RequestHandler):
  def get(self, path):
    site_name = SUBDOMAIN_RE.search(self.request.host).group(1)
    site = Site.get_by_key_name(site_name)
    if not site:
      self.abort(404)
    zip_key = Site.zipfile.get_value_for_datastore(site)
    site_zip = zipfile.ZipFile(blobstore.BlobReader(zip_key))

    path, data = self.get_contents(site_zip, path)
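    # guess_type returns None for unknown extensions, so fall back to a
    # generic binary type rather than setting the header to None.
    content_type = mimetypes.guess_type(path)[0] or 'application/octet-stream'
    self.response.headers['Content-Type'] = content_type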
    self.response.out.write(data)

First, we extract the subdomain that was used, and look up the site entry from that. Then, we extract the blob key of the zipfile, and pass that to blobstore.BlobReader, which we pass in turn to zipfile.ZipFile, returning a ZipFile object we can use to read individual files from the zip. Then we call a method, self.get_contents, where the important bits happen:

  def get_contents(self, site_zip, path):
    if path.endswith('/'):
      for idx in INDEX_FILES:
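        # path already ends with '/', and zip member names always use
        # forward slashes, so plain concatenation is enough here.
        newpath = (path + idx)[1:]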
        try:
          data = site_zip.read(newpath)
          return newpath, data
        except KeyError:
          pass
      self.abort(404)
    else:
      try:
        return path, site_zip.read(path[1:])
      except KeyError:
        self.abort(404)

If the path ends with a trailing slash, it's a directory, so we check the zip for each of the common index files - index.html, then index.htm - and return the first one found, or a 404 if neither exists. Otherwise, we simply fetch the requested file and return it to the caller. Easy - just as if it really were a local file. Finally, back in the get() method of the handler, we use the mimetypes module to guess the correct content type to return, and send the data back to the user.
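If you'd like to try the lookup logic without deploying anything, here is a rough standalone sketch. It assumes an in-memory zip in place of the blob, a hypothetical resolve() helper that returns None where the handler would abort with a 404, and made-up file names:

```python
import io
import mimetypes
import posixpath
import zipfile

INDEX_FILES = ['index.html', 'index.htm']


def resolve(site_zip, path):
    """Return (member_name, data) for a request path, or None if missing."""
    # Zip member names have no leading slash, so strip it from the URL path.
    if path.endswith('/'):
        candidates = [posixpath.join(path, idx)[1:] for idx in INDEX_FILES]
    else:
        candidates = [path[1:]]
    for name in candidates:
        try:
            return name, site_zip.read(name)
        except KeyError:  # ZipFile.read raises KeyError for unknown members
            continue
    return None


# Build a tiny site: the root only has index.htm, docs/ has index.html.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('index.htm', 'root')
    zf.writestr('docs/index.html', 'docs')
site_zip = zipfile.ZipFile(io.BytesIO(buf.getvalue()))

root = resolve(site_zip, '/')
docs = resolve(site_zip, '/docs/')
missing = resolve(site_zip, '/nope.txt')
docs_type = mimetypes.guess_type(docs[0])[0]
```

The root request falls through to index.htm because index.html isn't in the zip, which is exactly the fallback order INDEX_FILES encodes.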

There's still a little more glue required to finish things up. First, we need to define two WSGI applications, with the appropriate routes: one for the admin site and one for the sub-sites:

main_app = webapp2.WSGIApplication([
    webapp2.Route(r'/', MainHandler, name='main'),
    webapp2.Route(r'/upload', UploadHandler, name='upload'),
])


site_app = webapp2.WSGIApplication([
    webapp2.Route(r'<path:.*>', SiteHandler),
])

This is very similar to how webapp handles things - and indeed, you can even use regular webapp routes if you want - but the Route class has more flexibility, including naming the handlers, so we can use URL building as we demonstrated above. Also note the use of named groups for the site_app, to extract the path and provide it as an argument to our SiteHandler's get method.
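As a rough mental model (this is not webapp2's actual implementation), a template variable such as <path:.*> behaves like a named regular expression group whose match is handed to the handler as a keyword argument. The regex and the get() function below are assumptions made for illustration:

```python
import re

# Assumed equivalent of the '<path:.*>' template: a named group called 'path'.
route_re = re.compile(r'^(?P<path>.*)$')


def get(path):
    # Stand-in for SiteHandler.get(self, path).
    return 'serving %s' % path


match = route_re.match('/docs/index.html')
result = get(**match.groupdict())
```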

Next, we need some way to direct requests to the appropriate application, depending on whether the request is for the bare domain or a subdomain. For that, we'll write a simple piece of middleware:

def domain_middleware(domain_map):
  domain_map = [(re.compile('^%s$' % x) if isinstance(x, basestring) else x, y)
                for x, y in domain_map]
  def middleware(environ, start_response):
    domain = environ['SERVER_NAME']
    for regex, app in domain_map:
      if regex.match(domain):
        return app(environ, start_response)
  return middleware

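The domain_middleware function accepts a list of (pattern, app) tuples. Each pattern is a regular expression, given either as a string or as a compiled object, that is matched against the request's domain, and app is the WSGI app that matching requests should be sent to. Patterns are tried in order, so the catch-all goes last. Look out for integrated support for this in webapp2 in the near future! In the meantime, here's our own middleware in action: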

app = domain_middleware([
    (SUBDOMAIN_RE, site_app),
    ('.*', main_app),
])


def main():
  run_wsgi_app(app)


if __name__ == '__main__':
  main()
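To check the dispatch without a deployed app, you can drive the middleware with hand-built WSGI environs. The sketch below repeats the middleware logic from above (with str in place of basestring so it also runs on Python 3) and assumes a hypothetical application id of 'zip-site' and two dummy apps that just return a label:

```python
import re

# Hypothetical application id, used only for this sketch.
APP_ID = 'zip-site'
SUBDOMAIN_RE = re.compile(r'^([^.]+)\.%s\.appspot\.com$' % re.escape(APP_ID))


def domain_middleware(domain_map):
    # Same logic as the middleware above; `str` replaces `basestring` so the
    # sketch also runs on Python 3.
    domain_map = [(re.compile('^%s$' % x) if isinstance(x, str) else x, y)
                  for x, y in domain_map]

    def middleware(environ, start_response):
        domain = environ['SERVER_NAME']
        for regex, app in domain_map:
            if regex.match(domain):
                return app(environ, start_response)
    return middleware


def make_app(label):
    """Build a trivial WSGI app that answers with a fixed label."""
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [label.encode('utf-8')]
    return app


app = domain_middleware([
    (SUBDOMAIN_RE, make_app('site')),
    ('.*', make_app('main')),
])


def call(host):
    """Invoke the middleware with a minimal environ and return the body."""
    return b''.join(app({'SERVER_NAME': host}, lambda status, headers: None))


sub_body = call('pydocs.zip-site.appspot.com')
bare_body = call('zip-site.appspot.com')
site_name = SUBDOMAIN_RE.search('pydocs.zip-site.appspot.com').group(1)
```

The bare domain doesn't match SUBDOMAIN_RE, so it falls through to the '.*' entry, which is why the order of the list matters.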

And that's all there is to it. To see it in action, check out http://pydocs.zip-site.appspot.com/, where I've uploaded the pydocs for Python 2.7, or see the admin interface at http://zip-site.appspot.com/. The full source code is online here.

05 August, 2010
