Blogging on App Engine, part 4: Listings

This is part of a series of articles on writing a blogging system on App Engine. An overview of what we're building is here.

As you may have surmised from previous posts in the series, the 'static serving' approach we're using can lead to regenerating a lot of pages at once. For a long-lived blog with lots of history, regenerating the archive pages could take a significant amount of time - potentially long enough that we could run into the 30 second request deadline when updating or adding a post. Fortunately, we have something custom-made for the purpose: the Task Queue API. Using the Task Queue API, we can take care of the essential updates immediately - the post page itself, for example - then queue up other updates, such as the archive pages, on the task queue for later execution. Using the task queue has the extra advantage that updates can be executed in parallel.

Even better, we can make use of a new library in version 1.2.5 of the SDK, called 'deferred'. deferred is a clone of Ruby's delayed_job library, and makes it easy to enqueue function and method calls on the App Engine task queue. deferred isn't officially documented yet - though documentation is forthcoming - so consider this post to be an exclusive first look at this new library.

In order to use deferred, we need to add an extra stanza to app.yaml, defining a handler that the deferred library will use to execute tasks. Add the following immediately after the remote_api handler:

- url: /_ah/queue/deferred
  script: $PYTHON_LIB/google/appengine/ext/deferred/handler.py
  login: admin

Once the handler is installed, using the deferred library is straightforward: import it with 'from google.appengine.ext import deferred', then call deferred.defer with the function and any arguments to pass to it. The function invocation will be serialized and placed on the task queue, where it will be executed just as any other task would be. Let's modify the render() method of BlogPost to use the deferred library. Replace this section:

      for dep in to_regenerate:
        generator_class.generate_resource(self, dep)

With this one:

      if generator_class.can_defer:
        for dep in to_regenerate:
          deferred.defer(generator_class.generate_resource, None, dep)
      else:
        for dep in to_regenerate:
          generator_class.generate_resource(self, dep)

The only change here is that we check if the ContentGenerator permits deferred execution. If it doesn't, we execute generate_resource as normal, but if it does, we call deferred.defer for each changed dependency. To make this work, we need to make a couple of small changes to our ContentGenerator classes. The first and most obvious is to add a 'can_defer' attribute to the class. Add this line to the beginning of the ContentGenerator class:

class ContentGenerator(object):
  # ...
  can_defer = True

Since we want the blog post itself to be regenerated synchronously, we need to modify the PostContentGenerator to set can_defer to False.
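
To make the effect of can_defer concrete, here is a minimal sketch that runs outside App Engine. Everything in it is a stand-in: defer() and run_pending_tasks() imitate deferred.defer and the task queue with a plain list, the generator bodies just record what they were asked to do, and regenerate() is a hypothetical helper that repeats the branching from render().

```python
# Stand-in for deferred.defer: the real library serializes the call and puts it
# on the App Engine task queue. Here we just remember it in a list.
pending_tasks = []

def defer(func, *args):
  pending_tasks.append((func, args))

def run_pending_tasks():
  """Plays the role of the task queue worker."""
  while pending_tasks:
    func, args = pending_tasks.pop(0)
    func(*args)

generated = []  # (generator name, resource), in the order work actually happens

class ContentGenerator(object):
  can_defer = True

  @classmethod
  def generate_resource(cls, post, resource):
    generated.append((cls.__name__, resource))

class PostContentGenerator(ContentGenerator):
  can_defer = False  # the post page is regenerated inside the request

class IndexContentGenerator(ContentGenerator):
  pass  # inherits can_defer = True

def regenerate(post, generator_class, to_regenerate):
  """Hypothetical helper with the same branching as the render() snippet."""
  if generator_class.can_defer:
    for dep in to_regenerate:
      defer(generator_class.generate_resource, None, dep)
  else:
    for dep in to_regenerate:
      generator_class.generate_resource(post, dep)

post = object()  # placeholder for a BlogPost entity
regenerate(post, PostContentGenerator, ["post-1"])
regenerate(post, IndexContentGenerator, ["index"])
generated_before_queue = list(generated)  # only the post page so far
run_pending_tasks()                       # now the index is generated too
```

In this sketch the post page is the only thing generated before the queue runs; the index is produced later, when the queued call is executed.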

Next, we'll add a ContentGenerator for the Atom feed. This is very similar to the existing IndexContentGenerator code:

class AtomContentGenerator(ContentGenerator):
  """ContentGenerator for Atom feeds."""
 
  @classmethod
  def get_resource_list(cls, post):
    return ["atom"]

  @classmethod
  def get_etag(cls, post):
    return hashlib.sha1(db.model_to_protobuf(post).Encode()).hexdigest()

  @classmethod
  def generate_resource(cls, post, resource):
    import models
    q = models.BlogPost.all().order('-updated')
    posts = q.fetch(10)
    template_vals = {
        'posts': posts,
    }
    rendered = utils.render_template("atom.xml", template_vals)
    static.set('/feeds/atom.xml', rendered,
               'application/atom+xml; charset=utf-8')
generator_list.append(AtomContentGenerator)

Note that we're ordering by updated timestamp rather than by publication date, since we want our Atom feed to include the latest modifications to the blog. We're also basing the etag on the entire post rather than the summary, since we want the Atom feed to be regenerated upon any change to the post. The contents of atom.xml are omitted for brevity; you can see it here.
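
The difference between the two etag strategies can be illustrated with a small sketch. It makes several assumptions: posts are plain dictionaries with hypothetical 'title', 'summary' and 'body' fields, json.dumps stands in for db.model_to_protobuf(post).Encode(), and summary_etag is an invented example of a summary-only hash rather than the code from the earlier posts.

```python
import hashlib
import json

def full_etag(post):
  # Stand-in for db.model_to_protobuf(post).Encode(): a stable serialization
  # of every field, so any change to the post changes the hash.
  serialized = json.dumps(post, sort_keys=True).encode("utf-8")
  return hashlib.sha1(serialized).hexdigest()

def summary_etag(post):
  # Hypothetical summary-only etag: ignores the body entirely.
  serialized = json.dumps([post["title"], post["summary"]]).encode("utf-8")
  return hashlib.sha1(serialized).hexdigest()

original = {"title": "Hello", "summary": "First post", "body": "Original body"}
edited = dict(original, body="Corrected body")

body_edit_changes_summary_etag = summary_etag(original) != summary_etag(edited)
body_edit_changes_full_etag = full_etag(original) != full_etag(edited)
```

An edit that only touches the body leaves the summary-based etag unchanged but alters the full one, which is why a feed that carries the whole post wants the latter.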

Finally, we want to add support for archive pages. Update the generate_resource method of IndexContentGenerator to the following:

  def generate_resource(cls, post, resource, pagenum=1, start_ts=None):
    assert resource == "index"
    import models
    q = models.BlogPost.all().order('-published')
    if start_ts:
      q.filter('published <=', start_ts)
    posts = q.fetch(config.posts_per_page + 1)
    more_posts = len(posts) > config.posts_per_page
    template_vals = {
        'posts': posts[:config.posts_per_page],
        'prev_page': '/page/%d' % (pagenum - 1) if pagenum > 1 else None,
        'next_page': '/page/%d' % (pagenum + 1) if more_posts else None,
    }
    rendered = utils.render_template("listing.html", template_vals)
    static.set('/page/%d' % (pagenum,), rendered, config.html_mime_type)
    if pagenum == 1:
      static.set('/', rendered, config.html_mime_type)
    if more_posts:
      deferred.defer(cls.generate_resource, None, resource, pagenum + 1,
                     posts[-1].published)

Here, we've modified the generate_resource function to take 'pagenum' and 'start_ts' arguments, so that each call can schedule the next one to generate subsequent pages. We're then modifying the query to continue from where the previous page left off, and adding extra template variables for the previous and next pages of the archive, if applicable. Note that we fetch one more post than fits on a page: its presence tells us there is a next page, and its publication timestamp becomes the starting point for that page. Finally, we're rendering archive pages to /page/n URLs, and calling deferred.defer to render the next page if more posts remain.
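
To see the chaining in action without App Engine, here is a rough simulation under stated assumptions: posts are reduced to bare publication timestamps, query() imitates the datastore query, a dictionary takes the place of static.set, and defer() is a list-based stand-in for deferred.defer. The names generate_page, query, ALL_POSTS and POSTS_PER_PAGE are invented for the sketch.

```python
POSTS_PER_PAGE = 2  # stands in for config.posts_per_page
# Publication timestamps, newest first, standing in for BlogPost entities.
ALL_POSTS = [50, 40, 30, 20, 10]

pages = {}          # path -> template values, instead of static.set()
pending_tasks = []  # instead of the task queue

def defer(func, *args):
  pending_tasks.append((func, args))

def query(start_ts, limit):
  """Mimics order('-published') with an optional 'published <=' filter."""
  matching = [ts for ts in ALL_POSTS if start_ts is None or ts <= start_ts]
  return matching[:limit]

def generate_page(pagenum=1, start_ts=None):
  posts = query(start_ts, POSTS_PER_PAGE + 1)  # one extra to detect a next page
  more_posts = len(posts) > POSTS_PER_PAGE
  pages['/page/%d' % pagenum] = {
      'posts': posts[:POSTS_PER_PAGE],
      'prev_page': '/page/%d' % (pagenum - 1) if pagenum > 1 else None,
      'next_page': '/page/%d' % (pagenum + 1) if more_posts else None,
  }
  if more_posts:
    # The extra post is the first one on the next page, hence '<='.
    defer(generate_page, pagenum + 1, posts[-1])

generate_page()
while pending_tasks:  # drain the queue, as the task queue worker would
  func, args = pending_tasks.pop(0)
  func(*args)
```

With five posts and two per page, the chain produces /page/1, /page/2 and /page/3, and only the last page has no next_page link. The sketch uses distinct timestamps; two posts sharing a timestamp at a page boundary would need extra care.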

As always, you can see a demo of the blog-so-far at http://bloggart-demo.appspot.com/, and the source is available here.

I'd also like to take this opportunity to explicitly ask for feedback: How is the series so far? Is it getting too dry and formulaic? Did you stop reading 2 posts ago? What can I do to improve it?

With these latest improvements, we have a basic but functional blog. We're not going to quit here, however. In the next post, we'll tackle tagging, which will allow us to show off the real power of our dependency-based content generation system.
