Writing a blog system on App Engine

I'm going to spend the next few posts working through the process of writing a simple, robust and scalable blogging system for App Engine. This first post is going to be fairly dull, unfortunately, as it will serve to cover what our requirements and non-requirements are, and the general approach we're going to take to achieve those objectives. In other words: Lots of bullet points, and little to no code. Don't worry, it'll get more exciting soon.

First, let's outline what we (or at least, I) expect out of a blogging system:

  • Simple authoring. I personally don't want to wrestle with WYSIWYG HTML editors in order to write blog posts - I prefer to enter markup directly. At the same time, I don't intend to make all potential users conform to my preferences - so we should be able to support rich text editors if they're desired.
  • Good isolation of code and markup. Users shouldn't have to understand the innards of our blogging software if all they want to do is change how the blog looks or is laid out.
  • RSS and Atom support. This should go without saying, these days.
  • PubSubHubbub support. Let's be all Web 2.0 and push out blog updates instantly.
  • Sitemap support, so we can get indexed faster and more thoroughly.
  • Tagging, and filtering based on tag.
  • Easily support new output formats. Adding a new output format shouldn't require rewriting the whole system, and all our existing output formats - Atom, Sitemaps, and so forth - ought to use the same system.
  • Extensible. Just because I don't need a feature doesn't mean nobody does.
  • Easy import of posts from other blogging systems.
  • Multiple author support.
  • Scheduled posts.
  • Fast. Really, really fast.

Not all of this is going to make it in, initially, since we're trying to keep things simple. All of it will be possible with the system we're building, though.

Of course, we need to explicitly state our non-requirements, too:

  • Comments. Commenting is a solved problem; we can easily plug in any of several commenting systems, such as Disqus. Such systems are both less effort and better written than anything we're likely to come up with ourselves.
  • Full-text search. No matter how hard we try, we're not going to do it as well as a dedicated search engine. Instead of making our users tolerate our substandard search facility, let's go with the best and use site-search.

In order to meet these requirements, and in line with the App Engine strategy of "optimize for reads, not writes", we're going to take a "static serving" approach. Most apps cache the generated content to avoid regenerating it too frequently, but we'll take it one step further: We'll generate pages once at write-time, store them persistently in the datastore, and serve them up to users directly. This has several advantages:

  • Really fast code path for serving up all user-visible pages.
  • A bug in the blogging code is much less likely to take the site down for all users, since pages that have already been generated keep being served.
  • The code that serves user-visible pages can be only loosely coupled with the rest of the system, making extension and modification easy.
  • We can easily support HTTP caching and conditional request features to reduce bandwidth and CPU usage for both us and our users.
  • Low startup times on new server instances, since only the static serving code must be loaded.
  • Adding support for serving non-blog content, such as images and downloadable files, is trivial.
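
To make the idea concrete, here is a minimal sketch of the "generate once, serve directly" approach, including the conditional-request handling mentioned above. It is not the code this series will end up with: the names `StaticPage`, `store_page` and `serve` are made up for illustration, a plain dictionary stands in for the datastore, and the ETag is assumed to be a SHA-1 hash of the page body.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class StaticPage:
    """A fully rendered page, ready to be sent to the client unchanged."""
    body: bytes
    content_type: str
    etag: str


# Assumption: an in-memory dict stands in for the datastore here.
# It maps a URL path to the pre-rendered page stored for that path.
_PAGES = {}


def store_page(path, body, content_type="text/html"):
    """Write-time half: store the finished bytes along with an ETag."""
    # Hypothetical choice: derive the ETag from a hash of the content,
    # so it changes exactly when the page body changes.
    etag = '"%s"' % hashlib.sha1(body).hexdigest()
    page = StaticPage(body=body, content_type=content_type, etag=etag)
    _PAGES[path] = page
    return page


def serve(path, if_none_match=None):
    """Read-time half: a single lookup plus a conditional-request check.

    Returns a (status, headers, body) tuple.
    """
    page = _PAGES.get(path)
    if page is None:
        return 404, {}, b""
    headers = {"Content-Type": page.content_type, "ETag": page.etag}
    if if_none_match == page.etag:
        # The client already has this exact version; send no body.
        return 304, headers, b""
    return 200, headers, page.body
```

Note that `serve` never touches templates or post data: everything it needs was computed when the page was stored, which is where the speed and the loose coupling come from.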

These advantages come at a cost, of course: We need to be able to figure out what pages need regenerating for a given change, which introduces some complexity into our code. Further, some changes may require regenerating significant portions of the site. Fortunately, the first issue is tractable with good design, and the second issue is nicely handled by the Task Queue.
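
As a rough illustration of the first issue, the sketch below works out which pages a change to a post would touch. Everything in it is an assumption made for the example: posts are plain dictionaries with `path` and `tags` keys, the URL layout (`/`, `/feeds/atom.xml`, `/tag/<name>`) is invented, and a `deque` stands in for the Task Queue.

```python
from collections import deque


def affected_paths(post):
    """Return every pre-rendered page that would display this post.

    The URL layout used here is hypothetical.
    """
    paths = {"/", "/feeds/atom.xml", post["path"]}
    paths.update("/tag/%s" % tag for tag in post["tags"])
    return paths


def paths_to_regenerate(old_post, new_post):
    """Pages to rebuild when a post changes from old_post to new_post.

    Pass None as old_post for a brand new post, or None as new_post for
    a deletion. Both versions matter: a tag removed from a post still
    needs its listing page rebuilt so the post disappears from it.
    """
    paths = set()
    if old_post is not None:
        paths |= affected_paths(old_post)
    if new_post is not None:
        paths |= affected_paths(new_post)
    return paths


# Assumption: a simple in-process queue stands in for the Task Queue.
regeneration_queue = deque()


def enqueue_regeneration(old_post, new_post):
    """Queue one regeneration job per affected page."""
    for path in sorted(paths_to_regenerate(old_post, new_post)):
        regeneration_queue.append(path)
```

Queueing one small job per page, rather than rebuilding everything inside the request that saved the post, is what keeps a change that touches much of the site from blocking the author.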

So that's the plan - how are we going to achieve it? A piece at a time, of course:

  1. Serving static content from the datastore.
  2. Basic blogging: Writing and editing posts.
  3. Dependencies: Figuring out when to regenerate.
  4. Listing pages: Archives and Atom.
  5. Tagging and filtering.
  6. Disqus and Site-Search.
  7. Migrating to the new system.
  8. PubSubHubbub support.
  9. Sitemap support.
  10. ...more to come, no doubt.

We'll start with step 1 on Monday. In the meantime, there's one major thing missing from our nascent blogging system: A name. If you have a suggestion for what it should be called (hint: 'Bloog' is taken ;), post a comment and let me know.
