Endings and beginnings

As of July 8th, I will no longer be working at Google.

The last four-and-a-half years at Google have been an incredible experience. First working as an SRE - Google's oncall / infrastructure role - and then as a Developer Programs Engineer, I've learned an incredible amount, worked with some amazing people, and done some exciting things. I've been with the Google App Engine team - first in a 20% capacity, and then full time - since before the product launched, and I think I've had a significant impact on the product and the community that's formed around it. I've greatly enjoyed interacting with everyone in the community and lending a hand where I can.

Ultimately, though, I've decided it's time for me to move on and seek out new challenges. With that in mind, I'm going to be working at Smart Sparrow, an exciting Sydney startup who are making waves in the education area. They're doing some really impressive stuff with E-Learning, and I'm excited to be joining them and helping shape things. Although I've worked at small companies before, this will be my first time at a for-real startup, and I ...

Hiatus

I've been blogging regularly or semi-regularly about App Engine on this blog since mid 2008, and I've been on the App Engine team for even longer, and I've decided it's time for a well-earned break.

With that in mind, starting January 30th, I'm taking eight weeks' leave from my job at Google, during which I'll be cycling the length of New Zealand! Not just on a regular bike, either, but one one of these. With any luck, I'll come back from my break revitalised and ready to bring more new and interesting things to the App Engine community.

As a result, I won't be updating this blog during my vacation. Nor am I likely to answer (m)any questions on Stack Overflow or respond promptly to email directed my way. My most capable and excellent colleagues, Amy Unruh, Ikai Lan and Johan Euphrosine will continue to take good care of the community in my absence.

I'll be blogging and vlogging my experiences riding around New Zealand while I'm gone on my new blog, Laid Back Touring, so for anyone interested in following my exploits, that is the place to look ...

There's nothing quite like...

Witness the product of an idle hour or two at work, and time spent pondering the meaninglessness of the cliche "There's nothing quite like...":

theresnothingquitelike.com

Powered by App Engine. Naturally.

Using the Channel API on App Engine for instant traffic analysis

In last week's post, we introduced Clio, a system for getting live insight into your site's traffic patterns, and we described how the Prospective Search API lets us filter the site's traffic to get just the records we care about.

This week, we'll cover the other part of the system: delivering results in real-time. For this, we'll be using the Channel API to stream new log entries to admin users in real-time. As with last week's post, where there's differences between our demo implementation and what you'd use in a real-world system, I'll point those out.

The admin interface

First up, we need to provide a simple admin interface to which we'll stream results. Here's the handler for that:

class IndexHandler(webapp.RequestHandler):
  """Serve up the Clio admin interface."""

  def get(self):
    client_id = os.urandom(16).encode('hex')
    channel_key = channel.create_channel(client_id)
    template_path = os.path.join(os.path.dirname(__file__),
                                 'templates', 'index.html')
    self.response.out.write(template.render(template_path, {
        'config': config,
        'client_id': client_id,
        'channel_key': channel_key,
    }))

The only thing of significance we do here relates to the Channel API. First, we generate a random client ID by getting some ...

Using the Prospective Search API on App Engine for instant traffic analysis

One of the really interesting new APIs released as part of App Engine recently is the Prospective Search API. Prospective search inverts the usual search paradigm, where you have a database of documents, and search queries match on those documents. In Prospective Search, you instead have a list of persistent search queries, and as new documents are created or updated, you match them against the queries. Twitter's live search interface is a good example of Prospective Search in action.

Today, in the first of a two post series, we'll be trying out the Prospective Search API with a sample application, Clio. Clio, named after muse of history, is designed to give administrators insight into the actual live traffic being served by their app. With it, you can see user request logs as they occur, and apply filters so you only see the hits that interest you - invaluable on a heavily trafficed site. Mystefied where people are getting to that 404 page from, and don't want to wait 12 hours for the analytics? Clio can help.

In this post, we'll go over the details of how to use the Prospective Search API to construct the server-side, query ...

A reminder

Just a quick reminder: If you've got topics you'd like me to write about, there's a feedback widget on the right hand side of the page. Submit and upvote suggestions there! A lot of the posts on this blog were based on feedback from users, and it's always more useful to be writing about something people want to hear about.

Demystifying the App Engine request logs

People often ask about the App Engine request logs shown in the admin console: What do all the fields mean? How should they be interpreted? They're actually fairly easy to read once you know the format, so here's a quick-reference to help.

The basic format is based on the Apache combined log format, which is so widely used even outside Apache that it's become a de-facto standard of sorts for webserver request logs. In addition to that, App Engine adds a number of extra fields logging additional, App Engine specific stats. Suppose you're examining the following log entry:

107.10.42.191 - foobiz [10/May/2011:17:26:28 -0700] "GET /page.html HTTP/1.1" 500 2297 "http://www.example.com/home.html" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4,gzip(gfe),gzip(gfe),gzip(gfe)" "www.example.com" ms=364 cpu_ms=23 api_cpu_ms=0 cpm_usd=0.001059

Here's what the fields mean, in order:

  1. Client's IP address (107.10.42.191)
  2. The RFC1413 identity of the client (in practice, always '-')
  3. The userid determined ...

Promises

One of the questions I got asked most at I/O was "when are you going to start blogging again"? I'm happy to be able to say that the answer is "really soon now". And yes, this time I really mean it.

I'm away on holiday for the next 10 days, but as soon as I get back to work, I'm back to blogging, on at least a weekly schedule - perhaps more often. I even have a couple of kick-ass blog posts lined up about some of the new features. If you have ideas of your own, post them to the issue tracker linked from the left.

In the meantime, the I/O session videos are well worth watching.

Storage options on App Engine

App Engine provides a number of ways for your app to store data. Some, such as the datastore, are well known, but others are less so, and all of them have different characteristics. This article is intended to enumerate the different options, and describe the pros and cons of each, so you can make more informed decisions about how to store your data.

Datastore

The best known, most widely used, and most versatile storage option is, of course, the datastore. The datastore is App Engine's non-relational database, and it provides robust, durable storage, as well as providing the most flexibility in how your data is stored, retrieved, and manipulated.

Pros

  • Durable - data stored in the datastore is permanent.
  • Read-write - apps can both read and write datastore data, and the datastore provides transaction mechanisms to enforce integrity.
  • Globally consistent - all instances of an app have the same view of the datastore.
  • Flexible - queries and indexing provide many ways to query and retrieve data.

Cons

  • Latency - because the datastore stores data on disk and provides reliability guarantees, writes need to wait until data is confirmed to be stored before returning, and reads often have to fetch data from disk.

Memcache

Memcache ...

Breaking things down to Atoms

Over the last few years, more and more sites have been providing RSS and Atom feeds. This has been a huge boon both for keeping up to date with content through feed readers, and for programmatically consuming data. There are, inevitably, a few holdouts, though. Notable amongst those holdouts -and particularly relevant to me at the moment - are property listing sites. Few, if any property listing sites provide any sort of Atom or RSS feed for listings, let alone a API of any kind.

To address this, I decided to put together a service for turning other types of notification into Atom feeds. I wanted to make the service general, so it can support many different formats, but the initial target will be email, since most property sites offer email notifications. The requirements for an email to Atom gateway are fairly straightforward:

  • Incoming email should be converted to entries in an Atom feed in such a fashion as to be easily interpretable in a feed reader
  • As much of the original message and metadata as possible should be preserved in the Atom feed entry
  • It should be possible to access the original message if desired

In addition, I had a ...