Using the Prospective Search API on App Engine for instant traffic analysis

One of the really interesting new APIs released as part of App Engine recently is the Prospective Search API. Prospective search inverts the usual search paradigm, where you have a database of documents, and search queries match on those documents. In Prospective Search, you instead have a list of persistent search queries, and as new documents are created or updated, you match them against the queries. Twitter's live search interface is a good example of Prospective Search in action.

Today, in the first of a two post series, we'll be trying out the Prospective Search API with a sample application, Clio. Clio, named after muse of history, is designed to give administrators insight into the actual live traffic being served by their app. With it, you can see user request logs as they occur, and apply filters so you only see the hits that interest you - invaluable on a heavily trafficed site. Mystefied where people are getting to that 404 page from, and don't want to wait 12 hours for the analytics? Clio can help.

In this post, we'll go over the details of how to use the Prospective Search API to construct the server-side, query ...

A reminder

Just a quick reminder: If you've got topics you'd like me to write about, there's a feedback widget on the right hand side of the page. Submit and upvote suggestions there! A lot of the posts on this blog were based on feedback from users, and it's always more useful to be writing about something people want to hear about.

Demystifying the App Engine request logs

People often ask about the App Engine request logs shown in the admin console: What do all the fields mean? How should they be interpreted? They're actually fairly easy to read once you know the format, so here's a quick-reference to help.

The basic format is based on the Apache combined log format, which is so widely used even outside Apache that it's become a de-facto standard of sorts for webserver request logs. In addition to that, App Engine adds a number of extra fields logging additional, App Engine specific stats. Suppose you're examining the following log entry: - foobiz [10/May/2011:17:26:28 -0700] "GET /page.html HTTP/1.1" 500 2297 "" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4,gzip(gfe),gzip(gfe),gzip(gfe)" "" ms=364 cpu_ms=23 api_cpu_ms=0 cpm_usd=0.001059

Here's what the fields mean, in order:

  1. Client's IP address (
  2. The RFC1413 identity of the client (in practice, always '-')
  3. The userid determined ...