A short hiatus

This Saturday, I'm heading off to a Greek island to spend a little under two weeks with my gorgeous wife and a collection of good friends in what I hope is a well-deserved holiday. Relaxation will be the order of the day, and as a result, I'm not going to be blogging much, if at all, during that period.

Not to fear, though: I have several good posts planned, including new Damn Cool Algorithms posts, and you can look forward to seeing them after I get back.

See you all in a couple of weeks!

Using Python magic to improve the deferred API

Recently, my attention was drawn, via a blog post to a Python task queue implementation called Celery. The object of my interest was not so much Celery itself - though it does look both interesting and well written - but the syntax it uses for tasks.

While App Engine's deferred library takes the 'higher level function' approach - that is, you pass your function and its arguments to the 'defer' function - I've never been entirely happy with that approach. Celery, in contrast, uses Python's support for decorators (one of my favorite language features) to create what, in my view, is a much neater and more flexible interface. While defining and calling a deferred function looks like this:

def my_task_func(some_arg):
  # do something

defer(my_task_func, 123)

Doing the same in Celery looks like this:

@task
def my_task_func(some_arg):
  # do something

my_task_func.delay(123)

Using a decorator, Celery is able to modify the function it's decorating such that you can now call it on the task queue using a much more intuitive syntax, with the function's original calling convention preserved. Let's take a look at how this works, first, and then explore how we might make use of it ...

Guessing subreddits with the Prediction API

Edit: Now with a live demo!

I've written before about the new BigQuery and Prediction APIs, and promised to demonstrate them. Let's take a look at the Prediction API first.

The Prediction API, as I've explained, does a restricted form of machine learning, as a web service. Currently, it supports categorizing textual and numeric data into a preset list of categories. The example given in the talk - language detection - is a good one, but I wanted to come up with something new. A few ideas presented themselves:

  1. Training on movie/book reviews to try and predict the score given based on the text
  2. Training on product descriptions to try and predict their rating
  3. Training on Reddit submissions to try and predict the subreddit a new submission belongs in

All three have promise, but the first could suffer from the fact that the prediction API as it currently stands doesn't understand a relationship between categories - it would have no way to know that the '5 star' rating tag is 'closer to' the '4 star' one than the '1 star' tag. The second seems very ambitious, and it's not clear there's enough information to do that ...

Using remote_api with OpenID authentication

When we recently released integrated OpenID support for App Engine, one unfortunate side-effect for apps that enable it was disruption to authenticated, programmatic access to your App Engine app. Specifically, if you've switched your app to use OpenID for authentication, remote_api - and the remote_api console - will no longer work.

The bad news is that fixing this is tough: OpenID is designed as a browser-interactive authentication mechanism, and it's not clear what the best way to do authentication for command line tools like the remote_api console is going to be. Quite likely the solution will involve our OAuth support and stored credentials - stay tuned!

The good news, though, is that there's a workaround that you can use right now, without compromising the security of your app. It's a bit of a hack, though, so brace yourself!

The essential insight behind the hack is that if we can trick the SDK into thinking that it's authenticating against the development server instead of production, it will prompt the user for an email address and password, then send that email address embedded in the 'dev_appserver_login' cookie with all future requests. We can then use the email field to instead ...

Google I/O playlist, day 8: Next gen queries

This is the eighth in a series of posts providing a day-by-day playlist to help break up the Google I/O session videos - specifically the App Engine ones - into manageable chunks for those that haven't seen them. Don't worry, we're nearly done!

First up, apologies for the lack of posts the last few days. I spent the weekend at Hack Camp, so Friday was spent preparing for it, and Monday and Tuesday were spent recovering and catching up!

Today's session is Alfred Fuller's session, Next Gen Queries. If you're interested in the continued evolution of the datastore API, this is a must-watch talk. It's completely language-agnostic, too. If you're looking for better background knowledge, Brett's talk from I/O '09 is well worth watching first, though.

Alfred goes into a lot of detail about some of the exciting improvements you can expect in future iterations of the Datastore API, largely due to his work on extending and improving the ZigZag merge join strategy in App Engine, as well as better support for MultiQuery. As an extra bonus, check out 31:29 for details of an improvement you can take advantage of ...

Google I/O playlist, day 7: Data pipelines with Google App Engine

This is the seventh in a series of posts providing a day-by-day playlist to help break up the Google I/O session videos - specifically the App Engine ones - into manageable chunks for those that haven't seen them.

Today's session is Brett Slatkin's talk on Data Pipelines with Google App Engine. This is a really fascinating talk, and it's a must-watch for anyone wanting to do advanced, high-throughput processing on App Engine. Brett describes in detail, with working code, a couple of high level concepts that make it possible to do 'eventually consistent' processing on App Engine, with one of the goals being Materialized Views.

The examples are in Python, but the talk will be useful for Java developers too - it's the concepts that are really important here.

Google I/O playlist, day 6: Batch data processing with App Engine

This is the sixth in a series of posts providing a day-by-day playlist to help break up the Google I/O session videos - specifically the App Engine ones - into manageable chunks for those that haven't seen them.

Today's video is Mike Aizatsky's Batch data processing with App Engine, where he describes the recently-released mapper framework for App Engine. I blogged previously about this framework, giving a detailed breakdown of the demo mapreduce used in this very talk!

The mapper API is initially being released for Python, so it'll be mostly of interest to Python users - but Java is coming soon, so it's well worth watching even if you only speak Java.

Google I/O playlist, day 5: Data migration in App Engine

This is the fifth in a series of posts providing a day-by-day playlist to help break up the Google I/O session videos - specifically the App Engine ones - into manageable chunks for those that haven't seen them.

Today's video is Data migration in App Engine, Matthew Blain's talk on new improvements to the bulkloader. I've blogged about the new bulkloader previously, but Matthew's talk goes into a lot more detail.

Matt starts talking about the new configuration format at 6:38, if you want to skip the intro.

Google I/O playlist, day 4: The BigQuery and Prediction APIs

This is the fourth in a series of posts providing a day-by-day playlist to help break up the Google I/O session videos - specifically the App Engine ones - into manageable chunks for those that haven't seen them.

Today's session isBigQuery and Prediction APIs. These are two awesome APIs that I described previously, and you can look forward to some forthcoming posts exploring how they work and what they can be used for.

This is another language-agnostic video - the APIs, by their nature, are pretty indifferent about what language you access them with. They both depend on Google Storage for their storage needs, so you should probably watch that talk first, though.

If you're only interested in one API or the other, the BigQuery talk starts at 6:15, and the Prediction API talk starts at 24:40. The whole talk is definitely worth watching, though.

Have something you'd particularly like to see demonstrated using the Prediction or BigQuery APIs in a future post? Leave a comment!

Google I/O playlist, day 3: What's hot in Java for App Engine

This is the third in a series of posts providing a day-by-day playlist to help break up the Google I/O session videos - specifically the App Engine ones - into manageable chunks for those that haven't seen them.

Today's video is "What's hot in Java for App Engine" by Don Schwarz and Toby Reyelts. It provides an overview of the first year of Java support on App Engine, focusing on a demo app that shows off a number of features of the Java runtime.

At first glance, this is definitely a talk for the Java programmers amongst us, and it certainly has a lot of content on those lines; the demo shows off to good advantage a number of the App Engine APIs. The secret hidden surprise, though, is something that will be of interest to nearly everyone: Details about the forthcoming channel API. The channel API implements the promised support for Comet on App Engine. For the juicy details, jump to 10:49, and keep watching up to 15:10. There's more details in Moishe's talk on building Real-time Webapps in App Engine, the video for which will be going up later today, and which ...