Google I/O playlist, day 4: The BigQuery and Prediction APIs

This is the fourth in a series of posts providing a day-by-day playlist to help break up the Google I/O session videos (specifically the App Engine ones) into manageable chunks for those who haven't seen them.

Today's session is BigQuery and Prediction APIs. These are two awesome APIs that I described previously, and you can look forward to some forthcoming posts exploring how they work and what they can be used for.

This is another language-agnostic video: by their nature, the APIs are pretty indifferent to which language you access them from. Both of them rely on Google Storage for their storage needs, though, so you should probably watch that talk first.

If you're only interested in one API or the other, the BigQuery talk starts at 6:15, and the Prediction API talk starts at 24:40. The whole talk is definitely worth watching, though.

Have something you'd particularly like to see demonstrated using the Prediction or BigQuery APIs in a future post? Leave a comment!

Google I/O day 2: BigQuery and Prediction APIs

First up this morning on the App Engine track is the BigQuery and Prediction APIs talk.

BigQuery

BigQuery is a new API that lets you use Google's infrastructure to run queries and analysis over large collections of read-only data. It's designed to scale to massive datasets, and integrates well with App Engine and other platforms.

To use it, you start by uploading your data to the new Google Storage service. Then you import it into BigQuery tables, and you can run queries on those tables. Even though it handles billions of rows of data, there's no need to explicitly define indexes or shard your data.
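The talk doesn't say how BigQuery executes queries internally, so what follows is not its implementation. It's a generic Python sketch of why aggregating over read-only data splits so naturally across machines, which is presumably part of why you never have to shard anything yourself: each worker counts its own slice of rows independently, and the partial counts merge by simple addition. The function names, and the thread pool standing in for separate machines, are hypothetical illustrations.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_shard(shard):
    """Count occurrences of each title in one slice of rows (hypothetical helper)."""
    return Counter(shard)


def parallel_title_counts(titles, num_shards=4):
    """Count titles by splitting rows into shards and merging the partial counts.

    A thread pool stands in for the many machines a real service would use.
    num_shards must be at least 1.
    """
    # Deal rows round-robin into non-overlapping shards that together cover every row.
    shards = [titles[i::num_shards] for i in range(num_shards)]
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partial_counts = list(pool.map(count_shard, shards))
    # Merging is just addition, so the order in which shards finish doesn't matter.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


if __name__ == "__main__":
    titles = ["Python", "Google", "Python", "SQL", "Google", "Python"]
    print(parallel_title_counts(titles).most_common(2))
    # [('Python', 3), ('Google', 2)]
```

Because the data never changes, the workers don't need to coordinate while they count; only the final merge combines their results.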

The query syntax should be familiar: it's based on SQL, and it's extremely flexible. Using the example dataset of all Wikipedia revisions, getting the five most-edited titles is as simple as:

SELECT TOP(title, 5), COUNT(*) FROM [bigquery.test.001/tables/wikipedia] WHERE wp_namespace = 0;
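To make the query's semantics concrete, here's a small Python sketch that computes the same thing over an in-memory list: keep only rows where wp_namespace is 0 (Wikipedia's main article namespace), count revisions per title, and return the five titles with the largest counts. The sample rows are made up for illustration, and this is exact counting over a handful of rows; it says nothing about how BigQuery evaluates TOP at scale.

```python
from collections import Counter

# Hypothetical sample data: one (title, wp_namespace) pair per revision.
SAMPLE = (
    [("Python", 0)] * 6
    + [("Google", 0)] * 5
    + [("BigQuery", 0)] * 4
    + [("SQL", 0)] * 3
    + [("App Engine", 0)] * 2
    + [("Wikipedia", 0)] * 1
    + [("Talk:Python", 1)] * 10  # namespace 1 (talk pages) is excluded by the WHERE clause
)
sample_rows = [{"title": title, "wp_namespace": ns} for title, ns in SAMPLE]


def top_titles(rows, n=5):
    """Mimic SELECT TOP(title, n), COUNT(*) ... WHERE wp_namespace = 0."""
    # WHERE wp_namespace = 0, then COUNT(*) grouped by title.
    counts = Counter(row["title"] for row in rows if row["wp_namespace"] == 0)
    # most_common returns (title, count) pairs, highest count first.
    return counts.most_common(n)


if __name__ == "__main__":
    for title, count in top_titles(sample_rows):
        print(title, count)
```

Note that "Talk:Python" has the most revisions in the sample but never appears in the result, because the namespace filter is applied before counting.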

The speed has to be seen to be believed - response times from under a second to a few seconds for hundreds of millions to tens of billions of rows - seemingly regardless ...