Google I/O day 2: BigQuery and Prediction APIs

First up this morning on the App Engine track is the BigQuery and Prediction APIs talk.

BigQuery

BigQuery is a new API that lets you use Google's infrastructure to run queries and analysis over large collections of read-only data. It's designed to scale to massive datasets and integrates well with App Engine and other platforms.

To use it, you first upload your data to the new Google Storage service, then import it into BigQuery tables, and then run queries against those tables. Even though it handles billions of rows of data, there's no need to define indexes explicitly or to shard your data.

The query syntax should be familiar: it's based on SQL and is extremely flexible. Using the example dataset of all Wikipedia revisions, getting the five most-edited titles in the main article namespace (wp_namespace = 0) is as simple as:

SELECT TOP(title, 5), COUNT(*) FROM [bigquery.test.001/tables/wikipedia] WHERE wp_namespace = 0;
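To make the semantics of that query concrete, here is a small local Python sketch of what it computes: keep only namespace-0 revisions, count revisions per title, and return the five titles with the highest counts. It runs over a tiny hypothetical list of (title, wp_namespace) pairs, not against BigQuery, and the real Wikipedia table has many more columns.

```python
from collections import Counter

# Hypothetical stand-in for a few rows of the Wikipedia revisions table.
# Each revision is reduced to a (title, wp_namespace) pair for illustration.
revisions = [
    ("Main Page", 0),
    ("Talk:Main Page", 1),
    ("Python", 0),
    ("Main Page", 0),
    ("Python", 0),
    ("Main Page", 0),
    ("Java", 0),
    ("User:Example", 2),
]

def top_titles(rows, n):
    """Return up to n (title, count) pairs for the most-revised namespace-0 titles.

    Mirrors: SELECT TOP(title, n), COUNT(*) ... WHERE wp_namespace = 0
    """
    # WHERE wp_namespace = 0, then COUNT(*) grouped by title.
    counts = Counter(title for title, namespace in rows if namespace == 0)
    # TOP(title, n): the n titles with the highest counts.
    return counts.most_common(n)

print(top_titles(revisions, 5))
```

Only three namespace-0 titles exist in this toy data, so asking for five returns all three, ordered by revision count. BigQuery's value is doing the same filter-count-rank over billions of rows without indexes or manual sharding.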

The speed has to be seen to be believed: response times range from under a second to a few seconds for hundreds of millions to tens of billions of rows, seemingly regardless of the query.

The API is extremely simple and JSON-based; see the docs for details.

Prediction API

The Prediction API is machine learning (specifically, supervised classification) exposed as a web service. As with BigQuery, you start by uploading your data to Google Storage and then build and train your model offline. Once that's done, though, you can make predictions in real time.

The input format is straightforward and documented in the Prediction API docs. For text data, it's simply a CSV file in which each row pairs an output label with an input string. The Prediction API takes care of building a model from that.
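As a rough illustration of that layout, the Python sketch below writes (label, text) pairs as CSV rows with the label in the first column. The example labels and sentences are hypothetical, and the quoting behaviour comes from Python's standard csv module (fields are quoted only when they contain a comma, quote, or newline); check the API docs for any stricter requirements before uploading real training data.

```python
import csv
import io

# Hypothetical labelled examples: (output label, input string).
examples = [
    ("english", "The quick brown fox jumps over the lazy dog"),
    ("french", "Le renard brun saute par-dessus le chien"),
    ("spanish", "El zorro salta sobre el perro"),
]

def to_training_csv(rows):
    """Serialise (label, text) pairs as CSV text, one example per row.

    The label goes in the first column and the input string in the second.
    csv.writer quotes a field only when it needs it (e.g. it contains a comma).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for label, text in rows:
        writer.writerow([label, text])
    return buffer.getvalue()

print(to_training_csv(examples))
```

Letting the csv module handle quoting means input strings that contain commas survive the round trip intact, which a naive string join would break.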

Once it's trained, making a prediction request is as easy as you'd expect: simply provide an element of data, such as a text string, and the API uses your trained model to predict which label should be associated with it.
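To make the train-then-predict flow concrete, here is a local Python sketch of supervised text classification over labelled (label, text) rows like those described above. The talk did not say which algorithms the Prediction API uses internally, so this swaps in a simple multinomial naive Bayes classifier with Laplace smoothing purely for illustration; the `train` and `predict` functions are hypothetical helpers, not the service's API.

```python
import math
from collections import Counter, defaultdict

def train(rows):
    """Fit a tiny multinomial naive Bayes model from (label, text) pairs.

    Illustrative stand-in only; not the Prediction API's actual algorithm.
    """
    label_counts = Counter()            # number of examples per label
    word_counts = defaultdict(Counter)  # per-label word frequencies
    vocabulary = set()
    for label, text in rows:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-probability for text."""
    label_counts, word_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, example_count in label_counts.items():
        # Log prior for the label.
        score = math.log(example_count / total_examples)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words don't zero the score.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, then a couple of predictions.
model = train([
    ("english", "the cat sat on the mat"),
    ("english", "the dog ate my homework"),
    ("french", "le chat est sur le tapis"),
    ("french", "le chien a mange mes devoirs"),
])
print(predict(model, "the cat ate"))  # english
print(predict(model, "le chien"))     # french
```

The split mirrors the workflow from the talk: the expensive step (`train`) happens once up front, and each prediction afterwards is a cheap lookup-and-score against the stored model.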

I already have a couple of applications in mind for these APIs. Do you? Let me know what you'd like to see in the comments.
