Nick's Blog

Modeling relationships in App Engine

Posted by Nick Johnson | Filed under python, app-engine, datastore, relational-modelling

One source of difficulty for people who are used to relational databases - and certain ORMs in particular - is how to handle references and relationships on App Engine. There's two basic questions here: First, what does a relationship entail, in any database system? And second, how do we use them in App Engine?

The nature of relationships

Many ORMs expose multiple 'types' of relationships as first-class entities - one-to-one, one-to-many, and many-to-many. This obscures the fact that, in reality, they are all built on the same building block, that of references. A reference is simply a field of an entity that contains the key of another entity - for example, if a Pet references an Owner, that simply means the Pet has a field that contains the key of its owner.

All relationship types simply devolve to references. A one-to-many relationship is a reference in its simplest form - each Pet has one Owner, and therefore an Owner can have multiple pets, all pointing to it. The Owner is not modified; it relies on the individual Pets naming it as the owner. One-to-one relationships are one-to-many relationships with the additional constraint that there will be only one Pet referencing each Owner; this is ...

Under the hood with App Engine APIs

Posted by Nick Johnson | Filed under python, app-engine, coding, datastore

In the past, I've discussed various details of the way various App Engine APIs work under the hood. If you've used certain tools, such as Appstats, too, you probably already have a basic overview of how the App Engine APIs function. Today, we'll take a closer look at the interface between the App Engine runtime and its APIs and how it works, and learn what that means for the platform.

If you're interested in App Engine purely to write straightforward webapps, you can probably stop reading now. If you're interested in low-level optimisations, or in the platform itself, or you want to write a library or tool that tinkers with the innermost parts of App Engine, then read on!

The generic API interface

Ultimately, every API call comes down to a single generic interface, with 4 arguments: the service name (for example, 'datastore_v3' or 'memcache'), the method name (for example, 'Get' or 'RunQuery'), the request, and the response. The request and response components are both Protocol Buffers, a binary format widely used at Google for exchanging structured data between processes. The specific type of the request and response protocol buffers for an API call depend ...

Read more | Comments | 22 September, 2010

New in 1.3.6: Namespaces

Posted by Nick Johnson | Filed under python, app-engine, coding, namespaces

The recently released 1.3.6 update for App Engine introduces a number of exciting new features, including multi-tenancy - the ability to shard your app for multiple independent user groups - using a new Namespaces API. Today, we'll take a look at the Namespaces API and how it works.

One common question from people designing multi-tenant apps is how to charge users based on usage. While I'd normally recommend a simpler charging model, such as per user, that isn't universally applicable, and even when it is, it can be useful to keep track on just how much quota each tenant is consuming. Since multi-tenant apps just got a whole pile easier, we'll use this as an opportunity to explore per-tenant accounting options, too.

First up, let's take a look at the basic setup for namespacing. You can check out this demo for an example of what a fully featured, configurable namespace setup looks like, but presuming we want to use domain names as our namespaces, here's the simplest possible setup:

def namespace_manager_default_namespace_for_request():
  import os
  return os.environ['SERVER_NAME']

That's all there is to it. If we wanted to switch on Google Apps domain instead ...

Using BlobReader, wildcard subdomains and webapp2

Posted by Nick Johnson | Filed under app-engine, python, coding, blobstore, blobreader, webapp2

Today we'll demonstrate a number of new features and libraries in App Engine, using a simple demo app. First and foremost, we'll be demonstrating BlobReader, which lets you read Blobstore blobs just as you would a local file, but we'll also be trying out two other shinies: Wildcard subdomains, which allow users to access your app as anything.yourapp.appspot.com (and now, anything.yourapp.com), and Moraes' excellent new webapp2 library, a drop-in replacement for the webapp framework.

Moraes has built webapp2 to be as compatible with the existing webapp framework as possible, while improving a number of things. Improvements include an enhanced response object (based on the one in webob), better routing support, support for 'status code exceptions', and URL generation support. While the app we're writing doesn't require any of these, per-se, it's a good opportunity to give webapp2 a test drive and see how it performs.

But what are we writing, you ask? Well, to show off just how useful BlobReader is, I wanted something that demonstrates how you can use it practically anywhere you can use a 'real' file object - such as using it to read zip files from ...

Damn Cool Algorithms: Levenshtein Automata

Posted by Nick Johnson | Filed under python, tech, coding, damn-cool-algorithms

In a previous Damn Cool Algorithms post, I talked about BK-trees, a clever indexing structure that makes it possible to search for fuzzy matches on a text string based on Levenshtein distance - or any other metric that obeys the triangle inequality. Today, I'm going to describe an alternative approach, which makes it possible to do fuzzy text search in a regular index: Levenshtein automata.

Introduction

The basic insight behind Levenshtein automata is that it's possible to construct a Finite state automaton that recognizes exactly the set of strings within a given Levenshtein distance of a target word. We can then feed in any word, and the automaton will accept or reject it based on whether the Levenshtein distance to the target word is at most the distance specified when we constructed the automaton. Further, due to the nature of FSAs, it will do so in O(n) time with the length of the string being tested. Compare this to the standard Dynamic Programming Levenshtein algorithm, which takes O(mn) time, where m and n are the lengths of the two input words! It's thus immediately apparrent that Levenshtein automaton provide, at a minimum, a faster way for ...

Getting unicode right in Python

Posted by Nick Johnson | Filed under python, text, unicode, rant

Yup, I'm back from holidays! Apologies to everyone for the delayed return - it's taking me a long while to catch up on everything that built up while I was away.

Proper text processing - specifically, correct handling of unicode - is one of those things that consistently confounds even experienced developers. This isn't because it's difficult, but rather, I believe, because most developers carry around a few key misconceptions about what text (in the context of software) is and how it's represented. A search on StackOverflow for UnicodeDecodeError demonstrates just how prevalent these misconceptions are. These misconceptions date back to the days before unicode - longer than many developers have been in the industry, including myself - but they're still nothing if not widespread. This is in part because a number of well known and popular languages continue to, at worst, perpetuate the misunderstandings, and at best are insufficiently good at helping developers get it right.

We can divide languages into four categories along the axis of unicode support:

Languages that were written before unicode was defined, or widespread. C and C++ fall into this category. Languages in this category tend to have unicode support that's spotty ...

Using Python magic to improve the deferred API

Posted by Nick Johnson | Filed under python, deferred, app-engine, coding, celery

Recently, my attention was drawn, via a blog post to a Python task queue implementation called Celery. The object of my interest was not so much Celery itself - though it does look both interesting and well written - but the syntax it uses for tasks.

While App Engine's deferred library takes the 'higher level function' approach - that is, you pass your function and its arguments to the 'defer' function - I've never been entirely happy with that approach. Celery, in contrast, uses Python's support for decorators (one of my favorite language features) to create what, in my view, is a much neater and more flexible interface. While defining and calling a deferred function looks like this:

def my_task_func(some_arg):
  # do something

defer(my_task_func, 123)

Doing the same in Celery looks like this:

@task
def my_task_func(some_arg):
  # do something

my_task_func.delay(123)

Using a decorator, Celery is able to modify the function it's decorating such that you can now call it on the task queue using a much more intuitive syntax, with the function's original calling convention preserved. Let's take a look at how this works, first, and then explore how we might make use of it ...

Guessing subreddits with the Prediction API

Posted by Nick Johnson | Filed under python, app-engine, prediction-api, google-storage

Edit: Now with a live demo!

I've written before about the new BigQuery and Prediction APIs, and promised to demonstrate them. Let's take a look at the Prediction API first.

The Prediction API, as I've explained, does a restricted form of machine learning, as a web service. Currently, it supports categorizing textual and numeric data into a preset list of categories. The example given in the talk - language detection - is a good one, but I wanted to come up with something new. A few ideas presented themselves:

Training on movie/book reviews to try and predict the score given based on the text
Training on product descriptions to try and predict their rating
Training on Reddit submissions to try and predict the subreddit a new submission belongs in

All three have promise, but the first could suffer from the fact that the prediction API as it currently stands doesn't understand a relationship between categories - it would have no way to know that the '5 star' rating tag is 'closer to' the '4 star' one than the '1 star' tag. The second seems very ambitious, and it's not clear there's enough information to do that ...

Using remote_api with OpenID authentication

Posted by Nick Johnson | Filed under python, openid, app-engine, remote-api

When we recently released integrated OpenID support for App Engine, one unfortunate side-effect for apps that enable it was disruption to authenticated, programmatic access to your App Engine app. Specifically, if you've switched your app to use OpenID for authentication, remote_api - and the remote_api console - will no longer work.

The bad news is that fixing this is tough: OpenID is designed as a browser-interactive authentication mechanism, and it's not clear what the best way to do authentication for command line tools like the remote_api console is going to be. Quite likely the solution will involve our OAuth support and stored credentials - stay tuned!

The good news, though, is that there's a workaround that you can use right now, without compromising the security of your app. It's a bit of a hack, though, so brace yourself!

The essential insight behind the hack is that if we can trick the SDK into thinking that it's authenticating against the development server instead of production, it will prompt the user for an email address and password, then send that email address embedded in the 'dev_appserver_login' cookie with all future requests. We can then use the email field to instead ...

Using OpenID authentication on App Engine

Posted by Nick Johnson | Filed under python, openid, app-engine, coding, clickpass

With the release of SDK 1.3.4, preliminary support is available for native OpenID authentication in App Engine. Today, we'll demonstrate how to use the new OpenID support in your app.

Edit: There's now an official article on OpenID on App Engine!

The first step in setting up OpenID authentication is to change your app's authentication settings. Log in to the admin console, select your app, and go to "Application Settings". There, you can pull down the "Authentication Options" box, and select "(Experimental) Federated Login".

Once you've enabled OpenID authentication for your app, a few things change:

URLs generated by create_login_url without a federated_identity parameter specified will redirect to the OpenID login page for Google Accounts.
URLs that are protected by "login: required" in app.yaml or web.xml will result in a redirect to the path "/_ah/login_required", with a "continue" parameter of the page originally fetched. This allows you to provide your own openid login page.
URLs generated by create_login_url with a federated_identity provider will redirect to the specified provider.

In order to make best use of this functionality, here's what we'll do:

Provide an OpenID login page on /_ah/login_required ...

The nature of relationships

The generic API interface

Introduction

Blogroll