Storage options on App Engine

App Engine provides a number of ways for your app to store data. Some, such as the datastore, are well known, but others are less so, and all of them have different characteristics. This article is intended to enumerate the different options, and describe the pros and cons of each, so you can make more informed decisions about how to store your data.

Datastore

The best known, most widely used, and most versatile storage option is, of course, the datastore. The datastore is App Engine's non-relational database, and it provides robust, durable storage, as well as providing the most flexibility in how your data is stored, retrieved, and manipulated.

Pros

  • Durable - data stored in the datastore is permanent.
  • Read-write - apps can both read and write datastore data, and the datastore provides transaction mechanisms to enforce integrity.
  • Globally consistent - all instances of an app have the same view of the datastore.
  • Flexible - queries and indexing provide many ways to query and retrieve data.

Cons

  • Latency - because the datastore stores data on disk and provides reliability guarantees, writes need to wait until data is confirmed to be stored before returning, and reads often have to fetch data from disk.

Memcache

Memcache ...

Breaking things down to Atoms

Over the last few years, more and more sites have been providing RSS and Atom feeds. This has been a huge boon both for keeping up to date with content through feed readers, and for programmatically consuming data. There are, inevitably, a few holdouts, though. Notable amongst those holdouts -and particularly relevant to me at the moment - are property listing sites. Few, if any property listing sites provide any sort of Atom or RSS feed for listings, let alone a API of any kind.

To address this, I decided to put together a service for turning other types of notification into Atom feeds. I wanted to make the service general, so it can support many different formats, but the initial target will be email, since most property sites offer email notifications. The requirements for an email to Atom gateway are fairly straightforward:

  • Incoming email should be converted to entries in an Atom feed in such a fashion as to be easily interpretable in a feed reader
  • As much of the original message and metadata as possible should be preserved in the Atom feed entry
  • It should be possible to access the original message if desired

In addition, I had a ...

Hello, developers.

Look at your app. Now back to me. Now back at your app. Now back to me.

Sadly, your app wasn't written by me. But if you used appstats, your app could be fast like mine.

Look down. Back up. Where are you? In your admin console, with the app your app could run like. What's in your hand? Back to me. It's a link from that site you love. Look again. Your site is now doing 20QPS.

Anything is possible when your app runs on App Engine and not PHP. I'm on a Python.

Anyone know a good voice actor?

Modeling relationships in App Engine

One source of difficulty for people who are used to relational databases - and certain ORMs in particular - is how to handle references and relationships on App Engine. There's two basic questions here: First, what does a relationship entail, in any database system? And second, how do we use them in App Engine?

The nature of relationships

Many ORMs expose multiple 'types' of relationships as first-class entities - one-to-one, one-to-many, and many-to-many. This obscures the fact that, in reality, they are all built on the same building block, that of references. A reference is simply a field of an entity that contains the key of another entity - for example, if a Pet references an Owner, that simply means the Pet has a field that contains the key of its owner.

All relationship types simply devolve to references. A one-to-many relationship is a reference in its simplest form - each Pet has one Owner, and therefore an Owner can have multiple pets, all pointing to it. The Owner is not modified; it relies on the individual Pets naming it as the owner. One-to-one relationships are one-to-many relationships with the additional constraint that there will be only one Pet referencing each Owner; this is ...

Under the hood with App Engine APIs

In the past, I've discussed various details of the way various App Engine APIs work under the hood. If you've used certain tools, such as Appstats, too, you probably already have a basic overview of how the App Engine APIs function. Today, we'll take a closer look at the interface between the App Engine runtime and its APIs and how it works, and learn what that means for the platform.

If you're interested in App Engine purely to write straightforward webapps, you can probably stop reading now. If you're interested in low-level optimisations, or in the platform itself, or you want to write a library or tool that tinkers with the innermost parts of App Engine, then read on!

The generic API interface

Ultimately, every API call comes down to a single generic interface, with 4 arguments: the service name (for example, 'datastore_v3' or 'memcache'), the method name (for example, 'Get' or 'RunQuery'), the request, and the response. The request and response components are both Protocol Buffers, a binary format widely used at Google for exchanging structured data between processes. The specific type of the request and response protocol buffers for an API call depend ...

An experiment in interactive blogging

First up, apologies for the recent inactivity here. I've had some pretty tough stuff to deal with in my personal life recently, and that's had to take priority over my blogging schedule. Rest assured that I do intend to return to a regular posting schedule, but in the short term, things may remain a little irregular. That said, there will be a post later this week, so it's not all doom and gloom.

Second, I'm now officially relocating to Sydney! Don't worry, I'll still be working on the App Engine team, but the Google office in Sydney has a substantial App Engine contingent, so I'll no longer be working alone in the Dublin office. It's also much, much closer to home for me, so I may get to see my family more than once a year or so. Finally, the weather is better. I expect to move early November, which will be another excuse for any raggedness in the blogging schedule.

Finally, the main subject of this post: I've decided to try a little experiment. I always have several post ideas in the pipeline, and readers often give me interesting new ...

New in 1.3.6: Namespaces

The recently released 1.3.6 update for App Engine introduces a number of exciting new features, including multi-tenancy - the ability to shard your app for multiple independent user groups - using a new Namespaces API. Today, we'll take a look at the Namespaces API and how it works.

One common question from people designing multi-tenant apps is how to charge users based on usage. While I'd normally recommend a simpler charging model, such as per user, that isn't universally applicable, and even when it is, it can be useful to keep track on just how much quota each tenant is consuming. Since multi-tenant apps just got a whole pile easier, we'll use this as an opportunity to explore per-tenant accounting options, too.

First up, let's take a look at the basic setup for namespacing. You can check out this demo for an example of what a fully featured, configurable namespace setup looks like, but presuming we want to use domain names as our namespaces, here's the simplest possible setup:

def namespace_manager_default_namespace_for_request():
  import os
  return os.environ['SERVER_NAME']

That's all there is to it. If we wanted to switch on Google Apps domain instead ...

Reduced posting schedule

As some of you will no doubt have already noticed, I've been posting here less frequently since I went on holiday than I previously was. Previously, my schedule was 3 posts a week - Monday, Wednesday, Friday. This let me turn out a lot of content, but it also consumed a lot of time.

Unfortunately, demands on my time are such that at the moment I can't keep that schedule up. Also, though there's still lots of App Engine and other posts I'd like to write, I'm not sure I can come up with 3 a week at the moment. For that reason, I'm going to be reducing the posting schedule, for now, to 1 post a week.

On the plus side, you can expect bigger, more detailed posts than was the average when I was posting 3 a week. An example of this would be the recent Damn Cool Algorithms post on Levenshtein Automata.

Also, due to unforseen problems with both of the posts in my pipeline currently, there'll be no post at all this week - but you can look forward to two posts next week, instead!

Using BlobReader, wildcard subdomains and webapp2

Today we'll demonstrate a number of new features and libraries in App Engine, using a simple demo app. First and foremost, we'll be demonstrating BlobReader, which lets you read Blobstore blobs just as you would a local file, but we'll also be trying out two other shinies: Wildcard subdomains, which allow users to access your app as anything.yourapp.appspot.com (and now, anything.yourapp.com), and Moraes' excellent new webapp2 library, a drop-in replacement for the webapp framework.

Moraes has built webapp2 to be as compatible with the existing webapp framework as possible, while improving a number of things. Improvements include an enhanced response object (based on the one in webob), better routing support, support for 'status code exceptions', and URL generation support. While the app we're writing doesn't require any of these, per-se, it's a good opportunity to give webapp2 a test drive and see how it performs.

But what are we writing, you ask? Well, to show off just how useful BlobReader is, I wanted something that demonstrates how you can use it practically anywhere you can use a 'real' file object - such as using it to read zip files from ...

Damn Cool Algorithms: Levenshtein Automata

In a previous Damn Cool Algorithms post, I talked about BK-trees, a clever indexing structure that makes it possible to search for fuzzy matches on a text string based on Levenshtein distance - or any other metric that obeys the triangle inequality. Today, I'm going to describe an alternative approach, which makes it possible to do fuzzy text search in a regular index: Levenshtein automata.

Introduction

The basic insight behind Levenshtein automata is that it's possible to construct a Finite state automaton that recognizes exactly the set of strings within a given Levenshtein distance of a target word. We can then feed in any word, and the automaton will accept or reject it based on whether the Levenshtein distance to the target word is at most the distance specified when we constructed the automaton. Further, due to the nature of FSAs, it will do so in O(n) time with the length of the string being tested. Compare this to the standard Dynamic Programming Levenshtein algorithm, which takes O(mn) time, where m and n are the lengths of the two input words! It's thus immediately apparrent that Levenshtein automaton provide, at a minimum, a faster way for ...