Distributed Transactions on App Engine

This is the fourth in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

As promised, today we're going to discuss Distributed Transactions on App Engine. Distributed transactions are that feature that you didn't know you needed until they were gone: The ability to update any set of entities or records in your database in a single transaction. Due to the requirements of scale, App Engine supports transactions, but only on predefined Entity Groups. For most cases, this is all you need, but occasionally you really have to use a distributed or global transaction.
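
To make the limitation concrete, here's a minimal pure-Python simulation (not App Engine code). It moves money between two accounts that live in different entity groups, using one single-group transaction per account. The names transfer, run_in_group_transaction and SimulatedCrash are hypothetical. The point is only that if the request dies between the two commits, nothing can undo the first one.

```python
# A pure-Python simulation, not App Engine code: each account stands in for
# its own entity group, and only one group can be updated atomically.


class SimulatedCrash(Exception):
    """Stands in for a request that dies between two commits."""


def run_in_group_transaction(store, key, update):
    """Atomically apply update() to one account (one 'entity group')."""
    store[key] = update(store[key])


def transfer(store, source, dest, amount, crash_between=False):
    """Move money using two independent single-group transactions."""
    run_in_group_transaction(store, source, lambda balance: balance - amount)
    if crash_between:
        # The debit has already committed, and nothing will roll it back.
        raise SimulatedCrash()
    run_in_group_transaction(store, dest, lambda balance: balance + amount)


balances = {"alice": 100, "bob": 50}
try:
    transfer(balances, "alice", "bob", 30, crash_between=True)
except SimulatedCrash:
    pass

# The total is now 120 instead of 150: the debited money has vanished.
print(balances, sum(balances.values()))
```

A distributed transaction is what would let the debit and the credit commit or fail together.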

There are proposals for implementing distributed transactions on App Engine, but they're complicated, and none of them have yet been implemented as a robust library you can simply plug in and use. We're not going to attempt to recreate a general purpose distributed transaction system - at least, not today - instead, we'll address one common use-case for distributed transactions.

The usual example of the need for distributed or global transactions - so common that it's practically canonical - is the 'bank account' example. Suppose you have a set of accounts, defined something like this:

class Account ...

Advanced Bulk Loading, part 2: Customization

This is the third in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

Customizing Created Entities

Sometimes, it can be useful to modify an entity after it's created by the bulkloader, but before it's uploaded to App Engine. In our file upload example above, for instance, we might want to set the filesize field to the length of the uploaded file, but we don't want to add a new field to the data just to indicate that, since it's extra work, and could rapidly become inaccurate as files change.

Fortunately, the bulkloader provides an easy mechanism for this: The handle_entity method. By overriding this method, we can perform whatever postprocessing we wish on an entity before it's uploaded to the datastore. Here's an example that sets the filesize field:

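from google.appengine.tools import bulkloader

class ImageLoader(bulkloader.Loader):
    def __init__(self):
        # file_loader is the converter from the file upload example above.
        bulkloader.Loader.__init__(self, 'DatastoreImage',
                                   [('filename', str),
                                    ('data', file_loader),
                                   ])

    def handle_entity(self, entity):
        # Compute filesize from the uploaded data instead of storing it
        # as a separate column in the input file.
        entity.filesize = len(entity.data)
        return entity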

As an added bonus, the handle_entity method is not restricted to returning a single entity: It may return a list of them. You can use this to generate additional 'computed ...
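
Here is a sketch of how a handle_entity override that returns more than one entity might look. To keep it runnable outside the SDK, it uses hypothetical stand-ins (FakeEntity, ImageLoaderSketch, and a driver called entities_to_upload) rather than the real bulkloader classes, and the extra 'ImageSummary' kind is invented purely for illustration.

```python
# Pure-Python stand-ins so the sketch runs outside the SDK; these are NOT
# the real bulkloader classes.


class FakeEntity(object):
    """Minimal stand-in for a datastore entity: a kind plus attributes."""

    def __init__(self, kind, **fields):
        self.kind = kind
        for name, value in fields.items():
            setattr(self, name, value)


class ImageLoaderSketch(object):
    """Mirrors the handle_entity override above, but returns a list."""

    def handle_entity(self, entity):
        entity.filesize = len(entity.data)
        # Hypothetical 'computed' entity derived from the uploaded one.
        summary = FakeEntity("ImageSummary",
                             filename=entity.filename,
                             filesize=entity.filesize)
        return [entity, summary]


def entities_to_upload(loader, entity):
    """Hypothetical driver: accept either one entity or a list of them."""
    result = loader.handle_entity(entity)
    if isinstance(result, list):
        return result
    return [result]


uploaded = entities_to_upload(
    ImageLoaderSketch(),
    FakeEntity("DatastoreImage", filename="cat.png", data=b"12345678"))
print([(e.kind, e.filesize) for e in uploaded])
```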

Efficient model memcaching

This is the first in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

One common pattern in App Engine is using memcache to store entities retrieved from the datastore - either individual ones or a list of them - to avoid re-executing common queries. The natural pattern is something like this:

from google.appengine.api import memcache
from google.appengine.ext import db

entities = memcache.get("somekey")
if entities is None:
    # Cache miss: run the query, then cache the (pickled) result.
    entities = MyModel.all().fetch(10)
    memcache.set("somekey", entities)

This has several problems:

  • Pickling in general is slow. App Engine doesn't have the more efficient cPickle module, so pickling on App Engine is sloooooow.
  • Changes in the Model class between versions risk causing errors when unpickling entities that were created before the update.
  • The Model class often contains a cached copy of the internal data representation, so often you're storing everything twice in the pickled data.
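
To see why the last point matters, here's a small pure-Python illustration. CachedModel, LeanModel and encode_fields are hypothetical stand-ins, not the App Engine Model class. CachedModel pickles both its fields and a cached encoded copy of them, so its pickle comes out at roughly twice the size. LeanModel uses Python's standard __getstate__/__setstate__ hooks to leave the cache out; that's a generic pickling technique shown only to isolate the effect, not the approach the 1.2.5 change enables.

```python
import pickle


def encode_fields(fields):
    """Hypothetical 'internal representation' of a model's data."""
    return repr(sorted(fields.items())).encode("utf-8")


class CachedModel(object):
    """Hypothetical model that keeps a cached encoded copy of its data."""

    def __init__(self, **fields):
        self.fields = fields
        self._encoded = encode_fields(fields)  # the cached copy


class LeanModel(CachedModel):
    """Same data, but the cached copy is left out of the pickle."""

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_encoded"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Rebuild the cache after unpickling instead of storing it.
        self._encoded = encode_fields(self.fields)


data = {"title": "x" * 1000}
fat = pickle.dumps(CachedModel(**data))
lean = pickle.dumps(LeanModel(**data))
print(len(fat), len(lean))
```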

Previously, there wasn't really a way around this. You could store some processed form of your data, but this is awkward and creates extra work. Fortunately, a minor change in 1.2.5 now makes efficient memcaching easy.

Two new functions were added to the ...

netboot.me: Turning 'netboot' into 'internetboot'

After a couple of weeks of concerted effort, I'm pleased to announce the initial release of a new service, netboot.me. Netboot.me takes regular netbooting and makes it a whole lot more versatile - now, you can netboot directly into the installers for many popular Linux distros, as well as system tools and even live Linux distributions, all directly over the Internet, and without any local configuration required!

All that's required to set up netboot.me is a spare writable CD, USB key, or floppy disk to write a small (less than 1MB) disk image to. Alternatively, determined geeks can configure their DHCP server to let computers netboot directly. Once you've done that, booting off the device on any computer with wired Ethernet (wifi support is a work in progress) will cause the bootloader to automatically download the current version of the menu from netboot.me, from which you can choose the boot image you want. Selecting it causes the boot image to be downloaded and booted immediately.

Currently on the boot menu:
  • Installers for several popular Linux distros (Ubuntu, Fedora, OpenSUSE, Debian).
  • The FreeBSD installer.
  • Tiny Core and Micro Core Linux ...

Bookeen CyBook vs Sony Reader

I've been a fan of EBook readers for a long time now. Before EPaper based readers, I had an LCD-based one, and before that I had a Palm that I used almost exclusively to read EBooks.

EPaper, though, is a big step forward in terms of EBook readability, and when the Bookeen CyBook first came out, I immediately got one.

Unfortunately, a couple of months ago, I pulled it out of my bag to use it, only to discover that I evidently hadn't treated it as carefully as it warranted: the delicate EPaper screen had been damaged to the point where it was no longer really usable.

For a while, that was it: no EBook reader. But when we travelled through the US recently, a store at one of the airports was selling the Sony Reader, and even had a discount on the PRS-505. The 505 has since been superseded by the touchscreen PRS-700, but I don't have much use for a touchscreen on an EBook reader, and apparently the 700 has screen glare issues due to the touchscreen coating. So I got the 505.
 
I didn't expect to be particularly impressed by it - I merely ...

BDBDatastore 0.2 released

I'm pleased to say that BDBDatastore 0.2 is now released. With this release, BDBDatastore is now officially at feature parity with the production App Engine datastore. That is, it ought to be able to do everything the production datastore can, which means you can port your apps off the production datastore without having to change them.

Installation instructions can be found here. The release is numbered 0.2, but if it proves stable enough, it will become the official 1.0 release of BDBDatastore. So treat it as beta at least until it's got a little more testing.

If you try it out, speak up! I'd like to hear what people think of this. In the meantime, I'm going to start working on writing a container for running App Engine apps on Apache and other HTTP servers, as well as doing load testing and profiling of BDBDatastore.

Interesting articles about SQL and non-relational databases

Here's an interesting article about why SQL databases suck for webapps, and another one with good detailed reviews of non-relational databases.

BDBDatastore 0.1 released

When I announced BDBDatastore just a few days ago, it was still a ways away from being practically usable for anyone wanting to develop or deploy App Engine apps. The purpose of the post was twofold: To attract some initial interest, and to motivate me, with the light of public scrutiny, to make sure it gets finished and polished.

I'm pleased to say that release 0.1 is now available. Version 0.1 brings BDBDatastore to parity with the feature set the App Engine datastore had on release day - that is to say, fully featured except for __key__ queries. Along with the server itself, I've also provided a patch to the App Engine SDK that allows you to tell the Python dev_appserver to use BDBDatastore for backend storage.

Full installation and usage instructions can be found on the wiki. Note that this release is still very much beta. It shouldn't break, but it might (and if it does, please let me know). It's also possible (likely, even) that the datastore will change in backwards-incompatible ways between now and 1.0.

As always, feedback and comments are appreciated.

Announcing BDBDatastore, a replacement datastore for App Engine

One criticism I frequently see directed at App Engine is that of lock-in. Since App Engine doesn't use the same APIs and libraries that people are used to using elsewhere, people say, Google is implicitly locking people in to continuing to run their App Engine apps on Google infrastructure.

I'm of two minds on this. On the one hand, I don't think it's justified to call this "lock-in" - Google has provided ample documentation of the runtime environment and the APIs available, and where documentation isn't available, the SDK source code is, so it's possible to figure out everything necessary to produce compatible interfaces with publicly available information alone, and without resorting to reverse engineering or any other gray areas.

On the other hand, while I don't think there's intentional lock-in, the lack of available alternatives amounts to practical lock-in. Moving your app off Google infrastructure would currently require implementing the replacement infrastructure yourself, and the vast majority of people can't afford the time and resources that would take.

The key to making portability possible is the datastore. Of all the APIs App ...

Cognitive dissonance and the "download 4 free" debacle

I've been watching the whole Amazon/Pirate Bay debacle with some interest. Of particular interest is the number and type of critics of the whole thing: Some of the loudest critics seem to be those who would otherwise proudly admit to downloading pirate copies of media, or who like to go on about the dying business model of 'old media'.

I think the reason for this rather odd about-face is a pretty severe case of cognitive dissonance. It's fashionable to justify casual piracy as 'victim free' by pointing out that "I wouldn't have bought it anyway" or other similar justifications. And everyone loves Amazon for providing a way to get media legitimately and cheaply and nearly as conveniently as firing up a BitTorrent client. But when you combine the two with a direct link, suddenly the contradictions in holding both positions simultaneously become apparent. Piracy is victimless... but you're explicitly passing up an opportunity to purchase it legitimately instead. Suddenly it feels a whole lot more like wandering into a shop and uplifting something*.

In a nutshell, I think most of the critics of the plugin aren't actually anti-piracy, per se - they just don't like ...