Announcing BDBDatastore, a replacement datastore for App Engine

One criticism I frequently see directed at App Engine is that of lock-in. Because App Engine doesn't use the same APIs and libraries that people are used to using elsewhere, the argument goes, Google is implicitly locking developers in to running their App Engine apps on Google infrastructure.

I'm of two minds on this. On the one hand, I don't think it's justified to call this "lock-in" - Google has provided ample documentation of the runtime environment and the APIs available, and where documentation isn't available, the SDK source code is, so it's possible to figure out everything necessary to produce compatible interfaces with publicly available information alone, and without resorting to reverse engineering or any other gray areas.

On the other hand, while I don't think there's intentional lock-in, the lack of available alternatives amounts to practical lock-in. Moving your app off Google infrastructure currently means implementing the replacement infrastructure yourself, and the vast majority of people can't afford the time and resources that requires.

The key to making portability possible is the datastore. Of all the APIs App Engine provides, it's by far the most heavily used, the most critical, and the most complex. Further, while the stubs provided in the SDK will suffice for many APIs - for example, the Images API stub is quite satisfactory - this isn't the case for the datastore, whose stub in the dev_appserver is implemented entirely in-memory and without indexes.

There are already efforts underway to help portability, of course. For example, Jens Scheffler is working on gae-sqlite, a SQLite backend for the App Engine datastore. A team at UCSB is working on AppScale, an entire alternate runtime for App Engine that is intended to run on Xen clusters, Amazon EC2, or its open-source clone, Eucalyptus. MongoDB has an App Engine interface in development. AppDrop will host your app on their hardware using a slightly modified version of the SDK. Unfortunately, none of these efforts is at the point where you could move an App Engine app onto it and expect it to survive in a production environment.

Into this environment, I'd like to present my own effort, BDBDatastore. BDBDatastore is a replacement datastore backend for App Engine. It's implemented in Java, and runs as a standalone server, communicating with apps over a TCP socket. Data storage is handled by BDB-JE, which is a native Java reimplementation of Oracle's Berkeley DB database system.
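To give a feel for what "communicating with apps over a TCP socket" involves on the app side, here is a minimal Python sketch of sending and receiving whole messages over a socket. The 4-byte length prefix, the function names, and the echo-style reply are all assumptions made up for illustration; this is not BDBDatastore's actual wire format.

```python
import socket
import struct

# Hypothetical framing: a 4-byte big-endian length, then the payload bytes.
# This is an illustrative assumption, not BDBDatastore's real protocol.
HEADER = struct.Struct(">I")


def send_message(sock, payload):
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exactly(sock, count):
    """Read exactly `count` bytes, or raise if the peer closes early."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)


def roundtrip(payload):
    """Demo: send a request across a connected socket pair and read a reply."""
    app_side, server_side = socket.socketpair()
    try:
        send_message(app_side, payload)
        request = recv_message(server_side)
        # A stand-in "server" that just acknowledges the request.
        send_message(server_side, b"ok:" + request)
        return recv_message(app_side)
    finally:
        app_side.close()
        server_side.close()
```

The point of the length prefix is that TCP delivers a byte stream rather than discrete messages, so a stub needs some way to tell where one request or response ends and the next begins.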

BDBDatastore is intended to be a feature-complete replacement for the native App Engine datastore. It will implement all of the same interfaces, in the same way. It ought to be possible to simply start your app up with BDBDatastore as the backend and have it function identically to the dev_appserver or the production App Engine environment. It's not quite there yet, however. Implemented so far:
  • get(), put() and delete() operations.
  • Transactions.
  • Entity and Ancestor queries.
  • Queries on a single property.
  • Merge-join queries.
  • count() for all supported query types.
  • A simple Python stub to allow Python App Engine apps to communicate with the datastore.
What's not implemented yet:
  • Custom indexes (and queries that require them).
  • Error handling for the Python stub.
  • A Java stub.
  • Support for __key__ queries.
  • A patch for dev_appserver so you can easily use bdbdatastore as the datastore backend.
  • Release packages and installation instructions.
For more details, see the status and todo pages in the wiki on GitHub.

All of these, however, should be implemented in the near future, starting with the all-important support for custom indexes.
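For readers unfamiliar with the term, a merge-join query answers several equality filters at once by intersecting single-property indexes instead of needing a custom index. The following is a rough sketch of the general technique in Python, assuming each index maps a property value to a sorted list of entity keys. The function names and data layout are hypothetical and say nothing about how BDBDatastore itself stores or scans its indexes.

```python
from bisect import bisect_left


def build_index(entities, prop):
    """Map each value of `prop` to a sorted list of entity keys (illustrative layout)."""
    index = {}
    for key, props in entities.items():
        if prop in props:
            index.setdefault(props[prop], []).append(key)
    for keys in index.values():
        keys.sort()
    return index


def merge_join(key_lists):
    """Return the keys present in every sorted list, using a zigzag merge."""
    if not key_lists or any(not keys for keys in key_lists):
        return []
    positions = [0] * len(key_lists)
    results = []
    candidate = key_lists[0][0]
    while True:
        matched = True
        for i, keys in enumerate(key_lists):
            # Skip ahead to the first key >= candidate in this list.
            pos = bisect_left(keys, candidate, positions[i])
            if pos == len(keys):
                return results  # one list is exhausted, so no more matches
            positions[i] = pos
            if keys[pos] != candidate:
                candidate = keys[pos]  # raise the candidate and start over
                matched = False
                break
        if matched:
            results.append(candidate)
            # Move past the match in the first list to pick the next candidate.
            positions[0] += 1
            if positions[0] == len(key_lists[0]):
                return results
            candidate = key_lists[0][positions[0]]
```

For example, merge_join([["a", "b", "d"], ["a", "c", "d"]]) returns ["a", "d"]. Because each list is only ever scanned forward, the cost depends on how far the lists have to skip rather than on the total number of entities.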

What BDBDatastore is not intended to be is a one-size-fits-all datastore backend. The choice of architecture means that in the short term, at least, BDBDatastore is limited to at most a single backend per app, which means that if your app is really big, BDBDatastore is not the backend for you. This is somewhat deliberate, though: BDBDatastore will be much simpler to install, configure, and maintain than alternatives like HBase or Hypertable, and thus will be a very attractive option for the very large number of apps that aren't big enough to require them. Further, an easy migration path from BDBDatastore to other backends will be provided as soon as both products are mature, so if your app does get big, you can move to one of the higher-overhead but higher-scalability alternatives in a relatively pain-free manner.

Right now, BDBDatastore isn't ready to release, so if you're looking for something you can simply download and run, you'll be disappointed. Don't go away, though - that's coming in the very near future. If you're a developer, hopefully you'll find something of interest already. And if you're really interested, it's all open source - fork it, or join the project! Your contributions are welcome.

Finally, I'm hoping BDBDatastore will be only the start. I'd like to see a community form up around putting together a robust and mature alternate platform for App Engine apps, beginning with individual components such as datastore implementations and interfaces, containers for running App Engine apps in Apache and other web servers, and implementations of other stubs like memcache, but ending with a complete system you can fire up, point appcfg.py at (or the equivalent for Java), and 'deploy' to just the same as you would on the official runtime. With this, hopefully we can see App Engine's approach to webapps become more pervasive, and even more accessible.

Mandatory disclaimer: I work for Google, and as of recently, I'm a Developer Programs Engineer for Google App Engine. All of my opinions here are my own, of course, not Google's, and BDBDatastore is developed in my own time, with my own resources, completely independently of my work at Google.
