Storage options on App Engine

Posted by Nick Johnson | Filed under app-engine, memcache, storage, datastore, task-queue

App Engine provides a number of ways for your app to store data. Some, such as the datastore, are well known, but others are less so, and all of them have different characteristics. This article enumerates the options and describes the pros and cons of each, so you can make better-informed decisions about how to store your data.

Datastore

The best known, most widely used, and most versatile storage option is, of course, the datastore. The datastore is App Engine's non-relational database. It provides robust, durable storage, and offers the most flexibility in how your data is stored, retrieved, and manipulated.

Pros

Durable - data stored in the datastore is permanent.
Read-write - apps can both read and write datastore data, and the datastore provides transaction mechanisms to enforce integrity.
Globally consistent - all instances of an app have the same view of the datastore.
Flexible - queries and indexes provide many ways to find and retrieve data.

Cons

Latency - because the datastore stores data on disk and provides reliability guarantees, writes need to wait until data is confirmed to be stored before returning, and reads often have to fetch data from disk.

Memcache

Memcache is the best known of the 'secondary' storage mechanisms. The memcache API provides a means for applications to optimistically cache data to avoid redoing expensive operations. Memcache is often used as a caching layer for other APIs, such as the datastore, or to cache generated results from any source.

Pros

Fast - memcache accesses typically take only a few milliseconds to complete.
Globally consistent - all instances of an app have the same view of memcache. Memcache provides atomic operations so applications can ensure the integrity of data stored in it.

Cons

Unreliable - data may be evicted from memcache at any time.
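
Because entries can disappear at any time, code that uses memcache generally has to treat every read as a possible miss and fall back to the authoritative source. The following is a minimal sketch of that pattern, not App Engine code: `FakeCache` is a hypothetical in-process stand-in whose `get` and `set` only loosely imitate a memcache client, and `expensive_lookup` is a placeholder for a datastore query or any other slow operation.

```python
class FakeCache(object):
    """Hypothetical stand-in for a memcache client; not the App Engine API."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None on a miss.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def evict(self, key):
        # Simulates the cache dropping an entry on its own.
        self._data.pop(key, None)


calls = []  # records each run of the slow path, for illustration only


def expensive_lookup(key):
    """Placeholder for a datastore query or other slow operation."""
    calls.append(key)
    return "value-for-" + key


def get_cached(cache, key):
    """Return the value for key, recomputing it whenever the cache misses."""
    value = cache.get(key)
    if value is None:
        value = expensive_lookup(key)
        cache.set(key, value)
    return value
```

Note that this sketch cannot tell a cached `None` apart from a miss, so it suits values that are never `None`.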

Blobstore

The blobstore offers a way to store and serve large amounts of user-uploaded data easily and efficiently.

Pros

Supports large files - up to 2GB per blob.
Removes the need for you to handle blobs yourself.
Provides a mechanism for high-performance serving of blobs, particularly images.
Applications can read blob contents as they would local files.

Cons

Read-only - applications cannot modify uploaded blobs, or create new ones.
Billing must be enabled to use the blobstore.

Instance memory

Application instances may also cache data in local memory, through the use of globals or class members. This provides the ultimate in speed, but comes with several downsides.

Pros

Fast - literally as fast as it's possible to be, since data is stored in the same process that is accessing it.
Convenient - no API required, just store data in globals or class members.
Flexible - data can be stored in any format your program can manipulate. No serialization or deserialization is required.

Cons

Unreliable - instances can be started or stopped at any time, so applications should use instance memory only as a cache.
Not globally consistent - each instance of your app has its own runtime environment, and hence its own local variables. Changes in one instance are not reflected in other instances.
Limited capacity - instances are limited in how much memory they can consume before they are terminated. This puts a hard limit on how much data you can cache in memory.

Local files

Applications may read from any file that was uploaded with the application and not marked as static content, using standard filesystem operations. This includes read-only datasets that the application may need.

Pros

Fast - reading local files requires only standard disk access on the machine the application instance is running on, so latency is almost as low as memcache's.
Reliable - if your app is serving, your local files are always available.
Flexible - you can use any format or mechanism for accessing local files that you wish.

Cons

Read-only - applications may not modify the contents of local files; they are fixed at deployment time.
Limited capacity - applications are limited to 10MB per file, and 150MB in total for the application.
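
Since local files never change after deployment, a read-only dataset can be parsed once and then kept in instance memory. Here is a minimal sketch of that combination, assuming the dataset is a JSON file uploaded with the app; `load_dataset` and the JSON format are illustrative choices, not anything App Engine prescribes.

```python
import json

_datasets = {}  # per-instance cache of parsed files, keyed by path


def load_dataset(path):
    """Parse a JSON file shipped with the app, reading it once per instance."""
    if path not in _datasets:
        with open(path) as f:
            _datasets[path] = json.load(f)
    return _datasets[path]
```

Every instance pays the parsing cost on its first call only; later calls return the same in-memory object.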

Task queue payloads

While not storage in the traditional sense, task queue tasks can have payloads attached, which can obviate the need to use other storage systems.

Pros

Fast - payloads are sent to the task when it's run, so no additional API calls are required to fetch the data.
Used properly, allows you to avoid the need to store task data elsewhere.

Cons

Single-purpose - payloads are only useful as storage for data being provided to a task queue task.
Limited capacity - tasks are limited to 10KB in size, including their payload data.
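
A common way to work within the size limit is to send small data inline and, for anything larger, store it elsewhere and send only a key. The sketch below illustrates the idea in plain Python. `build_payload` and `read_payload` are hypothetical helpers, `store` is a dict-like stand-in for the datastore or memcache, and no task queue API is called. The limit quoted above covers the whole task, not just the payload, so a real threshold should leave some headroom.

```python
import json

MAX_TASK_BYTES = 10 * 1024  # the 10KB task limit quoted above


def build_payload(data, store, key):
    """Return a task payload for data, spilling to store when it is too large.

    store is any dict-like object standing in for the datastore or memcache.
    """
    inline = json.dumps({"data": data}).encode("utf-8")
    if len(inline) <= MAX_TASK_BYTES:
        return inline
    # Too big to travel with the task: save it elsewhere, send a reference.
    store[key] = data
    return json.dumps({"key": key}).encode("utf-8")


def read_payload(payload, store):
    """Inverse of build_payload, as a task handler might call it."""
    message = json.loads(payload.decode("utf-8"))
    if "key" in message:
        return store[message["key"]]
    return message["data"]
```

The inline path needs no extra fetch when the task runs; the fallback path gives up that advantage in exchange for fitting within the limit.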

Conclusion

App Engine provides more data storage mechanisms than is apparent at first glance. All of them have different tradeoffs, so it's likely that one - or more - of them will suit your application well. Often, the ideal solution involves a combination, such as the datastore and memcache, or local files and instance memory.

03 November, 2010

