Handling file uploads in App Engine

This is the ninth in a series of 'Cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

One issue that comes up frequently on the App Engine groups is how to handle file uploads from users. Part of the confusion arises from the fact that App Engine apps cannot write to the local filesystem. Fortunately, the Datastore comes to the rescue: We can easily write an app that accepts file uploads and stores them in the datastore, then serves them back to users.

To start, we need an HTML form users can upload files through:

<html>
<head>
  <title>File Upload</title>
</head>
<body>
  <form method="post" action="/" enctype="multipart/form-data">
    <input type="file" name="file" />
    <input type="submit" value="Upload" />
  </form>
</body>
</html>

We'll also need a datastore model we can store the uploaded file in:

from google.appengine.ext import db

class DatastoreFile(db.Model):
  data = db.BlobProperty(required=True)
  mimetype = db.StringProperty(required=True)

And finally, a Request Handler that can handle the uploads:

from google.appengine.ext import webapp
from google.appengine.ext.webapp import template

class UploadHandler(webapp.RequestHandler):
  def get(self):
    self.response.out.write(template.render("upload.html", {}))

  def post(self):
    file = self.request.POST['file']
    entity = DatastoreFile(data=file.value, mimetype=file.type)
    entity.put()
    file_url = "http://%s/%d/%s ...
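The excerpt stops before the serving half of the app. As a rough, framework-free sketch of the store-then-serve idea, the code below uses a plain dict in place of the datastore and a bare WSGI function in place of a webapp handler. `FILES`, `store_file` and `serve_file_app` are hypothetical names, not part of App Engine. The point it illustrates is that the stored mimetype goes back out as the Content-Type, so the browser handles the file correctly.

```python
# Hypothetical in-memory stand-in for the datastore: ID -> (data, mimetype).
FILES = {}


def store_file(data, mimetype):
    """Save an uploaded file and return a numeric ID for it."""
    file_id = len(FILES) + 1
    FILES[file_id] = (data, mimetype)
    return file_id


def serve_file_app(environ, start_response):
    """WSGI app: serve the file whose ID is the request path, e.g. /1."""
    path = environ.get("PATH_INFO", "").strip("/")
    if not path.isdigit() or int(path) not in FILES:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    data, mimetype = FILES[int(path)]
    # Send back the mimetype recorded at upload time.
    start_response("200 OK", [("Content-Type", mimetype),
                              ("Content-Length", str(len(data)))])
    return [data]
```

On App Engine itself, the equivalent handler would look up the DatastoreFile entity by its ID and write entity.data with entity.mimetype as the Content-Type.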

Custom Datastore Properties 1: DerivedProperty

This is the eighth in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

The idea of extended Model properties has been covered a couple of times in the past, but it's a topic that I think is worth coming back to: There's a lot you can do with the datastore by writing your own Property subclass. To illustrate, I'd like to work through a salient and widely applicable example.

A common requirement when using the Datastore is storing some form of calculated property - for example, the lower-cased version of a text string, so it can be filtered on, or the length of a list, or the sum of some elements. One can do this manually, but it's easy to forget to update the computed property in some places. Another solution is to override the model's put() method, but the override isn't called if you store your entity using db.put(). Given the title of this post, I'm sure you can figure out what the solution is going to be: a custom property class!
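To make the idea concrete without depending on the App Engine SDK, here's a minimal sketch using a plain Python descriptor: the derived value is recomputed from the instance every time it's read, so it can't go stale. This is a stand-in under stated assumptions, not the post's DerivedProperty; on App Engine you'd subclass db.Property instead, the aim being that the computed value is written whenever the entity is stored. `Person` and `lower_name` are hypothetical.

```python
class DerivedProperty(object):
    """Read-only attribute computed from its instance on every access.

    Framework-free stand-in for a custom datastore property; `derive`
    is any function that takes the instance and returns the value.
    """

    def __init__(self, derive):
        self.derive = derive

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class itself: return the descriptor.
            return self
        return self.derive(instance)

    def __set__(self, instance, value):
        # Derived values can't be assigned; change the source fields instead.
        raise AttributeError("derived properties are read-only")


class Person(object):
    """Hypothetical model with a lower-cased copy of `name` for filtering."""

    lower_name = DerivedProperty(lambda person: person.name.lower())

    def __init__(self, name):
        self.name = name
```

Because the value is derived on read, changing `name` is enough; there is no second field to forget to update.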

What we want here is a DerivedProperty. In order to be as flexible as ...

Advanced Bulk Loading Part 5: Bulk Loading for Java

This is the seventh in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

When it comes to Bulk Loading, there's currently a shortage of tools for Java users, largely due to the relative newness of the Java platform for App Engine. Not all is doom and gloom, however: Users of App Engine for Java can use the Python bulkloader to load data into their Java App Engine instances! Because App Engine treats each major version of an app as a completely separate entity - but sharing the same datastore - it's possible to upload a Python version specifically for the purpose of Bulk Loading. This won't interfere with serving your Java app to your users.

To follow this guide, you won't need to understand much of the Python platform, though you will need to know a little Python. If your bulkloading needs are straightforward, you won't need to know much at all - it's essentially connect-the-dots - but if your bulkloading needs are a little more complex, you'll need to understand some of the basics of programming in Python - defining and calling functions and methods, basically. Who knows, you ...

Advanced Bulk Loading, part 4: Bulk Exporting

This is the sixth in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

In previous posts, we covered Advanced Bulk-loading. Now we'll cover, briefly, the reverse: advanced use of the Bulk Exporter functionality. Unfortunately, the Bulk Exporter is currently much more limited than the Bulk-Loader - though it's also much less mature, so we can expect that to change - but there are still customizations you can apply.

The simplest is the one we covered in part 1 of our Bulk-loader series: Custom conversion functions. Bulk exporter classes define a list of fields and conversion functions just like the importer; the difference is that these functions are expected to convert to strings, rather than from strings. Let's start by going over the equivalents of the two loader conversion functions we defined. First, one to format dates:

def export_date(fmt):
  """Returns a converter function that outputs the supplied date-time format."""
  def converter(d):
    return d.strftime(fmt)
  return converter
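To check a converter like this in isolation, here's a small sketch that applies an export converter list to a plain dict standing in for an entity. `ALBUM_EXPORT_CONVERTERS` and `export_row` are hypothetical names, not part of the bulkloader. Using the same '%m/%d/%Y' format as the import converter from part 1 means exported dates can be loaded back unchanged.

```python
import datetime


def export_date(fmt):
    """Returns a converter function that outputs the supplied date-time format."""
    def converter(d):
        return d.strftime(fmt)
    return converter


# Hypothetical (property name, converter) list, mirroring the loader's.
ALBUM_EXPORT_CONVERTERS = [
    ('title', str),
    ('publication_date', export_date('%m/%d/%Y')),
    ('length_in_minutes', str),
]


def export_row(entity, converters):
    """Turn a dict-like entity into a list of strings, one per column."""
    return [convert(entity[name]) for name, convert in converters]
```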

So far so good. We use it the same way we used the converter function in the importer:

class AlbumExporter(bulkloader.Exporter):
  def __init__(self):
    bulkloader.Exporter.__init__(self, 'Album ...

Advanced Bulk Loading, part 3: Alternate datasources

This is the fifth in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

The bulkloader automatically supports loading data from CSV files, but it's not restricted to that. With a little work, you can make the bulkloader load data from any datasource you can access with Python. The key to this is the generate_records method. This method accepts a filename, and is expected to yield a list of strings for each entity to be uploaded. By overriding this method, we can load from any datasource we wish - for example, a relational database such as MySQL. To make this reusable, let's define a base class we can use to load data from MySQL databases:

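import urlparse

import MySQLdb
from google.appengine.tools import bulkloader

class MySQLLoader(bulkloader.Loader):
  def __init__(self, kind_name, query, converters):
    self.query = query
    bulkloader.Loader.__init__(self, kind_name, converters)

  def initialize(self, filename, loader_opts):
    # loader_opts is parsed as a query string of connection arguments,
    # e.g. "host=localhost&user=me&db=music" (illustrative values).
    self.connect_args = dict(urlparse.parse_qsl(loader_opts))

  def generate_records(self, filename):
    """Generates records from a MySQL database."""
    db = MySQLdb.connect(**self.connect_args)
    cursor = db.cursor()
    cursor.execute(self.query)
    # Call fetchone() repeatedly until it returns None (no rows left).
    return iter(cursor.fetchone, None)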

What we've done here is extended the bulkloader's Loader class to load from a MySQL database instead ...
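The contract generate_records has to meet is small: yield one list of strings per entity. Here's a sketch of that shape that runs without MySQL, with Python's built-in sqlite3 module swapped in as the data source; the function name and the albums table are hypothetical. Converting every value with str() matches the "list of strings" contract and leaves type conversion to the loader's converters.

```python
import sqlite3


def generate_records_from_db(conn, query):
    """Yield one list of strings per row returned by query."""
    cursor = conn.cursor()
    cursor.execute(query)
    # Same idiom as the MySQL loader: call fetchone() until it returns None.
    for row in iter(cursor.fetchone, None):
        yield [str(value) for value in row]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE albums (title TEXT, length_in_minutes INTEGER)")
    conn.execute("INSERT INTO albums VALUES ('Example Album', 42)")
    for record in generate_records_from_db(
            conn, "SELECT title, length_in_minutes FROM albums"):
        print(record)
```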

Distributed Transactions on App Engine

This is the fourth in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

As promised, today we're going to discuss Distributed Transactions on App Engine. Distributed transactions are that feature that you didn't know you needed until they were gone: The ability to update any set of entities or records in your database in a single transaction. Due to the requirements of scale, App Engine supports transactions, but only on predefined Entity Groups. For most cases, this is all you need, but occasionally you really have to use a distributed or global transaction.

There are proposals for implementing distributed transactions on App Engine, but they're complicated, and none of them have yet been implemented as a robust library you can simply plug in and use. We're not going to attempt to recreate a general purpose distributed transaction system - at least, not today - instead, we'll address one common use-case for distributed transactions.

The usual example of the need for distributed or global transactions - so common that it's practically canonical - is the 'bank account' example. Suppose you have a set of accounts, defined something like this:

class Account ...
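Before getting to a solution, it helps to see exactly what goes wrong. The framework-free sketch below treats each account as its own entity group and performs a transfer as two independent updates, which is all per-entity-group transactions allow; a failure between the debit and the credit leaves money missing. `naive_transfer` and `TransferInterrupted` are hypothetical, and this illustrates the problem only, not the approach the post goes on to describe.

```python
class TransferInterrupted(Exception):
    """Simulated failure between the two per-account updates."""


def naive_transfer(accounts, src, dst, amount, fail_midway=False):
    """Move money using two independent updates, one per account."""
    # Update 1: debit the source. This could be its own transaction,
    # because it touches only the source account's entity group.
    accounts[src] -= amount
    if fail_midway:
        # A crash here leaves the debit applied and the credit missing.
        raise TransferInterrupted()
    # Update 2: credit the destination, in a separate transaction.
    accounts[dst] += amount
```

Each step is atomic on its own, but nothing ties the two together, which is exactly the gap a distributed transaction is meant to close.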

Advanced Bulk Loading, part 2: Customization

This is the third in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

Customizing Created Entities

Sometimes, it can be useful to modify an entity after it's created by the bulkloader, but before it's uploaded to App Engine. In our file upload example above, for instance, we might want to set the filesize field to the length of the uploaded file, but we don't want to add a new field to the data just to indicate that, since it's extra work, and could rapidly become inaccurate as files change.

Fortunately, the bulkloader provides an easy mechanism for this: The handle_entity method. By overriding this method, we can perform whatever postprocessing we wish on an entity before it's uploaded to the datastore. Here's an example that sets the filesize field:

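from google.appengine.tools import bulkloader

class ImageLoader(bulkloader.Loader):
  def __init__(self):
    # file_loader is a converter that returns the contents of the named file.
    bulkloader.Loader.__init__(self, 'DatastoreImage',
                               [('filename', str),
                                ('data', file_loader)
                               ])

  def handle_entity(self, entity):
    # Compute filesize from the loaded data before the entity is uploaded.
    entity.filesize = len(entity.data)
    return entity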

As an added bonus, the handle_entity method is not restricted to returning a single entity: It may return a list of them. You can use this to generate additional 'computed ...
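Here's a framework-free sketch of the same post-processing step, handy for testing the logic outside the bulkloader. `FakeEntity`, `file_loader` and `apply_handle_entity` are hypothetical stand-ins: the first for a datastore entity, the second for a converter that reads a file's bytes, and the third for a driver that, as described above, accepts either a single entity or a list back from the hook.

```python
import os
import tempfile


def file_loader(filename):
    """Converter: return the raw bytes of the named file."""
    with open(filename, "rb") as f:
        return f.read()


class FakeEntity(object):
    """Attribute bag standing in for a datastore entity."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


def handle_entity(entity):
    """Fill in the computed filesize before the entity is uploaded."""
    entity.filesize = len(entity.data)
    return entity


def apply_handle_entity(entity, hook):
    """Run a handle_entity-style hook; always return a list of entities."""
    result = hook(entity)
    return result if isinstance(result, list) else [result]


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"hello")
    entity = FakeEntity(filename=path, data=file_loader(path))
    print(apply_handle_entity(entity, handle_entity)[0].filesize)
    os.remove(path)
```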

Advanced Bulk Loading, part 1: Converters

This is the second in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

The bulk loader facilitates getting data in and out of App Engine, but many people don't realise just how powerful it can be. In this and subsequent posts, we'll explore some of the more advanced things you can do with the bulk loader, including:

  • Importing or exporting binary data
  • Customizing created entities
  • Loading from and exporting to relational databases

Custom Conversion Functions

The most straightforward way of using the bulkloader, as shown in the documentation, is to define a bulkloader.Loader subclass and override its __init__ method, supplying a list of converters for the fields in your input file. Here's the example from the documentation:

import datetime

from google.appengine.tools import bulkloader

class AlbumLoader(bulkloader.Loader):
  def __init__(self):
    bulkloader.Loader.__init__(self, 'Album',
                               [('title', str),
                                ('artist', str),
                                ('publication_date',
                                 lambda x: datetime.datetime.strptime(x, '%m/%d/%Y').date()),
                                ('length_in_minutes', int)
                               ])

Most of the converters look like declarations - title is a str(ing), as is artist; length_in_minutes is an int. Publication_date is an odd one out, though, and gives us a hint of the real power behind converters: They can be any function ...
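Because a converter is just a callable applied to the raw string from the input file, the mechanism is easy to mimic outside the bulkloader. This sketch (hypothetical names `ALBUM_CONVERTERS`, `parse_date` and `convert_row`) pairs each CSV column with its converter, the way the Loader's list does, and turns a row into a dict of typed values.

```python
import csv
import datetime
import io


def parse_date(x):
    """Same conversion as the lambda in AlbumLoader, given a name."""
    return datetime.datetime.strptime(x, '%m/%d/%Y').date()


# Hypothetical converter list mirroring AlbumLoader's.
ALBUM_CONVERTERS = [
    ('title', str),
    ('artist', str),
    ('publication_date', parse_date),
    ('length_in_minutes', int),
]


def convert_row(row, converters):
    """Apply each (name, converter) pair to the matching CSV column."""
    if len(row) != len(converters):
        raise ValueError("expected %d columns, got %d"
                         % (len(converters), len(row)))
    return dict((name, convert(value))
                for (name, convert), value in zip(converters, row))


if __name__ == "__main__":
    reader = csv.reader(io.StringIO("Example Album,Example Artist,03/14/2009,42\n"))
    for row in reader:
        print(convert_row(row, ALBUM_CONVERTERS))
```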

Efficient model memcaching

This is the first in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.

One common pattern in App Engine is using memcache to store entities retrieved from the datastore - either individual ones or a list of them - to avoid re-executing common queries. The natural pattern is something like this:

from google.appengine.api import memcache
from google.appengine.ext import db

entities = memcache.get("somekey")
if not entities:
  entities = MyModel.all().fetch(10)
  memcache.set("somekey", entities)

This has several problems:

  • Pickling in general is slow. App Engine doesn't have the more efficient cPickle module, so pickling on App Engine is sloooooow.
  • Changes in the Model class between versions risk causing errors when unpickling entities that were created before the update.
  • Model instances often contain a cached copy of the internal data representation, so you may end up storing everything twice in the pickled data.

Previously, there wasn't really a way around this. You could store some processed form of your data, but this is awkward and creates extra work. Fortunately, a minor change in release 1.2.5 now makes efficient memcaching easy.
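The general shape of the fix is to cache a compact, stable encoding of each entity instead of pickled Model instances, decoding on the way out. Here's a framework-free sketch of that pattern: `DictCache` stands in for memcache, `cached_fetch` is a hypothetical helper, and json is swapped in as the encoding purely so the example runs anywhere. It also checks `is not None` rather than truthiness, so a cached empty result still counts as a hit.

```python
import json


class DictCache(object):
    """In-memory stand-in for memcache's get/set."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value
        return True


def cached_fetch(cache, key, fetch, encode, decode):
    """Return decoded items from the cache, or fetch, encode and store them."""
    encoded = cache.get(key)
    if encoded is not None:
        return [decode(item) for item in encoded]
    items = fetch()
    # Store the compact encoding, not the original objects.
    cache.set(key, [encode(item) for item in items])
    return items


if __name__ == "__main__":
    cache = DictCache()
    rows = cached_fetch(cache, "somekey", lambda: [{"name": "a"}],
                        json.dumps, json.loads)
    print(rows, cache.get("somekey"))
```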

Two new functions were added to the ...