Handling file uploads in App Engine
Posted by Nick Johnson | Filed under tech, app-engine, cookbook, coding
This is the ninth in a series of 'Cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
One issue that comes up frequently on the App Engine groups is how to handle file uploads from users. Part of the confusion arises from the fact that App Engine apps cannot write to the local filesystem. Fortunately, the Datastore comes to the rescue: We can easily write an app that accepts file uploads and stores them in the datastore, then serves them back to users.
To start, we need an HTML form users can upload files through:
<html>
  <head>
    <title>File Upload</title>
  </head>
  <body>
    <!-- enctype="multipart/form-data" is required for the file contents to be submitted -->
    <form method="post" action="/" enctype="multipart/form-data">
      <input type="file" name="file" />
      <input type="submit" value="Upload" />
    </form>
  </body>
</html>
We'll also need a datastore model we can store the uploaded file in:
from google.appengine.ext import db

class DatastoreFile(db.Model):
  data = db.BlobProperty(required=True)
  mimetype = db.StringProperty(required=True)
And finally, a Request Handler that can handle the uploads:
from google.appengine.ext import webapp
from google.appengine.ext.webapp import template

class UploadHandler(webapp.RequestHandler):
  def get(self):
    self.response.out.write(template.render("upload.html", {}))

  def post(self):
    file = self.request.POST['file']
    entity = DatastoreFile(data=file.value, mimetype=file.type)
    entity.put()
    file_url = "http://%s/%d/%s ...
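The excerpt cuts off before the serving side, but the post's premise is that stored files are served straight back out of the datastore. A minimal sketch of what such a handler could look like, assuming URLs whose first path component is the entity's numeric ID (the exact URL scheme behind the truncated file_url line isn't shown here):

class ServeHandler(webapp.RequestHandler):
  def get(self, file_id, filename):
    # Hedged sketch, not the original post's code: look the file up by ID
    # and write it back with its stored content type.
    entity = DatastoreFile.get_by_id(int(file_id))
    if entity is None:
      self.error(404)
      return
    self.response.headers['Content-Type'] = str(entity.mimetype)
    self.response.out.write(entity.data)

Mapped with a route such as (r'/(\d+)/(.*)', ServeHandler), this matches the id/filename shape hinted at by the "%d/%s" in the truncated file_url.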
Custom Datastore Properties 1: DerivedProperty
Posted by Nick Johnson | Filed under tech, app-engine, cookbook, coding
Advanced Bulk Loading Part 5: Bulk Loading for Java
Posted by Nick Johnson | Filed under tech, app-engine, cookbook, coding
This is the seventh in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
When it comes to Bulk Loading, there's currently a shortage of tools for Java users, largely due to the relative newness of the Java platform for App Engine. Not all is doom and gloom, however: Users of App Engine for Java can use the Python bulkloader to load data into their Java App Engine instances! Because App Engine treats each major version of an app as a completely separate entity - but sharing the same datastore - it's possible to upload a Python version specifically for the purpose of Bulk Loading. This won't interfere with serving your Java app to your users.
To follow this guide, you won't need to understand much of the Python platform, though you will need to know a little Python. If your bulkloading needs are straightforward, you won't need to know much at all - it's essentially connect-the-dots - but if they're a little more complex, you'll need to understand some of the basics of programming in Python: defining and calling functions and methods, basically. Who knows, you ...
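To make the connect-the-dots case concrete, here's a hedged sketch of what the Python side amounts to: a loader definition whose kind and property names must match whatever your Java entities persist to (the 'Product' kind and its fields below are made up for illustration):

import datetime

from google.appengine.tools import bulkloader

class ProductLoader(bulkloader.Loader):
  def __init__(self):
    # Kind and property names are hypothetical; they must match the kind
    # your Java app's JDO/JPA classes map to in the shared datastore.
    bulkloader.Loader.__init__(self, 'Product',
                               [('name', str),
                                ('price_cents', int),
                                ('added',
                                 lambda x: datetime.datetime.strptime(x, '%Y-%m-%d')),
                               ])

loaders = [ProductLoader]

The uploaded Python app version also needs the remote_api handler configured so the bulkloader has an endpoint to talk to; the loader definition above is the only Python code you then have to write yourself.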
Advanced Bulk Loading, part 4: Bulk Exporting
Posted by Nick Johnson | Filed under tech, app-engine, cookbook, coding
Advanced Bulk Loading, part 3: Alternate datasources
Posted by Nick Johnson | Filed under tech, app-engine, cookbook, coding
This is the fifth in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
The bulkloader automatically supports loading data from CSV files, but it's not restricted to that. With a little work, you can make the bulkloader load data from any datasource you can access with Python. The key to this is the generate_records method. This method accepts a filename, and is expected to yield a list of strings for each entity to be uploaded. By overriding this method, we can load from any datasource we wish - for example, a relational database such as MySQL. To make this reusable, let's define a base class we can use to load data from MySQL databases:
import urlparse

import MySQLdb
from google.appengine.tools import bulkloader

class MySQLLoader(bulkloader.Loader):
  def __init__(self, kind_name, query, converters):
    self.query = query
    bulkloader.Loader.__init__(self, kind_name, converters)

  def initialize(self, filename, loader_opts):
    # loader_opts is a query string of MySQL connection arguments,
    # e.g. "host=localhost&user=root&passwd=secret&db=mydb"
    self.connect_args = dict(urlparse.parse_qsl(loader_opts))

  def generate_records(self, filename):
    """Generates records from a MySQL database."""
    db = MySQLdb.connect(**self.connect_args)
    cursor = db.cursor()
    cursor.execute(self.query)
    return iter(cursor.fetchone, None)
What we've done here is extend the bulkloader's Loader class to load from a MySQL database instead ...
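To use the base class, you'd subclass it once per kind and pass the MySQL connection arguments through --loader_opts, which the initialize() method above parses as a query string. The kind, query, and converters below are illustrative, not taken from the original post:

class PersonLoader(MySQLLoader):
  def __init__(self):
    # Hypothetical concrete loader built on the MySQLLoader sketch above.
    MySQLLoader.__init__(self, 'Person',
                         'SELECT name, age FROM people',
                         [('name', str),
                          ('age', int),
                         ])

loaders = [PersonLoader]

Invoked with something like --loader_opts='host=localhost&user=root&passwd=secret&db=mydb', the loader pulls rows from MySQL; the input filename argument is still passed to generate_records but simply ignored.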
Distributed Transactions on App Engine
Posted by Nick Johnson | Filed under tech, app-engine, cookbook, coding
This is the fourth in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
As promised, today we're going to discuss Distributed Transactions on App Engine. Distributed transactions are that feature that you didn't know you needed until they were gone: The ability to update any set of entities or records in your database in a single transaction. Due to the requirements of scale, App Engine supports transactions, but only on predefined Entity Groups. For most cases, this is all you need, but occasionally you really have to use a distributed or global transaction.
There are proposals for implementing distributed transactions on App Engine, but they're complicated, and none of them have yet been implemented as a robust library you can simply plug in and use. We're not going to attempt to recreate a general purpose distributed transaction system - at least, not today - instead, we'll address one common use-case for distributed transactions.
The usual example of the need for distributed or global transactions - so common that it's practically canonical - is the 'bank account' example. Suppose you have a set of accounts, defined something like this:
class Account ...
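The excerpt cuts off at the class definition; a minimal sketch of the canonical bank-account model (the property names are assumptions, not the post's) would be something like:

from google.appengine.ext import db

class Account(db.Model):
  # Hypothetical fields for the canonical example.
  owner = db.StringProperty(required=True)
  balance = db.IntegerProperty(required=True)  # balance in cents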
Advanced Bulk Loading, part 2: Customization
Posted by Nick Johnson | Filed under tech, app-engine, cookbook, coding
This is the third in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
Customizing Created Entities
Sometimes, it can be useful to modify an entity after it's created by the bulkloader, but before it's uploaded to App Engine. In our file upload example above, for instance, we might want to set the filesize field to the length of the uploaded file, but we don't want to add an extra column to the input data just to carry that value, since it's extra work and could rapidly become inaccurate as files change.
Fortunately, the bulkloader provides an easy mechanism for this: The handle_entity method. By overriding this method, we can perform whatever postprocessing we wish on an entity before it's uploaded to the datastore. Here's an example that sets the filesize field:
from google.appengine.tools import bulkloader

class ImageLoader(bulkloader.Loader):
  def __init__(self):
    # file_loader converts a filename column into the file's binary contents.
    bulkloader.Loader.__init__(self, 'DatastoreImage',
                               [('filename', str),
                                ('data', file_loader),
                               ])

  def handle_entity(self, entity):
    entity.filesize = len(entity.data)
    return entity
As an added bonus, the handle_entity method is not restricted to returning a single entity: It may return a list of them. You can use this to generate additional 'computed ...
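As a hedged illustration of that (the companion ImageInfo kind below is invented for the example, and file_loader is the same converter used above), handle_entity can return the original entity plus an extra derived entity:

from google.appengine.ext import db
from google.appengine.tools import bulkloader

class ImageInfo(db.Model):
  # Hypothetical 'computed' kind generated alongside each uploaded image.
  filename = db.StringProperty()
  filesize = db.IntegerProperty()

class ImageLoader(bulkloader.Loader):
  def __init__(self):
    bulkloader.Loader.__init__(self, 'DatastoreImage',
                               [('filename', str),
                                ('data', file_loader),
                               ])

  def handle_entity(self, entity):
    entity.filesize = len(entity.data)
    info = ImageInfo(filename=entity.filename, filesize=entity.filesize)
    return [entity, info]  # one input row produces two entities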
Advanced Bulk Loading, part 1: Converters
Posted by Nick Johnson | Filed under app-engine, cookbook, coding
This is the second in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
The bulk loader facilitates getting data in and out of App Engine, but many people don't realise just how powerful it can be. In this and subsequent posts, we'll explore some of the more advanced things you can do with the bulk loader, including:
- Importing or exporting binary data
- Customizing created entities
- Loading from and exporting to relational databases
Custom Conversion Functions
The most straightforward way of using the bulkloader, as shown in the documentation, is to define a bulkloader.Loader subclass and override the __init__ method, supplying a list of converters for the fields in your input file. Here's the example from the documentation:
import datetime

from google.appengine.tools import bulkloader

class AlbumLoader(bulkloader.Loader):
  def __init__(self):
    bulkloader.Loader.__init__(self, 'Album',
                               [('title', str),
                                ('artist', str),
                                ('publication_date',
                                 lambda x: datetime.datetime.strptime(x, '%m/%d/%Y').date()),
                                ('length_in_minutes', int),
                               ])
Most of the converters look like declarations - title is a str(ing), as is artist; length_in_minutes is an int. publication_date is the odd one out, though, and gives us a hint of the real power behind converters: They can be any function ...
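For instance, a converter for binary data can be an ordinary function you write yourself. Here's a sketch: the name file_loader matches the converter used in the part 2 excerpt above, but this particular implementation is ours:

from google.appengine.api import datastore_types

def file_loader(path):
  # Treat the input field as a filename, and return the file's raw bytes
  # wrapped as a Blob, suitable for a db.BlobProperty.
  return datastore_types.Blob(open(path, 'rb').read())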
Efficient model memcaching
Posted by Nick Johnson | Filed under tech, app-engine, cookbook, coding
This is the first in a series of 'cookbook' posts describing useful strategies and functionality for writing better App Engine applications.
One common pattern in App Engine is using memcache to store entities retrieved from the datastore - either individual ones or a list of them - to avoid re-executing common queries. The natural pattern is something like this:
from google.appengine.api import memcache
from google.appengine.ext import db

entities = memcache.get("somekey")
if not entities:
  entities = MyModel.all().fetch(10)
  memcache.set("somekey", entities)
This has several problems:
- Pickling in general is slow. App Engine doesn't have the more efficient cPickle module, so pickling on App Engine is sloooooow.
- Changes in the Model class between versions risk causing errors when unpickling entities that were created before the update.
- The Model class often contains a cached copy of the internal data representation, so you often end up storing everything twice in the pickled data.
Previously, there wasn't really a way around this. You could store some processed form of your data, but this is awkward and creates extra work. Fortunately, a minor change in 1.2.5 now makes efficient memcaching easy.
Two new functions were added to the ...
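The excerpt ends mid-sentence, but the functions in question are db.model_to_protobuf and db.model_from_protobuf, which convert between model instances and their compact protocol-buffer encoding. A sketch of the resulting pattern (the helper name is ours):

from google.appengine.api import memcache
from google.appengine.datastore import entity_pb
from google.appengine.ext import db

def get_cached(key, query, limit=10):
  """Fetch entities via memcache, storing them as encoded protobufs."""
  data = memcache.get(key)
  if data is not None:
    return [db.model_from_protobuf(entity_pb.EntityProto(p)) for p in data]
  entities = query.fetch(limit)
  memcache.set(key, [db.model_to_protobuf(e).Encode() for e in entities])
  return entities

# Usage: entities = get_cached("somekey", MyModel.all())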