Implementing a dropbox service with the Blobstore API (part 3): Multiple upload support

In the last part of this series, we demonstrated how to use Plupload, a JavaScript library with multiple backends for handling file uploads. The solution we demonstrated there only supported uploading a single file at a time, however, and required us to improvise our own progress indicators - far from optimal.

So now, the post you've all been waiting for, where we demonstrate how to do multiple file upload!

The basic trick is simple: Hook the event that's triggered before a file is uploaded, and update the URL to upload to when it's called. That way, every uploaded file gets a new URL. Where do we get the URL from? We simply ask the server for one. Here's the JavaScript for that:

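      // Before each file is uploaded, fetch a fresh upload URL from the server.
      uploader.bind('UploadFile', function(up, file) {
        $.ajax({
            url: '/generate_upload_url',
            async: false,  // block until the URL arrives, so the upload uses it
            success: function(data) {
              up.settings.url = data;
            }
        });
      });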

Straightforward, right? The only subtlety here is that we have to make the request a synchronous one (hence async: false), so that the uploading doesn't start until we've updated the URL. Here's the server-side code that generates those URLs:

class GenerateUploadUrlHandler(BaseHandler):
  @util.login_required
  def get(self):
    self.response.headers['Content-Type'] = 'text/plain'
    self.response.out.write ...

High concurrency counters without sharding

Sharded Counters are a well known technique for keeping counters with high update rates on App Engine. Less well known, however, are some of the alternatives, particularly in areas where you want to keep a reasonably accurate counter, but absolute accuracy isn't required. I discussed one option in this cookbook post - be sure to check the comments for an improved version - and today we'll discuss another option, which also makes use of memcache and the task queue.

The basic assumption is this: We want to keep as accurate a count as possible, but we're willing to accept that it may, in some cases, under-count. A good example of where this is true is counting downloads, or hits, or other such metrics.

Our solution has three major components:

  1. A 'permanent' count, stored in the datastore.
  2. A 'current' count, stored in memcache.
  3. A task queue task that updates the datastore with the total from memcache.

In order to implement this, we'll take advantage of the task queue's task name functionality, and 'tombstoned tasks' - the restriction that two tasks with the same name cannot be enqueued within a reasonable period (at least a week) of each other. Each ...
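To make the three components concrete, here's a minimal sketch of how they might fit together. It is not the implementation from this post: the dictionaries and list below are in-memory stand-ins for the datastore, memcache and the task queue, and the names incr, enqueue_once, run_pending_tasks and FLUSH_INTERVAL are all hypothetical. It also assumes that the flush task's name is derived from the counter name plus the current time interval, so that at most one flush is enqueued per interval.

```python
import time

# In-memory stand-ins (assumptions for illustration, not App Engine APIs).
datastore = {}            # counter name -> 'permanent' count
memcache = {}             # counter name -> 'current' (unflushed) count
used_task_names = set()   # names already used, i.e. 'tombstoned'
task_queue = []           # pending (task_name, counter_name) pairs

FLUSH_INTERVAL = 10  # seconds per flush window; hypothetical value


def enqueue_once(task_name, counter_name):
    """Enqueue a flush task unless a task with this name was already used."""
    if task_name in used_task_names:
        return False
    used_task_names.add(task_name)
    task_queue.append((task_name, counter_name))
    return True


def incr(counter_name, now=None, delta=1):
    """Bump the current count and schedule at most one flush per interval."""
    now = time.time() if now is None else now
    memcache[counter_name] = memcache.get(counter_name, 0) + delta
    interval = int(now // FLUSH_INTERVAL)
    enqueue_once("%s-%d" % (counter_name, interval), counter_name)


def run_pending_tasks():
    """Move each pending counter's current total into the permanent count."""
    while task_queue:
        _, counter_name = task_queue.pop(0)
        pending = memcache.pop(counter_name, 0)
        datastore[counter_name] = datastore.get(counter_name, 0) + pending


def get_count(counter_name):
    """Best-effort total: permanent count plus whatever is still unflushed."""
    return datastore.get(counter_name, 0) + memcache.get(counter_name, 0)
```

In this sketch, any increments held only in the memcache stand-in are lost if that entry disappears before a flush runs, which is exactly the kind of under-counting we said we were willing to accept.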

Pre- and post- put hooks for Datastore models

A number of people have asked about the possibility of pre- and post- put hooks for datastore models, to allow for changes or other processing before or after a model is stored to the datastore.

While such a feature isn't currently supported by App Engine, it's quite possible for us to implement it ourselves, using monkeypatching. This also gives us a good opportunity to show off how monkeypatching works, and how it can be used to make your own changes (at your own risk!) to the App Engine SDK.

One caveat of monkeypatching is that you have to be very careful to make sure that your patch is installed at all times. If it's not, the changes you made will be unavailable and cause errors - or worse, simply behave differently. This is particularly noticeable in the case of app-engine-patch, which monkeypatches models to change their kind name, causing operations on them to fail if the patch hasn't been imported.

The functionality we want is about as simple as you could ask for: We want to be able to define a method on our Model that gets called just before it is written to the datastore, and ...
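As a rough illustration of the monkeypatching idea, here's a self-contained sketch that wraps a put method so that optional hooks run around it. The Model class here is a stand-in rather than the real SDK class, and the hook names before_put and after_put are hypothetical; the real SDK may well need patching at a different entry point.

```python
class Model(object):
    """Stand-in for a datastore model base class (not the real SDK class)."""
    saved = []

    def put(self):
        # Pretend to store the entity and return a fake key.
        Model.saved.append(self)
        return len(Model.saved)


# Keep a reference to the original method before replacing it.
_original_put = Model.put


def hooked_put(self, *args, **kwargs):
    """Call the optional hooks around the original put()."""
    before = getattr(self, "before_put", None)
    if before is not None:
        before()
    result = _original_put(self, *args, **kwargs)
    after = getattr(self, "after_put", None)
    if after is not None:
        after()
    return result


# Install the patch: every subclass that inherits put() now gets the hooks.
Model.put = hooked_put


class Greeting(Model):
    """Example model that defines both (hypothetical) hooks."""

    def __init__(self, text):
        self.text = text
        self.events = []

    def before_put(self):
        self.text = self.text.strip()
        self.events.append("before")

    def after_put(self):
        self.events.append("after")
```

Calling Greeting("  hi  ").put() strips the text before the stand-in store sees it, then records the post-put event. Models that define neither hook behave exactly as before, provided the patch has been installed.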

Using the Google Maps APIs from App Engine

In a previous post, we discussed how Mapvelopes uses the ReportLab toolkit to dynamically generate PDFs. The other major component of Mapvelopes is its interaction with the various Google Maps APIs, and that's what we'll cover now.

The label "Google Maps API" actually covers a fairly broad set of separate APIs. The best known of them are the in-browser APIs, for embedding maps in webpages, and manipulating them. You've doubtless seen them used extensively around the web. Only slightly less well known are the Static Maps API and the Geocoding Web Service.

Geocoding Web Service

The Geocoding Web Service is pretty straightforward: You supply it with an address, and it supplies you with its latitude and longitude. It also provides a great deal of additional information, such as authoritative names for the various parts of the address, and a viewport that encompasses the geocoded location. Here's an example geocoding API request:

http://maps.google.com/maps/api/geocode/json?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false

The last part of the path specifies the format - we're using JSON because it's simpler ...
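For illustration, a request URL like the one above can be assembled with the standard library instead of by hand. This is only a sketch: the helper name build_geocode_url and its parameters are hypothetical, and the base URL and query parameters are taken straight from the example request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base path copied from the example request above.
GEOCODE_BASE = "http://maps.google.com/maps/api/geocode/"


def build_geocode_url(address, output="json", sensor=False):
    """Build a geocoding request URL; `output` is the last path component."""
    query = urlencode({
        "address": address,
        "sensor": "true" if sensor else "false",
    })
    return "%s%s?%s" % (GEOCODE_BASE, output, query)


url = build_geocode_url("1600 Amphitheatre Parkway, Mountain View, CA")
```

Note that urlencode escapes the commas as %2C, so the result isn't character-for-character identical to the hand-written URL, but it decodes to the same address.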

Implementing a dropbox service with the Blobstore API (part 2)

In part 1 of this series, we demonstrated what's necessary to build a very basic 'dropbox' type service for App Engine. Today, we're going to enhance that by adding support for 'rich' upload controls.

Various types of rich upload controls have sprung up in recent years in order to work around the weaknesses of the HTML standard file input element, which only allows selection of one file at a time, and doesn't support any form of progress notification. The most common widgets are written in Flash, but there are a variety of solutions available. With the ongoing browser adoption of HTML5, additional options are opening up, too!

Today we're going to use an excellent component called Plupload. Plupload consists of a Javascript component with a set of interchangeable backends. Backends include Flash, HTML5, Gears, old-fashioned HTML forms, and more. When you configure Plupload, you can specify which backends you want it to try, in which order, and it will stop when it finds one that works in the user's browser.

Different backends have different capabilities, and the ones you need will depend on your use-case. Check out the feature matrix on the Plupload homepage to ...

Generating PDFs on App Engine Python, and introducing Mapvelopes

This is the first of two posts covering the technologies used to implement the Mapvelopes app, an App Engine app that generates customized printable envelopes with the map to your recipient on them.

While HTML is the lingua franca of the web, it's not the be-all and end-all. Sometimes, you need your webapp to generate something slightly different, and often, that something is a PDF. PDFs have the major advantage that they're designed for printing: pagination is built in, and the PDF defines the page size, so nothing about the layout is left to chance. When you need to provide something for the user to print, especially when it's complex, using a PDF can make the difference between okay output and really excellent output. Hit 'Print' in a Google Docs spreadsheet, and you'll see this in action.

PDF generation on App Engine is something that's been left largely up to individual users to figure out. Depending on your runtime - Java or Python - and your specific needs, it may be quite straightforward, or rather complicated. In particular, if you want to include images in your PDF, you're going to have to jump through some ...

Implementing a dropbox service with the Blobstore API (Part 1)

The Blobstore API is a recent addition to the App Engine platform, and makes it possible to upload and serve large files (currently up to 50MB). It's also one of the most complex APIs to use, as it has several moving parts. This short series will demonstrate how to implement a dropbox-type file hosting service on App Engine, using the Blobstore API. To start, we'll cover the basics needed to upload files, keep track of them in the datastore, and serve them back to users.

First up is the upload form. This step is fairly straightforward: We create a standard HTML form, only we generate the URL to post to by calling blobstore.create_upload_url, and passing it the URL of the handler we want called by it. Here's the handler code:

class FileUploadFormHandler(BaseHandler):
  @util.login_required
  def get(self):
    self.render_template("upload.html", {
        'form_url': blobstore.create_upload_url('/upload'),
        'logout_url': users.create_logout_url('/'),
    })

Standard stuff - though it's worth pointing out that, for convenience, we're using the login_required decorator from the google.appengine.ext.webapp.util module to require users to be logged in (and redirect them to the login form if they're not). And here's ...

Task Queue task chaining done right

One common pattern when using the Task Queue API is known as 'task chaining'. You execute a task on the task queue, and at some point, determine that you're going to need another task, either to complete the work the current task is doing, or to start doing something new. Let's say you're doing the former, and your code looks something like this:

def task_func():
  # Do some stuff
  deferred.defer(task_func)
  florb # This line causes an error

I'm sure you can guess what happens here. You successfully do some work, successfully chain the next task, then you encounter an error. Your code throws an exception, and returns a non-200 status code to the task queue, which notes the failure and schedules your task for re-execution. When it re-executes, the whole thing happens all over again (at least if your error is persistent, like the one above, rather than transient).

Meanwhile, the task you enqueued runs. Perhaps it also fails after chaining its next task. Now you have two repeatedly executing tasks. Soon you have 4 - then 8 - then 16 - and so forth. Disaster!
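To see how quickly this gets out of hand, here's a toy simulation of the failure mode. It doesn't use the Task Queue API at all: the function simulate_chaining is hypothetical, and it assumes that in every round each pending task chains one successor and then fails, so the queue retries it.

```python
def simulate_chaining(rounds):
    """Return how many tasks are pending after each round of execution."""
    pending = ["task"]  # we start with a single task
    history = []
    for _ in range(rounds):
        next_pending = []
        for task in pending:
            # The task successfully chains its successor...
            next_pending.append("task")
            # ...then fails, so the queue schedules the original again too.
            next_pending.append(task)
        pending = next_pending
        history.append(len(pending))
    return history


growth = simulate_chaining(4)
```

Under those assumptions, the number of pending tasks doubles every round.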

"Ah, " you may say smugly, "I don't do anything important after chaining the next task ...

Announcing a robust datastore bulk update utility for App Engine

Note: This library is deprecated in favor of appengine-mapreduce, which is now bundled with the SDK.

I'm pleased to announce the release of bulkupdate, an unoriginally-named library for the App Engine Python runtime that facilitates doing bulk operations on datastore data. With bulkupdate, simple operations like bulk re-puts and bulk deletes are trivial, while more complex operations like schema transitions or even emailing all your users become much simpler.

The basic operation of bulkupdate is very similar to the 'map' phase of the well known 'mapreduce' pattern. To use it, you create a subclass of the 'BulkUpdater' class, and define two methods: get_query(), which returns the query to execute, and handle_entity(), which is called once for each entity returned by the query. For example, suppose you want to write a daily task that sends an XMPP message to everyone with new activity on their accounts - the updater class would look something like this:

class ActivityNotifier(bulkupdate.BulkUpdater):
  def __init__(self, date_threshold):
    self.date_threshold = date_threshold

  def get_query(self):
    return UserAccount.all().filter('last_update >', self.date_threshold)

  def handle_entity(self, user):
    if user.unread_messages > 0:
      xmpp.send_message(user.jid, "You have %s unread messages!" % user.unread_messages)

Running the job is even simpler ...

Taking advantage of the new Apps Marketplace

The recently unveiled Apps Marketplace has been getting a lot of attention lately, and a lot of people want to know how they can integrate their App Engine app with it, making use of its integrated single sign-on support. Today we'll go over what's required to get this working.

Apps Marketplace uses OpenID for SSO. Fortunately, we can use the aeoid library, which provides a Users-API-lookalike interface, to support this in App Engine. There are two additional requirements for getting SSO to work in an Apps Marketplace app:

Handling the first of these is easy: The aeoid library sets the realm of an OpenID request, by default, to the domain that the request was made over, so all we need to do is use that same domain name as the realm in our app's manifest file.

The second is a little trickier. The 'janrain' python-openid library which aeoid and other Python-based solutions are based on does not support host-meta as a discovery mechanism for OpenID URLs. Let's analyze what this discovery ...