Using the new bulkloader

Recently, Matthew Blain, of the App Engine team, announced the prerelease of a new bulkloader. The new bulkloader uses yaml files for configuration, and takes a 'declarative' rather than procedural approach to configuration for downloading and uploading data. As a result, you don't have to understand Python in order to configure and use the new bulkloader, which is a significant advantage for users of the Java App Engine runtime.

There are, of course, many other significant improvements, including autogeneration of config files, a built-in library of converters for common data types, support for input and output types other than CSV, and more. Today, we'll walk through basic usage of the new bulkloader, and demonstrate some of its features.

Configuration autogeneration

One of the most significant new features of the bulkloader is its support for autogenerating config files. It works like this: You point it at your production app, and it downloads the datastore stats, and uses them to generate a configuration file for you. You edit the configuration file to fill in a few missing fields and tidy it up, and presto, you have a working bulkloader configuration. Let's see how that works out when we ...

Implementing a dropbox service with the Blobstore API (part 3): Multiple upload support

In the last part of this series, we demonstrated how to use plupload, a Javascript library with multiple backends for handling file uploads. The solution we demonstrated there only supported uploading a single file at a time, however, and required us to improvise our own progress indicators - far from optimal.

So now, the post you've all been waiting for, where we demonstrate how to do multiple file upload!

The basic trick is simple: Hook the event that's triggered before a file is uploaded, and update the URL to upload to when it's called. That way, every uploaded file gets a new URL. Where do we get the URL from? We simply ask the server for one. Here's the Javascript for that:

      uploader.bind('UploadFile', function(up, file) {
        // Fetch a fresh upload URL before each file is sent.
        $.ajax({
            url: '/generate_upload_url',
            async: false,  // block until the new URL has arrived
            success: function(data) {
              up.settings.url = data;
            }
        });
      });

Straightforward, right? The only subtlety here is that we have to make the request a synchronous one (hence async: false), so that the uploading doesn't start until we've updated the URL. Here's the server-side code that generates those URLs:

class GenerateUploadUrlHandler(BaseHandler):
  @util.login_required
  def get(self):
    self.response.headers['Content-Type'] = 'text/plain'
    self.response.out.write ...

High concurrency counters without sharding

Sharded Counters are a well known technique for keeping counters with high update rates on App Engine. Less well known, however, are some of the alternatives, particularly in areas where you want to keep a reasonably accurate counter, but absolute accuracy isn't required. I discussed one option in this cookbook post - be sure to check the comments for an improved version - and today we'll discuss another option, which also makes use of memcache and the task queue.

The basic assumption is this: We want to keep as accurate a count as possible, but we're willing to accept that it may, in some cases, under-count. A good example of where this is true is counting downloads, or hits, or other such metrics.

Our solution has three major components:

  1. A 'permanent' count, stored in the datastore.
  2. A 'current' count, stored in memcache.
  3. A task queue task that updates the datastore with the total from memcache.

In order to implement this, we'll take advantage of the task queue's task name functionality, and 'tombstoned tasks' - the restriction that two tasks with the same name cannot be enqueued within a reasonable period (at least a week) of each other. Each ...
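To make the moving parts concrete, here's a minimal sketch of how the three components could fit together. It is not the App Engine API: memcache, the datastore and the task queue are replaced by hypothetical in-memory stand-ins so the flow can be run anywhere, and naming the task after the counter plus an interval number is an assumption made for illustration.

```python
class FakeTaskQueue(object):
    """Hypothetical stand-in for the task queue, including tombstoned names."""

    def __init__(self):
        self.used_names = set()  # names stay reserved after use ("tombstoned")
        self.pending = []

    def add(self, name, func):
        """Enqueue func under name; reject a name that was already used."""
        if name in self.used_names:
            return False
        self.used_names.add(name)
        self.pending.append(func)
        return True

    def run_all(self):
        """Run every pending task once, as the real queue eventually would."""
        pending, self.pending = self.pending, []
        for func in pending:
            func()


memcache = {}   # stand-in for memcache: the 'current' counts
datastore = {}  # stand-in for the datastore: the 'permanent' counts
queue = FakeTaskQueue()


def flush(counter):
    """Task body: fold the memcache total into the permanent count."""
    current = memcache.pop(counter, 0)
    datastore[counter] = datastore.get(counter, 0) + current


def increment(counter, interval_id):
    """Bump the current count and try to schedule one flush per interval."""
    memcache[counter] = memcache.get(counter, 0) + 1
    # Assumed naming scheme: one task name per counter per interval, so any
    # further attempts in the same interval are rejected as duplicates.
    task_name = '%s-%s' % (counter, interval_id)
    queue.add(task_name, lambda: flush(counter))


def get_count(counter):
    """Best-effort total: permanent count plus whatever is not yet flushed."""
    return datastore.get(counter, 0) + memcache.get(counter, 0)
```

Calling increment('downloads', 0) five times enqueues a single task, and running it moves all five hits into the permanent count. If the memcache value were evicted before the task ran, those hits would be lost, which is exactly the under-counting we said we're willing to accept.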

Pre- and post- put hooks for Datastore models

A number of people have asked about the possibility of pre- and post- put hooks for datastore models, to allow for changes or other processing before or after a model is stored to the datastore.

While such a feature isn't currently supported by App Engine, it's quite possible for us to implement it ourselves, using monkeypatching. This also gives us a good opportunity to show off how monkeypatching works, and how it can be used to make your own changes (at your own risk!) to the App Engine SDK.

One caveat of monkeypatching is that you have to be very careful to make sure that your patch is installed at all times. If it's not, the changes you made will be missing, and code that relies on them will fail - or worse, silently behave differently. This is particularly noticeable in the case of app-engine-patch, which monkeypatches models to change their kind name, causing operations on them to fail if the patch hasn't been imported.

The functionality we want is about as simple as you could ask for: We want to be able to define a method on our Model that gets called just before it is written to the datastore, and ...
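As a rough illustration of the monkeypatching idea, here's a sketch that wraps a put() method so that optional hooks run around it. The Model class below is a hypothetical stand-in rather than the SDK's class, and the hook names before_put and after_put are made up for this example. The real SDK also has a module-level db.put() function, which a method-level patch like this one would not intercept.

```python
import functools


class Model(object):
    """Hypothetical stand-in for a datastore model; not the SDK class."""
    saved = []

    def put(self):
        Model.saved.append(self)
        return len(Model.saved)  # pretend this is the entity's key


def install_put_hooks(model_class):
    """Replace model_class.put with a wrapper that calls optional hooks."""
    original_put = model_class.put
    if getattr(original_put, '_hooks_installed', False):
        return  # already patched; avoid wrapping the wrapper

    @functools.wraps(original_put)
    def put_with_hooks(self, *args, **kwargs):
        pre = getattr(self, 'before_put', None)
        if pre is not None:
            pre()
        result = original_put(self, *args, **kwargs)
        post = getattr(self, 'after_put', None)
        if post is not None:
            post()
        return result

    put_with_hooks._hooks_installed = True
    model_class.put = put_with_hooks


class Article(Model):
    """Example model that defines both hooks."""

    def __init__(self, title):
        self.title = title
        self.events = []

    def before_put(self):
        self.title = self.title.strip()  # tidy up before storing
        self.events.append('before')

    def after_put(self):
        self.events.append('after')


# The patch has to be installed before any put() call that should see it.
install_put_hooks(Model)
```

With the patch installed, Article('  Hello  ').put() stores the entity with its title stripped and records both hook calls. Models that define neither hook behave exactly as before.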

Using the Google Maps APIs from App Engine

In a previous post, we discussed how Mapvelopes uses the ReportLab toolkit to dynamically generate PDFs. The other major component of Mapvelopes is its interaction with the various Google Maps APIs, and that's what we'll cover now.

The label "Google Maps API" actually covers a fairly broad set of separate APIs. The best known of them are the in-browser APIs, for embedding maps in webpages, and manipulating them. You've doubtless seen them used extensively around the web. Only slightly less well known are the Static Maps API and the Geocoding Web Service.

Geocoding Web Service

The Geocoding Web Service is pretty straightforward: You supply it with an address, and it supplies you with its latitude and longitude. It also provides a great deal of additional information, such as authoritative names for the various parts of the address, and a viewport that encompasses the geocoded location. Here's an example geocoding API request:

http://maps.google.com/maps/api/geocode/json?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false

The last part of the path specifies the format - we're using JSON because it's simpler ...
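For reference, here's a small sketch of how a request URL like the one above could be assembled in code. It uses only the standard library of a current Python 3 interpreter rather than the App Engine runtime, and the helper name build_geocode_url is made up for this example. The endpoint and parameters are copied from the sample request as-is.

```python
from urllib.parse import urlencode

# Endpoint taken verbatim from the example request above.
GEOCODE_BASE = 'http://maps.google.com/maps/api/geocode/'


def build_geocode_url(address, output='json'):
    """Return a geocoding request URL; output is the last path component."""
    # safe=',' leaves commas unescaped so the result matches the sample URL;
    # spaces are still encoded as '+'.
    query = urlencode({'address': address, 'sensor': 'false'}, safe=',')
    return '%s%s?%s' % (GEOCODE_BASE, output, query)


url = build_geocode_url('1600 Amphitheatre Parkway, Mountain View, CA')
```

Fetching the URL and decoding the response is left out here, since that part depends on the runtime you're using.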

Generating PDFs on App Engine Python, and introducing Mapvelopes

This is the first of two posts covering the technologies used to implement the Mapvelopes app, an App Engine app that generates customized printable envelopes with the map to your recipient on them.

While HTML is the lingua-franca of the web, it's not the be all and end all. Sometimes, you need your webapp to generate something slightly different, and often, that something is a PDF. PDFs have the major advantage that they're designed for printing: pagination is built in, and the PDF defines the page size, so nothing about the layout is left to chance. When you need to provide something for the user to print, especially when it's complex, using a PDF can make the difference between okay output and really excellent output. Hit 'Print' in a Google Docs spreadsheet, and you'll see this in action.

PDF generation on App Engine is something that's been left largely up to individual users to figure out. Depending on your runtime - Java or Python - and your specific needs, it may be quite straightforward, or rather complicated. In particular, if you want to include images in your PDF, you're going to have to jump through some ...

Announcing a robust datastore bulk update utility for App Engine

Note: This library is deprecated in favor of appengine-mapreduce, which is now bundled with the SDK.

I'm pleased to announce the release of bulkupdate, an unoriginally-named library for the App Engine Python runtime that facilitates doing bulk operations on datastore data. With bulkupdate, simple operations like bulk re-puts and bulk deletes are trivial, while more complex operations like schema transitions or even emailing all your users become much simpler.

The basic operation of bulkupdate is very similar to the 'map' phase of the well known 'mapreduce' pattern. To use it, you create a subclass of the 'BulkUpdater' class, and define two methods: get_query(), which returns the query to execute, and handle_entity(), which is called once for each entity returned by the query. For example, suppose you want to write a daily task that sends an XMPP message to everyone with new activity on their accounts - the updater class would look something like this:

class ActivityNotifier(bulkupdate.BulkUpdater):
  def __init__(self, date_threshold):
    self.date_threshold = date_threshold

  def get_query(self):
    return UserAccount.all().filter('last_update >', self.date_threshold)

  def handle_entity(self, user):
    if user.unread_messages > 0:
      xmpp.send_message(user.jid, "You have %s unread messages!" % user.unread_messages)

Running the job is even simpler ...

Taking advantage of the new Apps Marketplace

The recently unveiled Apps Marketplace has been getting a lot of attention lately, and many people want to know how they can integrate their App Engine app with it, making use of its integrated single sign-on support. Today we'll go over what's required to get this working.

Apps Marketplace uses OpenID for SSO. Fortunately, we can use the aeoid library, which provides a Users-API-Lookalike interface, to support this in App Engine. There are two additional requirements for getting SSO to work in an Apps Marketplace app:

Handling the first of these is easy: The aeoid library sets the realm of an OpenID request, by default, to the domain that the request was made over, so all we need to do is use that same domain name as the realm in our app's manifest file.

The second is a little trickier. The 'janrain' python-openid library which aeoid and other Python-based solutions are based on does not support host-meta as a discovery mechanism for OpenID URLs. Let's analyze what this discovery ...

Interactive tables for fun and, er, fun.

Recently, I've been pondering, with some workmates, the practicality of putting together our own interactive table, similar to the Microsoft Surface or the reactable.

There are a number of variations on how to build one, but the one we're planning on trying seems to be the simplest: Build a custom table with a frosted glass or perspex top, and place a projector in the base, projecting onto the bottom of the frosted surface. Additionally, have a camera under the table, pointing at the surface, to detect touches and objects.

There are a number of variations on this theme. trackmate is a system of 2d barcodes and open source software that allows you to tag and track objects. Their example configurations involve a frosted plexiglass surface, with even illumination and a camera placed underneath. None of them directly support surfaces with images projected onto them, though.

This instructable demonstrates the construction of a multitouch table that supports both touch detection and a projector, through a technique called frustrated total internal reflection. It relies on a strip of infra-red LEDs along the edge of the panel, and touching the panel disrupts the internal reflection, allowing an infra-red camera under the ...

Using the ereporter module for easy error reporting in App Engine

One little-known module in the google.appengine.ext package is ereporter. This module exists to make it easier to get summaries of errors generated by your Python App Engine app, and today we'll show you how.

Far too often, error reporting for live webapps is a catch-as-catch-can affair, with reports coming in from dedicated users, or turning up whenever you think to check the logs page of your app. A lot of bugs can slip through this way, with exceptions going unnoticed by everyone but the users who experience them, who then walk away in disgust, never to return. With ereporter, we'll demonstrate how to set up a simple handler that takes care of capturing all the exceptions that occur in your app, and emailing a daily report to you, summarizing what went wrong.

Installing ereporter consists of 3 stages: Modifying your handler script, modifying your app.yaml, and adding a cron job. Let's start by modifying your handler script(s). Add the following to the top of all your handler scripts (that is, scripts that are mentioned in app.yaml):

import logging
from google.appengine.ext import ereporter

ereporter.register_logger()

The ...