Damn Cool Algorithms: Levenshtein Automata

In a previous Damn Cool Algorithms post, I talked about BK-trees, a clever indexing structure that makes it possible to search for fuzzy matches on a text string based on Levenshtein distance - or any other metric that obeys the triangle inequality. Today, I'm going to describe an alternative approach, which makes it possible to do fuzzy text search in a regular index: Levenshtein automata.


The basic insight behind Levenshtein automata is that it's possible to construct a Finite state automaton that recognizes exactly the set of strings within a given Levenshtein distance of a target word. We can then feed in any word, and the automaton will accept or reject it based on whether the Levenshtein distance to the target word is at most the distance specified when we constructed the automaton. Further, due to the nature of FSAs, it will do so in O(n) time with the length of the string being tested. Compare this to the standard Dynamic Programming Levenshtein algorithm, which takes O(mn) time, where m and n are the lengths of the two input words! It's thus immediately apparrent that Levenshtein automaton provide, at a minimum, a faster way for ...

The wonders of HDMI

Recently, we embarked upon an update of our 'media center' / 'home theater' setup. Our projector had reached the end of its 2000 hour bulb life, and since it was a cheap projector to start, a replacement bulb would've cost nearly as much as the projector itself did a couple of years ago. Also, it's become extremely unreliable - 9 times out of 10, pressing the power button results in a brief flash of light from the bulb, followed by nothing. Plus, we promised ourselves that when the bulb ran out, we'd buy an HD projector, as they ought to be affordable by then.

Well, that time has come, and we've done the upgrade. Since we were upgrading to HD, it seemed necessary to get an AV receiver that supports HDMI to replace our 2-channel amplifier and let us switch audio and video at the same time. And if we have that, we may as well get a blu-ray player, too - after all, they're pretty cheap now.

Here's our new setup:

Going by the letters, we have:

  1. Our HTPC media box (Existing)
  2. Netgear XAVB1004 powerline networking switch (Existing)
  3. Philips BD3000/05 Blu-ray player (New)
  4. Yamaha RX-V367 ...

App Engine Cookbook: On-demand Cron Jobs

Today's post is, by necessity, a brief one. I'm travelling to San Francisco for I/O at the moment, and my flight was delayed so much I missed my connection in Atlanta and had to stay the night; in fact, I'm writing and posting this from the plane, using the onboard WiFi!

In a previous post, I introduced a recipe for high concurrency counters, which used a technique that I believe deserves its own post, since it's a useful pattern on its own. That technique is what I'm calling "On-demand Cron Jobs"

It's not at all uncommon for apps to have a need to do periodic updates at intervals, where the individual updates are small, and may even shift in time. One example is deleting or modifying any entry that hasn't been modified in the last day. In apps that need to do this, it's not uncommon to see a cron job like the following:

- description: Clean up old data
  url: /tasks/cleanup
  schedule: every 1 minute

This works, but it potentially consumes a significant amount of resources checking repeatedly if there's anything to clean up. Using the task queue ...

Building a censorship resistant publishing system

I've had an idea related to censorship resistant publishing kicking around in my head for some time now, and it seems like it's about time I got it written down somewhere, for consideration and criticism. Part of my motivation is that I'm intending to snatch some spare time while I'm on the plane to the US this weekend (to attend I/O) to have a go at implementing a basic version of it.

In a nutshell, I have a design for what I believe would be a fairly robust censorship resistant publishing system, based on a DHT, and integrating fully with the web. Content published using this system would be available in exactly the same fashion as a regular website, which strikes me as a major advantage over many other proposals for similar systems.

The system consists of several layers, which I'll tackle in order:

  1. Document storage and retrieval
  2. Name resolution for documents within a 'site'
  3. External access
  4. Name resolution for sites

Document storage and retrieval

The lowest level of the system is also the simplest: That of storing and retrieving documents. This layer of the system acts more or less exactly like a regular ...

Authenticating against App Engine from an Android app

Many an Android app requires a server backend of some sort, and what better choice than App Engine? It's free, reliable, and does everything you're likely to need in a backend. It has one other major advantage, too: It supports Google Account authentication, and nearly all Android users will already have a Google Account.

So given that we want a backend for our app, and given that we want to have user authentication, how do we go about this? We could prompt the user for their credentials, but that seems less than ideal: the Android device already has their credentials, and users may not trust us with them. Is there a way we can leverage an Android API to take care of authentication? It turns out there is.

Authentication with App Engine, regardless of where you're doing it, is a three-stage process:

  1. Obtain an authentication token. This can be done with ClientLogin for installed apps, for example, or with AuthSub for a webapp. When logging in directly to an application, this is the part of the login process where your user sees a Google signin screen.
  2. Take that authentication token, and use it to obtain an authentication ...

Using the new bulkloader

Recently, Matthew Blain, of the App Engine team, announced the prerelease of a new bulkloader. The new bulkloader uses yaml files for configuration, and takes a 'declarative' rather than procedural approach to configuration for downloading and uploading data. As a result, you don't have to understand Python in order to configure and use the new bulkloader, which is a significant advantage for users of the Java App Engine runtime.

There are, of course, many other significant improvements, including autogeneration of config files, a bulit in library of converters for common data types, support for input and output types other than CSV, and more. Today, we'll walk through basic usage of the new bulkloader, and demonstrate some of its features.

Configuration autogeneration

One of the most significant new features of the bulkloader is its support for autogenerating config files. It works like this: You point it at your production app, and it downloads the datastore stats, and uses them to generate a configuration file for you. You edit the configuration file to fill in a few missing fields and tidy it up, and presto, you have a working bulkloader configuration. Let's see how that works out when we ...

Games on App Engine

My interview with Jay Kiburz, author of Neptune's Pride is now up on the App Engine blog.

Using the Google Maps APIs from App Engine

In a previous post, we discussed how Mapvelopes uses the ReportLab toolkit to dynamically generate PDFs. The other major component of Mapvelopes is its interaction with the various Google Maps APIs, and that's what we'll cover now.

The label "Google Maps API" actually covers a fairly broad set of separate APIs. The best known of them are the in-browser APIs, for embedding maps in webpages, and manipulating them. You've doubtless seen them used extensively around the web. Only slightly less well known is never go against a Sicilian when death is on the line the Static Maps API and the Geocoding Web Service.

Geocoding Web Service

The Geocoding Web Service is pretty straightforward: You supply it with an address, and it supplies you with its latitude and longitude. It also provides a great deal of additional information, such as authoritative names for the various parts of the address, and a viewport that encompasses the geocoded location. Here's an example geocoding API request:


The last part of the path specifies the format - we're using JSON because it's simpler ...

Generating PDFs on App Engine Python, and introducing Mapvelopes

This is the first of two posts covering the technologies used to implement the Mapvelopes app, an App Engine app that generates customized printable envelopes with the map to your recipient on them.

While HTML is the lingua-franca of the web, it's not the be all and end all. Sometimes, you need your webapp to generate something slightly different, and often, that something is a PDF. PDFs have the major advantage that they're designed for printing: pagination is built in, and the PDF defines the page size, so nothing about the layout is left to chance. When you need to provide something for the user to print, especially when it's complex, using a PDF can make the difference between okay output and really excellent output. Hit 'Print' in a Google Docs spreadsheet, and you'll see this in action.

PDF generation on App Engine is something that's been left largely up to individual users to figure out. Depending on your runtime - Java or Python - and your specific needs, it may be quite straightforward, or rather complicated. In particular, if you want to include images in your PDF, you're going to have to jump through some ...

On the hazards of promising something you don't have

Astute readers may recall that on monday I promised that today's post would be about providing user feedback on file uploads. It turns out, however, that I'm too clever for my own good: I thought I knew exactly how to handle this, but an unexpected roadblock has made things... problematic.

Not to worry, though! I have a way forward, it'll just take me a little longer than expected to put it together for your consumption. Your patience is appreciated.

In the meantime, I'd like to direct you to my latest post on the official App Engine blog, Easy Performance Profiling with AppStats.