Nick's Blog

netboot.me: Turning 'netboot' into 'internetboot'

Posted by Nick Johnson | Filed under coding, app-engine, tech, netbootme

After a couple of weeks of concerted effort, I'm pleased to announce the initial release of a new service, netboot.me. Netboot.me takes regular netbooting and makes it a whole lot more versatile - now, you can netboot directly into the installers for many popular linux distros, as well as system tools and even live linux distributions, all directly over the Internet, and without any local configuration required!

All that's required to set up netboot.me is a spare writable CD, USB key, or floppy disk to write a small (less than 1MB) disk image to. Alternatively, determined geeks can change their DHCP server to allow computers to netboot directly. Once you've done that, booting off the device on any computer with wired ethernet (wifi is a work in progress) will automatically cause the bootloader to download the current version of the menu from netboot.me, from which you can then choose the image you want to boot. Selecting it causes the boot image to be downloaded and booted immediately.

Currently on the boot menu:

Installers for several popular linux distros (Ubuntu, Fedora, OpenSUSE, Debian).
The FreeBSD installer.
Tiny Core and Micro Core Linux ...

Ricochet Robots and interesting game trees

Posted by Nick Johnson | Filed under coding, games

Every Wednesday at work we have a Games Night - people bring along (non-computer) games, we get together, and we play a few. Last Wednesday, someone brought along Ricochet Robots. I wasn't very good at it, but the game intrigues me from a computational point of view.

Essentially, it's a puzzle game. You have a board divided into squares. Each square is either empty or has a target. A target consists of one of four colors and a symbol. Walls are (seemingly) randomly scattered around the board, blocking passage between adjacent squares. There are four robot tokens, in the same colors as the targets, which start off scattered around the board randomly.

There's also a set of tokens, one for each target. One of these is drawn randomly and placed face up in the center of the board. It specifies that the robot of that color must make its way to the target with that color and symbol. Robots move like they're on a skating rink - they can move in any of the 4 cardinal directions, but once moving continue until they hit a wall or another robot. Nobody owns a robot - all 4 robots may be ...
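The sliding rule is easy to pin down in code. Here's a minimal sketch of a single move, assuming a square board addressed as (row, column), walls stored as pairs of adjacent squares, and the other robots given as a set of positions. The names `slide`, `DIRECTIONS`, `walls` and `robots` are mine, not anything from the game.

```python
# Hypothetical board model: squares are (row, col) tuples, and each wall is a
# frozenset of the two adjacent squares it separates.
DIRECTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def slide(position, direction, size, walls, robots):
    """Return the square where a robot stops after sliding in one direction.

    The robot keeps moving until the board edge, a wall, or another robot
    blocks the next square.
    """
    d_row, d_col = DIRECTIONS[direction]
    row, col = position
    while True:
        nxt = (row + d_row, col + d_col)
        if not (0 <= nxt[0] < size and 0 <= nxt[1] < size):
            break  # edge of the board
        if frozenset(((row, col), nxt)) in walls:
            break  # wall between the current square and the next one
        if nxt in robots:
            break  # another robot is in the way
        row, col = nxt
    return (row, col)


walls = {frozenset(((0, 2), (0, 3)))}
robots = {(3, 0)}
print(slide((0, 0), "E", 5, walls, robots))
print(slide((0, 0), "S", 5, walls, robots))
```

A solver would call something like this once per robot and direction to enumerate the successors of a board state.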

BDBDatastore 0.2 released

Posted by Nick Johnson | Filed under coding, app-engine, bdbdatastore, tech

I'm pleased to say that BDBDatastore 0.2 is now released. With this release, BDBDatastore is now officially at feature parity with the production App Engine datastore. That is, it ought to be able to do everything the production datastore can, which means you can port your apps off the production datastore without having to change them.

Installation instructions can be found here. The release is numbered 0.2, but if it proves stable enough, it will become the official 1.0 release of BDBDatastore. So treat it as beta at least until it's got a little more testing.

If you try it out, speak up! I'd like to hear what people think of this. In the meantime, I'm going to start working on writing a container for running App Engine apps on Apache and other HTTP servers, as well as doing load testing and profiling of BDBDatastore.

BDBDatastore 0.1 released

Posted by Nick Johnson | Filed under coding, app-engine, bdbdatastore, tech

When I announced BDBDatastore just a few days ago, it was still a ways away from being practically usable for anyone wanting to develop or deploy App Engine apps. The purpose of the post was twofold: To attract some initial interest, and to motivate me, with the light of public scrutiny, to make sure it gets finished and polished.

I'm pleased to say that release 0.1 is now available. Version 0.1 brings BDBDatastore to parity with the feature set the App Engine datastore had on release day - that is to say, fully featured except for __key__ queries. Along with the server itself, I've also provided a patch to the App Engine SDK that allows you to tell the Python dev_appserver to use BDBDatastore for backend storage.

Full installation and usage instructions can be found on the wiki. Note that this release is still very much beta. It shouldn't break, but it might (and if it does, please let me know). It's also possible (likely, even) that the datastore will change in backwards-incompatible ways between now and 1.0.

As always, feedback and comments are appreciated.

Announcing BDBDatastore, a replacement datastore for App Engine

Posted by Nick Johnson | Filed under coding, app-engine, bdbdatastore, tech

One criticism I frequently see directed at App Engine is that of lock-in. Since App Engine doesn't use the same APIs and libraries that people are used to using elsewhere, people say, Google is implicitly locking people in to continuing to run their App Engine apps on Google infrastructure.

I'm of two minds on this. On the one hand, I don't think it's justified to call this "lock-in" - Google has provided ample documentation of the runtime environment and the APIs available, and where documentation isn't available, the SDK source code is, so it's possible to figure out everything necessary to produce compatible interfaces with publicly available information alone, and without resorting to reverse engineering or any other gray areas.

On the other hand, while I don't think there's intentional lock-in, the lack of available alternatives amounts to practical lock-in. As long as moving your app off Google infrastructure requires implementing the new infrastructure yourself, this amounts to lock-in for the vast majority of people, who can't afford the time and resources required to implement such a thing themselves.

The key to making portability possible is the datastore. Of all the APIs App ...

Nearly all DHT implementations vulnerable to 'merge' bug.

Posted by Nick Johnson | Filed under coding, tech

As DHT implementations proliferate and harmonise, the prospect of multiple widely-deployed applications using the same or compatible DHT implementations is increasingly becoming a reality. There are a large and increasing number of DHT libraries out there, such as Entangled and FreePastry, being used by an increasing number of applications.

However, most of these implementations are vulnerable to a simple but subtle bug: They have no way of distinguishing one DHT network from another. Although each application's DHT network or networks start off separate, if, by chance or deliberate action, a node from a different but compatible DHT is introduced, the 'self healing' property of DHTs will ensure that, sooner or later, the two networks become merged into a single DHT.
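To see why one stray contact is enough, here's a toy model of the merge. It is not any real DHT protocol: each node just keeps a set of known peers, and a hypothetical `refresh` round lets every node learn its contacts' contacts, which is a crude stand-in for routing-table maintenance. The node names are made up.

```python
def refresh(peers):
    """Run one round of peer exchange and return the updated peer sets."""
    updated = {node: set(known) for node, known in peers.items()}
    for node, known in peers.items():
        for contact in known:
            updated[node] |= peers[contact]  # learn the contact's contacts
            updated[contact].add(node)       # the contact learns about us
    for node in updated:
        updated[node].discard(node)          # a node never lists itself
    return updated


def run_until_stable(peers):
    """Repeat refresh rounds until no node learns anything new."""
    while True:
        updated = refresh(peers)
        if updated == peers:
            return peers
        peers = updated


# Two separate, fully connected networks with no network identifier.
network_a = ["a1", "a2", "a3"]
network_b = ["b1", "b2", "b3"]
peers = {}
for group in (network_a, network_b):
    for node in group:
        peers[node] = set(group) - {node}

peers["a1"].add("b1")  # a single cross-network contact is introduced

merged = run_until_stable(peers)
print(sorted(merged["a3"]))
```

In this model every node ends up knowing every other node, because nothing in the exchange says which network a contact belongs to.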

This is not a problem for single-purpose DHT implementations such as those used by BitTorrent or Overnet, since they generally establish a single DHT with all compatible clients participating in any case. Nor is this a problem for networks designed with heterogeneous applications in mind, such as CSpace. However, this still leaves a number of DHT libraries that don't fall into either of these categories.

If DHTs from two distinct applications using compatible implementations become merged, the outcome ...

Update on Anagram Trees

Posted by Nick Johnson | Filed under coding, tech, damn-cool-algorithms

Original Post

One nice thing about working at Google is that you are surrounded by very smart people. I told one of my coworkers about the anagram tree idea, and he immediately pointed out that reordering the alphabet so that the least frequently used letters come first would reduce the branching factor early in the tree, which has the effect of reducing the overall size of the tree substantially. While this seems obvious in retrospect, it's kind of unintuitive - usually we try to _increase_ the branching factor of n-ary trees to make them shallower and require fewer operations, rather than trying to reduce it.

Trying it out with an ordering determined by looking at the branching factor for each letter produces results that bear this out: Memory is reduced by about a third, and the number of internal nodes is reduced to 858,858 from 1,874,748, a reduction of more than 50%! Though I haven't benchmarked it, difficult lookups are substantially faster, too.
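The effect is easy to reproduce on a toy word list. The tree structure isn't spelled out in this update, so the sketch below assumes one level per letter of the alphabet, with each branch labelled by how many times that letter occurs in the word. It also uses raw letter frequency as a stand-in for the branching-factor measurement described above. `count_internal_nodes` and `rarest_first` are hypothetical helper names.

```python
from collections import Counter


def count_internal_nodes(words, alphabet):
    """Count internal nodes of a tree keyed by per-letter counts.

    Each word maps to a path with one step per letter of `alphabet`; the
    step taken is the number of times that letter occurs in the word.
    Every distinct proper prefix of a path is one internal node.
    """
    prefixes = set()
    for word in words:
        counts = Counter(word)
        path = tuple(counts[letter] for letter in alphabet)
        for depth in range(len(alphabet)):
            prefixes.add(path[:depth])
    return len(prefixes)


def rarest_first(words, alphabet):
    """Order the alphabet so the least frequently used letters come first."""
    totals = Counter("".join(words))
    return sorted(alphabet, key=lambda letter: totals[letter])


words = ["a", "aa", "aaa", "ab"]
print(count_internal_nodes(words, "ab"))
print(count_internal_nodes(words, rarest_first(words, "ab")))
```

On this tiny list, putting the rare letter first keeps the top of the tree narrow, which is the same effect the numbers above show at dictionary scale.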

The next logical development to try is to re-evaluate the order of the alphabet on a branch-by-branch basis. While I doubt this will have a substantial impact, it seems worth a try, so ...

Damn Cool Algorithms, Part 3: Anagram Trees

Posted by Nick Johnson | Filed under coding, tech, damn-cool-algorithms

I hesitate to call this algorithm "damn cool", since it's something I invented* myself, but I think it _is_ rather cool, and it fits the theme of my algorithms posts, so here it is anyway.

When it comes to finding anagrams of words, a frequent approach is to use an anagram dictionary - simply put, sort the letters in your word to provide a unique key that all anagrams of a word have in common. Another approach is to build a letter-frequency histogram of your word and use that as the key. (Both these approaches are more or less equivalent, in fact.) These approaches make the problem of finding exact single-word anagrams for strings very efficient - O(1) if you use a hashtable.
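As a minimal sketch of the anagram dictionary approach (the function names are mine, and lowercasing the input is an assumption):

```python
from collections import defaultdict


def anagram_key(word):
    """Sort the letters so that all anagrams of a word share one key."""
    return "".join(sorted(word.lower()))


def build_anagram_dictionary(words):
    """Group words by their sorted-letter key in a hashtable."""
    table = defaultdict(list)
    for word in words:
        table[anagram_key(word)].append(word)
    return table


def anagrams_of(table, text):
    """Look up every known word that is an exact anagram of `text`."""
    return table.get(anagram_key(text), [])


table = build_anagram_dictionary(["listen", "silent", "enlist", "google"])
print(anagrams_of(table, "tinsel"))
```

The lookup is a single hashtable probe once the key has been built from the input string.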

However, the problem of finding subset anagrams - a word that contains a subset of the letters in a string - is still rather inefficient, requiring either a brute force O(n) search through the dictionary, or looking up every subsequence of the sorted input string, which is O(2^l), where l is the number of letters in the input string. Finding subset anagrams is significantly more interesting, too, as it has applications in finding multi-word anagrams, as well as being applicable ...
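For reference, the brute force baseline looks something like this: a sketch that compares letter histograms for every word in the dictionary. `subset_anagrams` is a hypothetical name, and this is the slow approach being criticised, not the anagram tree.

```python
from collections import Counter


def subset_anagrams(words, letters):
    """Return every word that can be spelled from a subset of `letters`.

    This scans the whole word list, so it is O(n) in the dictionary size.
    """
    available = Counter(letters.lower())
    found = []
    for word in words:
        needed = Counter(word.lower())
        # The word fits if no letter is needed more often than it is available.
        if all(available[letter] >= count for letter, count in needed.items()):
            found.append(word)
    return found


print(subset_anagrams(["cat", "act", "tack", "at", "dog"], "cart"))
```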

Damn Cool Algorithms, Part 2: Secure permutations with block ciphers

Posted by Nick Johnson | Filed under coding, tech, damn-cool-algorithms

It's been too long since I blogged about anything much, and way too long since I posted the first Damn Cool Algorithms post, which I promised would be a series. So here's part 2.

To start, I'm assuming you know what a permutation is - basically a rearrangement of a sequence of items into a particular order. A permutation of the range 1-10, for example, is {5,2,1,6,8,4,3,9,7,10}. A secure permutation is one in which an attacker, given any subset of the permutation, cannot determine the order of any other elements. A simple example of this would be to take a cryptographically secure pseudo-random number generator, seed it with a secret key, and use it to shuffle your sequence.
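The simple version can be sketched as follows. Python's standard library has no seedable cryptographic generator, so this sketch assumes HMAC-SHA256 in counter mode as the keyed generator and feeds it into a Fisher-Yates shuffle. The function names are hypothetical, and this is an illustration rather than vetted cryptography.

```python
import hashlib
import hmac


def keyed_stream(key):
    """Yield 64-bit integers derived from HMAC-SHA256 in counter mode."""
    counter = 0
    while True:
        block = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
        for offset in range(0, 32, 8):
            yield int.from_bytes(block[offset:offset + 8], "big")


def uniform_below(stream, n):
    """Draw an unbiased integer in [0, n) using rejection sampling."""
    limit = (1 << 64) - ((1 << 64) % n)
    while True:
        value = next(stream)
        if value < limit:
            return value % n


def secure_shuffle(items, key):
    """Fisher-Yates shuffle driven by the keyed stream; same key, same order."""
    result = list(items)
    stream = keyed_stream(key)
    for i in range(len(result) - 1, 0, -1):
        j = uniform_below(stream, i + 1)
        result[i], result[j] = result[j], result[i]
    return result


print(secure_shuffle(range(1, 11), b"secret key"))
```

Note that this materialises the whole sequence in memory, which only works while the permutation is small enough to store.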

What if you want to generate a really, really big permutation - one so big that precomputing and storing it isn't practical or desirable? Furthermore, what if you want it to be a secure permutation? There's a really neat trick we can pull with block ciphers that allows us to generate a secure permutation over any range of numbers without first having to precompute it.
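One way to build such a thing, offered as an assumption rather than as the construction this post goes on to describe, is "cycle walking": encrypt the index with a small block cipher, and keep re-encrypting until the result lands back inside the range. The sketch below uses a toy Feistel network with HMAC-SHA256 as the round function. The names and the round count are hypothetical, and a real deployment would want a vetted format-preserving encryption scheme instead.

```python
import hashlib
import hmac


def _round_value(key, round_no, half, half_bits):
    """Keyed round function: HMAC-SHA256 truncated to half_bits bits."""
    message = bytes([round_no]) + half.to_bytes(16, "big")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") & ((1 << half_bits) - 1)


def _feistel(key, value, half_bits, rounds=4):
    """Toy Feistel network: a bijection on values of 2 * half_bits bits."""
    mask = (1 << half_bits) - 1
    left, right = value >> half_bits, value & mask
    for round_no in range(rounds):
        left, right = right, left ^ _round_value(key, round_no, right, half_bits)
    return (left << half_bits) | right


def permute(key, index, n):
    """Map index in [0, n) to its position in a keyed permutation of [0, n)."""
    if not 0 <= index < n:
        raise ValueError("index out of range")
    # Smallest even-width block that covers the range.
    half_bits = max(1, ((n - 1).bit_length() + 1) // 2)
    value = _feistel(key, index, half_bits)
    while value >= n:  # cycle walking: re-encrypt until back in range
        value = _feistel(key, value, half_bits)
    return value


print([permute(b"secret key", i, 10) for i in range(10)])
```

Each element is computed on demand from its index, so nothing has to be precomputed or stored, whatever the size of the range.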

A block cipher, for anyone that ...

30 September, 2007

Dynamic code generation in .NET

Posted by Nick Johnson | Filed under coding, tech

I've written up a rather lengthy document on different methods of dynamically generating code in .NET - two current methods, and one hypothetical one based on a C extension called "`c" (tickc). I think it does a fairly good job of illustrating just how awkward dynamic code generation is in current mainstream languages, and how much simpler and more understandable (and in many cases, more efficient) it could be.
