Damn Cool Algorithms, Part 1: BK-Trees

This is the first post in (hopefully) a series of posts on Damn Cool Algorithms - essentially, any algorithm I think is really Damn Cool, particularly if it's simple but non-obvious.

BK-Trees, or Burkhard-Keller Trees are a tree-based data structure engineered for quickly finding near-matches to a string, for example, as used by a spelling checker, or when doing a 'fuzzy' search for a term. The aim is to return, for example, "seek" and "peek" if I search for "aeek". What makes BK-Trees so cool is that they take a problem which has no obvious solution besides brute-force search, and present a simple and elegant method for speeding up searches substantially.

BK-Trees were first proposed by Burkhard and Keller in 1973, in their paper "Some approaches to best match file searching". The only copy of this online seems to be in the ACM archive, which is subscription only. Further details, however, are provided in the excellent paper "Fast Approximate String Matching in a Dictionary".

Before we can define BK-Trees, we need to define a couple of preliminaries. In order to index and search our dictionary, we need a way to compare strings. The canonical method for this is the Levenshtein ...

Serializing JavaScript objects with circular references

One problem we've been dealing with at work for a while is serializing JavaScript objects to a string representation, when objects can contain circular references. The simplest example of this is an object a, which contains one property (call it 'foo') which is a reference to a.

Mozilla provides a toSource() function on the Object class (and hence every object) which returns a serialized representation of that object, allowing it to be reconstituted with eval(). To handle circular references, any object or array can be tagged with a numeric identifier by prefixing it with "#x=" (where x is an integer), and a tagged object can be referenced with the syntax "#x#". Using this, any object graph can be serialized, and we can even make use of this system to compress the serialized text down further than Mozilla does on its own by using this syntax for all backreferences, instead of just circular ones. The example earlier, of an object referencing itself would serialize as "#1={foo:#1#}".

All fine and dandy, but unfortunately, IE doesn't implement the toSource() function. I don't know about other browsers either - toSource() is part of Netscape's original Javascript documentation, but not ...

Street Games

Recently, when we changed over to the new, smaller coins, the council went around replacing all the old (digital) meters with new, pay-and-display ones. The old meters had one meter for every 8 or so parks, and you selected the park you were paying for before paying. Inset into the footpath by each parking space, therefore, there was a cobblestone bearing a number from 1 to 8, to identify the parking space.

When they switched over to the new meters, they went along with a sandblaster and blasted off all the numbers, leaving several long series of blank cobblestones inset into the pavement at regular intervals. Walking home one night, Hayley and I got thinking: What could one do with these cobblestones?

Our first thought was to chalk or spray-paint (with a stencil) random squares from board games: "Forward 3 squares", "Roll again", "Park Terrace", "Double Word Score", "Go to Jail", "Triple letter score" and the like, to create some amusement and confusion.

But why not go one step further? Why not invent our own game people could play on the cobblestones? The question now is: what? There are a few criteria any game would have to meet:
  • Easy to ...

New Hair Color for Hayley!

Hayley showing off her new hair color:

Indexing directory structures

After much messing around with SQL DBs and table structures, I eventually resorted to writing my own indexing scheme and storing the index in memory for my FS-indexing needs.
Here's how I went about it:

1) Construct a tree out of the FS you're indexing. Each node in the tree needs a reference to its parent, but if you're just using it for this index, it doesn't actually need references to its children.
2) Construct a dictionary of lists of nodes to act as the index. Dictionary> in generics/templates speak.
3) Iterate through each item, and extract terms for that component. Terms are only extracted for the current component, not its parents - "c:\foo\bar" would only have 'bar' as a term.
4) Find the relevant item in the index for each term (or add it, if need be), and add the current node to the list of nodes against that item.

Once you've constructed the index, searching is as follows:
1) Extract the terms for your search query in the same manner as used for path components above
2) Obtain the list from the index for each of these terms. If any of ...

Obsession

For some time now, I've been working, on and off, on a P2P filesharing system, which I've dubbed 'Attercop' (named after the term used for 'spider' used in Old English and LOTR). Simply put, it's a replacement for DC++, designed mostly for LAN environments. It uses multicast, so no central hub is required, and it improves on a number of the most glaring weaknesses in DC++. I'm writing it in C#.NET.

Like most projects I embark on, I started off hugely enthusiastic, spending as much time as I could spare on it. Normally, I'd continue like that until either I get the project completed, or I burn-out on it. Quite frequently, the latter has happened - I've burned out and lost interest, with the project unfinished.

However, since I've been spending all the time I've not been at work with Hayley, I've had very little time for it lately - a lack that's hard to mourn, since I consider time with Hayley much better spent - and I'd begun to lose enthusiasm for the project. I concluded that I was (unfortunately) suffering from the same sort of mid-project disinterest I often ...

Moving In

On Sunday, Hayley's flat had a flat meeting. Hayley wanted to move out of her flat (and into our apartment full-time), and the upshot is: no problem! As soon as she/they can find a replacement flatmate, she's moving in with me full-time! :)

Found a flat!

I've found a flat, and I moved in this weekend. It's a single apartment in a '60s apartment block right in town, which is certainly convenient, and the rent is really low. It's fairly large for a single apartment, consisting of a sizeable entryway, a bathroom, bedroom, small kitchen, large lounge, and study.

The catch? The decor is hideous. As far as we can tell, some time in the 80s, the owners decided they wanted to redecorate, and couldn't decide between three different, conflicting, styles. So they used all 3, usually in different parts of the apartment, but sometimes together. The result is something you're likely to remember for a long time. There's a _big_ emphasis on border prints, and quite a lot of patterned wallpaper.

The owners are intending to re-rent or sell it in the new year, and Hayley and I are hoping we can convince them to let us redecorate. Initial attempts ("We'll repaint if you pay for the paint!") have met with "don't bother" responses, but I'm actually prepared to pay for the supplies - it shouldn't take much - in order to be able to redecorate. Strangely ...

Back to work

I'm now officially employed again. I've found a kickass job at a local software company specialising in GIS (mapping etc). Their main product is a fleet tracking package. I just finished my first day there today.

They're using Postgres extensively, with their own extensions such as some form of multi-master replication. The boxes are based on Gentoo, with (oddly), VMWare instances running windows for the application servers. The environment's quite well set up, and so far quite nice to work with (with my whole 8 hours' worth of experience with it).

I'm also starting back part-time at Uni this semester, doing two papers - COSC229 - Algorithms, and COSC362 - Microprocessor Systems 2. My new employer is quite flexible about allowing me to fit my work hours around uni, and the two are very close together to boot.

Now I just have to find a flat in the same general vicinity...

To market, to market, to buy...

Lex has posted rather interesting entry about our trip to the supermarket.