Posted by Nick Johnson
| Filed under
This is the first post in (hopefully) a series of posts on Damn Cool Algorithms - essentially, any algorithm I think is really Damn Cool, particularly if it's simple but non-obvious.
BK-Trees, or Burkhard-Keller Trees are a tree-based data structure engineered for quickly finding near-matches to a string, for example, as used by a spelling checker, or when doing a 'fuzzy' search for a term. The aim is to return, for example, "seek" and "peek" if I search for "aeek". What makes BK-Trees so cool is that they take a problem which has no obvious solution besides brute-force search, and present a simple and elegant method for speeding up searches substantially.
BK-Trees were first proposed by Burkhard and Keller in 1973, in their paper "Some approaches to best match file searching
". The only copy of this online seems to be in the ACM archive, which is subscription only. Further details, however, are provided in the excellent paper "Fast Approximate String Matching in a Dictionary
Before we can define BK-Trees, we need to define a couple of preliminaries. In order to index and search our dictionary, we need a way to compare strings. The canonical method for this is the Levenshtein ...