Bloog developments

Bill Katz, the original author of Bloog, has kindly added me as a contributor on the official Bloog repository. I've pushed all my changes from my 'master' branch to it, and I'm working on getting the 'breaking' changes in.

I'm thankful for...

Thanksgiving itself. I don't live in the US, wasn't born in the US, and have only visited the US occasionally, but I'm still thankful for it: The lull in blog activity has enabled me to catch up in reader for the first time in over a month.

Migrated!

I've now migrated this blog from Serendipity, the blogging package I was previously using, to Bloog, a blog platform written for App Engine. Before doing so, though, I made some fairly significant changes and improvements, including:
  • A Serendipity uploader script (so I can import my old articles).
  • (Limited) unit-testing.
  • A bunch of improvements suggested by Matteo Crippa, including:
- Comment notification by email to the blog owner.
- Gravatar support.
- SEO improvements.
- Sitemap support.
  • Theme inheritance, so new themes can be defined that make only minor modifications to existing ones, without needing to copy the entire theme.
  • Support for FirePython logging when logged in as an administrator.
  • Improvements to how comments and posts are stored and retrieved.
  • The option to use a Google Custom Search Engine instead of the built-in search.
  • Many code cleanups and minor bugfixes.
In the process, I've been learning the wonders of Git and Github, and I'm seriously impressed.

All my work is available in my own Github branch here. It has several branches; 'master' contains non-breaking changes from the original Bloog, 'breaking' consists of master plus any changes that are likely to break backwards compatibility or require running an upgrade script. 'matteocrippa.it' contains ...

Getting O2 Ireland's "Mobile Broadband" working in OSX 10.5

O2 Ireland proudly advertise that their mobile broadband offering works for both Windows and Mac. Then they proceed to offer only windows instructions, with no hint of how to get it working on a mac. This is, obviously, less than helpful. To complicate things, the generic instructions for getting Huawei modems working in OSX aren't entirely sufficient: The PIN on the sim card has to be disabled first, and the Huawei app for setting the APN doesn't seem to work in 10.5. So here are the instructions, in a nutshell:

1. Find a windows computer with administrator priveliges. Plug in the modem and follow the installation instructions. When prompted, enter your pin, then select Tools -> Pin Options -> Disable Pin. Enter your pin again. If you're trying these directions for a network other than O2 Ireland, this would be a good time to check the settings for the APN name, too.

Yes, I know it sucks to have to use a PC to set it up. It might be possible to do this by putting the 3G sim into a cellphone and disabling the PIN using that - I haven't tried, but it seems like it should ...

Nearly all DHT implementations vulnerable to 'merge' bug.

As DHT implementations proliferate and harmonise, the prospect of multiple widely-deployed applications using the same or compatiable DHT implementations is increasingly becoming a reality. There are a large and increasing number of DHT libraries out there, such as Entangled , and FreePastry, being used by an increasing number of applications.

However, most of these implementations are vulnerable to a simple but subtle bug: They have no way of distinguishing one DHT network from another. Although each application's DHT network or networks start off separate, if, by chance or deliberate action a node from a different but compatiable DHT is introduced, the 'self healing' property of DHTs will ensure that, sooner or later, the two networks become merged into a single DHT.

This is not a problem for single-purpose DHT implementations such as those used by BitTorrent or Overnet, since they generally establish a single DHT with all compatiable clients participating in any case. Nor is this a problem for networks designed with heterogenous applications in mind, such as CSpace. However, this still leaves a number of DHT libraries that don't fall into either of these categories.

If DHTs from two distinct applications using compatiable implementations become merged, the outcome ...

SMTP to HTTP gateway for your App Engine (and other) apps!

In response to a comment in the freenode.net/#appengine channel by someone wishing their App Engine app could receive email, I put together smtp2web, a simple service that accepts mail for an address (or your entire domain), and sends it via HTTP POST to a URL you specify. If you're running in a restricted environment such as App Engine, this means you can now receive email. Even if you're not, this is a lot simpler to use than writing your own SMTP server (or adding custom handlers to most existing servers).

Someone's already blogged about it, too.

Update on Anagram Trees

Original Post

One nice thing about working at Google is that you are surrounded by very smart people. I told one of my coworkers about the anagram tree idea, and he immediately pointed out that reordering the alphabet so that the least frequently used letters come first would reduce the branching factor early in the tree, which has the effect of reducing the overall size of the tree substantially. While this seems obvious in retrospect, it's kind of unintuitive - usually we try to _increase_ the branching factor of n-ary trees to make them shallower and require fewer operations, rather than trying to reduce it.

Trying it out with an ordering determined by looking at the branching factor for each letter produces results that bear this out: Memory is reduced by about a third, and the number of internal nodes is reduced to 858,858 from 1,874,748, a reduction of more than 50! Though I haven't benchmarked it, difficult lookups are substantially faster, too.

The next logical development to try is to re-evaluate the order of the alphabet on a branch-by-branch basis. While I doubt this will have a substantial impact, it seems worth a try, so ...

Damn Cool Algorithms, Part 3: Anagram Trees

I hesitate to call this algorithm "damn cool", since it's something I invented* it myself, but I think it _is_ rather cool, and it fits the theme of my algorithms posts, so here it is anyway.

When it comes to finding anagrams of words, a frequent approach is to use an anagram dictionary - simply put, sort the letters in your word to provide a unique key that all anagrams of a word have in common. Another approach is to generate a letter-frequency histogram for each letter in your word. (Both these approaches are more or less equivalent, in fact.) These approaches make the problem of finding exact single-word anagrams for strings very efficient - O(1) if you use a hashtable.

However, the problem of finding subset anagrams - a word that contains a subset of the letters in a string - is still rather inefficient, requiring either a brute force O(n) search through the dictionary, or looking up every substring of the sorted input string, which is O(2^l) with the number of letters in the input string. Finding subset anagrams is significantly more interesting, too, as it has applications in finding multi-word anagrams, as well as being applicable ...

Damn Cool Algorithms, Part 2: Secure permutations with block ciphers

It's been too long since I blogged about anything much, and way too long since I posted the first Damn Cool Algorithms post, which I promised would be a series. So here's part 2.

To start, I'm assuming you know what a permutation is - basically a shuffling of a sequence of items in a particular order. A permutation of the range 1-10, for example, is {5,2,1,6,8,4,3,9,7,10}. A secure permutation is one in which an attacker, given any subset of the permutation, cannot determine the order of any other elements. A simple example of this would be to take a cryptographically secure pseudo-random number generator, seed it with a secret key, and use it to shuffle your sequence.

What if you want to generate a really, really big permutation - one so big precomputing and storing it isn't practical or desirable? Furthermore, what if you want it to be a secure permutation? There's a really neat trick we can pull with block ciphers that allows us to generate a secure permutation over any range of numbers without first having to precompute it.

A block cipher, for anyone that ...

I am teh famous, LOL!

LOLCode.NET was featured at TechED07, in a presentation by Nick Hodge, a "Professional Geek" for Microsoft. He's posted a video of the presentation on his blog.

Now I just need to convince them to integrate it into the next edition of VS...