<?xml version="1.0" encoding="utf-8" ?>

<rss version="2.0" 
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:admin="http://webns.net/mvcb/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
   xmlns:wfw="http://wellformedweb.org/CommentAPI/"
   xmlns:content="http://purl.org/rss/1.0/modules/content/"
   >
<channel>
    <title>Nick's blog - tech</title>
    <link>http://blog.notdot.net/</link>
    <description>Because repeating myself sucks.</description>
    <dc:language>en</dc:language>
    <generator>Serendipity 1.2 - http://www.s9y.org/</generator>
    <pubDate>Wed, 19 Nov 2008 21:55:08 GMT</pubDate>

    <image>
        <url>http://blog.notdot.net/templates/default/img/s9y_banner_small.png</url>
        <title>RSS: Nick's blog - tech - Because repeating myself sucks.</title>
        <link>http://blog.notdot.net/</link>
        <width>100</width>
        <height>21</height>
    </image>

<item>
    <title>Getting O2 Ireland's &quot;Mobile Broadband&quot; working in OSX 10.5</title>
    <link>http://blog.notdot.net/archives/49-Getting-O2-Irelands-Mobile-Broadband-working-in-OSX-10.5.html</link>
            <category>tech</category>
    
    <comments>http://blog.notdot.net/archives/49-Getting-O2-Irelands-Mobile-Broadband-working-in-OSX-10.5.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=49</wfw:comment>

    <slash:comments>2</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=49</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    O2 Ireland proudly advertise that their mobile broadband offering works for both Windows and Mac. Then they proceed to offer only Windows instructions, with no hint of how to get it working on a Mac. This is, obviously, less than helpful. To complicate things, the generic instructions for getting Huawei modems working in OSX aren&#039;t entirely sufficient: The PIN on the SIM card has to be disabled first, and the Huawei app for setting the APN doesn&#039;t seem to work in 10.5. So here are the instructions, in a nutshell:&lt;br /&gt;
&lt;br /&gt;
1. Find a Windows computer with administrator privileges. Plug in the modem and follow the installation instructions. When prompted, enter your PIN, then select Tools -&gt; Pin Options -&gt; Disable Pin. Enter your PIN again. If you&#039;re trying these directions for a network other than O2 Ireland, this would be a good time to check the APN settings, too.&lt;br /&gt;
&lt;br /&gt;
Yes, I know it sucks to have to use a PC to set it up. It might be possible to do this by putting the 3G SIM into a cellphone and disabling the PIN using that - I haven&#039;t tried, but it seems like it should work.&lt;br /&gt;
&lt;br /&gt;
2. Follow the rest of the instructions &lt;a href=&quot;http://blog.evandavey.com/2008/02/how-to-connect-huawei-e220-usb-modem.html&quot;&gt;here&lt;/a&gt;, with the following addenda:&lt;br /&gt;
- You may not need to add the device as described. In my case, it automatically added itself to the network preferences panel, and just needed configuring.&lt;br /&gt;
- The E220 drivers work fine if you have an E270, too.&lt;br /&gt;
- If you&#039;re using O2 Ireland, the APN you want to enter is &quot;open.internet&quot;. 
    </content:encoded>

    <pubDate>Tue, 29 Jul 2008 16:11:42 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/49-guid.html</guid>
    
</item>
<item>
    <title>Nearly all DHT implementations vulnerable to 'merge' bug.</title>
    <link>http://blog.notdot.net/archives/47-Nearly-all-DHT-implementations-vulnerable-to-merge-bug..html</link>
            <category>coding</category>
    
    <comments>http://blog.notdot.net/archives/47-Nearly-all-DHT-implementations-vulnerable-to-merge-bug..html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=47</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=47</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    As DHT implementations proliferate and harmonise, the prospect of multiple widely-deployed applications using the same or compatible DHT implementations is increasingly becoming a reality. There is a large and increasing number of DHT libraries out there, such as &lt;a href=&quot;http://entangled.sourceforge.net/&quot;&gt;Entangled&lt;/a&gt; and &lt;a href=&quot;http://freepastry.org/&quot;&gt;FreePastry&lt;/a&gt;, being used by an increasing number of applications.&lt;br /&gt;
&lt;br /&gt;
However, most of these implementations are vulnerable to a simple but subtle bug: They have no way of distinguishing one DHT network from another. Although each application&#039;s DHT network or networks start off separate, if, by chance or deliberate action, a node from a different but compatible DHT is introduced, the &#039;self healing&#039; property of DHTs will ensure that, sooner or later, the two networks become merged into a single DHT.&lt;br /&gt;
&lt;br /&gt;
This is not a problem for single-purpose DHT implementations such as those used by &lt;a href=&quot;http://bittorrent.org/&quot;&gt;BitTorrent&lt;/a&gt; or &lt;a href=&quot;http://en.wikipedia.org/wiki/Overnet&quot;&gt;Overnet&lt;/a&gt;, since they generally establish a single DHT with all compatible clients participating in any case. Nor is this a problem for networks designed with heterogeneous applications in mind, such as &lt;a href=&quot;http://cspace.in/&quot;&gt;CSpace&lt;/a&gt;. However, this still leaves a number of DHT libraries that don&#039;t fall into either of these categories.&lt;br /&gt;
&lt;br /&gt;
If DHTs from two distinct applications using compatible implementations become merged, the outcome depends to a large extent on how those applications make use of the DHT. If they use compatible parameters and only make use of the basic DHT store/retrieve/find-node functionality, the impact may be minimal, though differences in persistence time and maximum size for stored elements can still lead to issues. If the DHTs have different parameters, such as the number of nodes to replicate inserted data to, the result may be subtly broken - a DHT that no longer fulfils the guarantees the software expects from it.&lt;br /&gt;
&lt;br /&gt;
If the two applications expect very different things from their DHTs, though, such as a varying set of RPCs supported, the result could be a complete breakdown in the usefulness of the DHT. If your application expects that the vast majority of participating nodes will support its custom doFoo RPC (not an unreasonable expectation when you control the application), and suddenly half the nodes don&#039;t, the probability is that the application&#039;s behaviour will be severely degraded, at best.** This amounts to an entirely accidental Denial of Service attack on both applications involved.&lt;br /&gt;
&lt;br /&gt;
The solution is relatively simple, of course - simply add an application identifier or &#039;magic number&#039; to the DHT protocol, and require each application to specify an identifier unique to it. Then, the DHT library can simply discard packets with IDs it doesn&#039;t recognise.&lt;br /&gt;
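&lt;br /&gt;
As a minimal sketch of that idea (the 4-byte header layout, the function names and the identifier values are all assumptions made for illustration, not the wire format of any real DHT library), the check could look something like this:&lt;br /&gt;

```python
import struct

# Hypothetical 4-byte application identifier header; every name here is
# illustrative and not taken from any real DHT library.
HEADER = struct.Struct("!I")  # unsigned 32-bit integer, network byte order


def wrap_packet(app_id, payload):
    """Prefix an outgoing DHT message with the application identifier."""
    return HEADER.pack(app_id) + payload


def unwrap_packet(app_id, packet):
    """Return the payload if the packet carries our identifier, else None."""
    if HEADER.size > len(packet):
        return None  # too short to contain a header: discard
    (packet_id,) = HEADER.unpack_from(packet)
    if packet_id != app_id:
        return None  # belongs to some other application's DHT: discard
    return packet[HEADER.size:]


MY_APP = 0x4D594150     # made-up identifier for "our" application
OTHER_APP = 0x4F544852  # made-up identifier for a different application

packet = wrap_packet(MY_APP, b"find_node request")
print(unwrap_packet(MY_APP, packet))     # our own node accepts the payload
print(unwrap_packet(OTHER_APP, packet))  # a foreign node gets None and drops it
```

Because a foreign node never processes the packet, it never adds the sender to its routing table, so the two networks have no opportunity to merge.&lt;br /&gt;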
&lt;br /&gt;
In a quick survey of current DHT implementations, at least the following seem to be vulnerable:&lt;br /&gt;
&lt;ul&gt;&lt;br /&gt;
&lt;li&gt;&lt;a href=&quot;http://entangled.sourceforge.net/&quot;&gt;Entangled&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;
&lt;li&gt;&lt;a href=&quot;http://khashmir.sourceforge.net/&quot;&gt;KHashmir&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;
&lt;li&gt;&lt;a href=&quot;http://www.thomas.ambus.dk/plan-x/routing/&quot;&gt;Plan-X&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;
&lt;li&gt;&lt;a href=&quot;http://www.heim-d.uni-sb.de/~heikowu/SharkyPy/&quot;&gt;SharkyPy&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;
&lt;li&gt;&lt;a href=&quot;http://open-chord.sourceforge.net/&quot;&gt;Open Chord&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;
&lt;li&gt;&lt;a href=&quot;http://current.cs.ucsb.edu/projects/chimera/&quot;&gt;Chimera&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;
&lt;/ul&gt;&lt;br /&gt;
&lt;a href=&quot;http://freepastry.org/&quot;&gt;FreePastry&lt;/a&gt; is notable in that it does provide for a &#039;magic number&#039; facility. However, this is buried deep in the protocol stack, and is not required in order to create a FreePastry DHT instance.&lt;br /&gt;
&lt;br /&gt;
I&#039;m not aware of this bug occurring yet, but with increasing use of DHTs, it seems to me to be all but inevitable that it will occur at some point. Further, restoring normal behaviour after it has occurred is likely to be extremely difficult, probably requiring starting an entirely new DHT instance and moving all existing clients over to it.&lt;br /&gt;
&lt;br /&gt;
** Please forgive my overuse of generalities. As I&#039;m speaking of a general bug shared across many libraries and implementations, it&#039;s hard to be specific about the impact. 
    </content:encoded>

    <pubDate>Sun, 15 Jun 2008 13:34:09 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/47-guid.html</guid>
    
</item>
<item>
    <title>SMTP to HTTP gateway for your App Engine (and other) apps!</title>
    <link>http://blog.notdot.net/archives/46-SMTP-to-HTTP-gateway-for-your-App-Engine-and-other-apps!.html</link>
            <category>app-engine</category>
    
    <comments>http://blog.notdot.net/archives/46-SMTP-to-HTTP-gateway-for-your-App-Engine-and-other-apps!.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=46</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=46</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    In response to a comment in the freenode.net/#appengine channel by someone wishing their App Engine app could receive email, I put together &lt;a href=&quot;http://www.smtp2web.com/&quot;&gt;smtp2web&lt;/a&gt;, a simple service that accepts mail for an address (or your entire domain), and sends it via HTTP POST to a URL you specify. If you&#039;re running in a restricted environment such as App Engine, this means you can now receive email. Even if you&#039;re not, this is a lot simpler to use than writing your own SMTP server (or adding custom handlers to most existing servers).&lt;br /&gt;
&lt;br /&gt;
Someone&#039;s &lt;a href=&quot;http://almaer.com/blog/smtp2webcom-bridge-smtp-to-http-let-app-engine-accept-email&quot;&gt;already blogged about it&lt;/a&gt;, too. 
    </content:encoded>

    <pubDate>Sat, 14 Jun 2008 22:23:13 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/46-guid.html</guid>
    
</item>
<item>
    <title>Update on Anagram Trees</title>
    <link>http://blog.notdot.net/archives/39-Update-on-Anagram-Trees.html</link>
            <category>damn-cool-algorithms</category>
    
    <comments>http://blog.notdot.net/archives/39-Update-on-Anagram-Trees.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=39</wfw:comment>

    <slash:comments>7</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=39</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    &lt;a href=&quot;http://blog.notdot.net/archives/38-Damn-Cool-Algorithms,-Part-3-Anagram-Trees.html&quot;&gt;Original Post&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
One nice thing about working at Google is that you are surrounded by very smart people. I told one of my coworkers about the anagram tree idea, and he immediately pointed out that reordering the alphabet so that the least frequently used letters come first would reduce the branching factor early in the tree, which has the effect of reducing the overall size of the tree substantially. While this seems obvious in retrospect, it&#039;s kind of unintuitive - usually we try to &lt;u&gt;increase&lt;/u&gt; the branching factor of n-ary trees to make them shallower and require fewer operations, rather than trying to reduce it.&lt;br /&gt;
&lt;br /&gt;
Trying it out with an ordering determined by looking at the branching factor for each letter produces results that bear this out: Memory is reduced by about a third, and the number of internal nodes is reduced to 858,858 from 1,874,748, a reduction of more than 50%! Though I haven&#039;t benchmarked it, difficult lookups are substantially faster, too.&lt;br /&gt;
&lt;br /&gt;
The next logical development to try is to re-evaluate the order of the alphabet on a branch-by-branch basis. While I doubt this will have a substantial impact, it seems worth a try, so I&#039;ll give it a go and update with results.&lt;br /&gt;
&lt;br /&gt;
Edit: Re-evaluating the symbol to choose on a branch-by-branch basis had a bigger impact than I anticipated: The tree created with my sample dictionary now has a mere 661,659 internal nodes. Here&#039;s the procedure for creating a tree using this method:&lt;br /&gt;
&lt;br /&gt;
Assuming you have:&lt;ul&gt;&lt;li&gt;A dictionary&lt;/li&gt;&lt;li&gt;A set of symbols that have not yet been used (initially set to the alphabet)&lt;/li&gt;&lt;/ul&gt;&lt;ol&gt;&lt;li&gt;If the symbol set is empty, this is a leaf node - store the dictionary in the node and return.&lt;/li&gt;&lt;li&gt;Find the symbol from the set that, if used, will result in the smallest number of branches (that is, the symbol that has the least variation in number of occurrences).&lt;/li&gt;&lt;li&gt;Mark the current node with the chosen symbol&lt;/li&gt;&lt;li&gt;Partition the dictionary into sub-dictionaries based on how many occurrences of the chosen symbol they have&lt;/li&gt;&lt;li&gt;For each sub-dictionary, recurse with the sub-dictionary and the set less the symbol you selected.&lt;/li&gt;&lt;/ol&gt;Implemented in Python, this is actually substantially larger in memory and on disk than the previous approach, likely due to overhead with using classes instead of tuples as the nodes. In statically-typed languages, however, the overhead should be substantially outweighed by the benefit of the reduction in node count.&lt;br /&gt;
&lt;br /&gt;
Note that the result of this alternate method is that while the letter to branch on is different for every node, following nodes from any leaf to the root of the tree always results in a valid permutation of the alphabet used.&lt;br /&gt;
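&lt;br /&gt;
The branch-by-branch procedure above can be sketched roughly as follows. This is an illustrative reconstruction and not the linked implementation: the function name, the tuple-and-list node layout and the alphabetical tie-breaking are all assumptions.&lt;br /&gt;

```python
def build_adaptive_tree(words, symbols):
    """Recursively build a tree, choosing the branching symbol per node.

    Returns a list of words for a leaf, or a tuple (symbol, children) where
    children maps an occurrence count to a subtree.
    """
    if not symbols:
        return list(words)  # leaf: store the remaining dictionary
    # Partition the words by occurrence count for every unused symbol.
    partitions = {}
    for symbol in sorted(symbols):
        groups = {}
        for word in words:
            groups.setdefault(word.count(symbol), []).append(word)
        partitions[symbol] = groups
    # Pick the symbol producing the fewest branches (ties: alphabetical).
    chosen = min(sorted(symbols), key=lambda s: len(partitions[s]))
    remaining = symbols - {chosen}
    children = {
        count: build_adaptive_tree(group, remaining)
        for count, group in partitions[chosen].items()
    }
    return (chosen, children)


# Toy dictionary and three-letter alphabet, chosen to keep the example small.
tree = build_adaptive_tree(["tea", "eat", "tee", "at"], frozenset("aet"))
print(tree[0])  # the symbol chosen at the root
```

In this toy dictionary every word contains exactly one &quot;t&quot;, so that symbol yields a single branch and is chosen at the root; the remaining symbols are then re-evaluated separately inside each subtree.&lt;br /&gt;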
&lt;br /&gt;
Edit 2: The code for a Python implementation incorporating these ideas can be found &lt;a href=&quot;http://blog.notdot.net/uploads/subsettree.pys&quot;&gt;here&lt;/a&gt;. 
    </content:encoded>

    <pubDate>Sat, 20 Oct 2007 04:01:31 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/39-guid.html</guid>
    
</item>
<item>
    <title>Damn Cool Algorithms, Part 3: Anagram Trees</title>
    <link>http://blog.notdot.net/archives/38-Damn-Cool-Algorithms,-Part-3-Anagram-Trees.html</link>
            <category>damn-cool-algorithms</category>
    
    <comments>http://blog.notdot.net/archives/38-Damn-Cool-Algorithms,-Part-3-Anagram-Trees.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=38</wfw:comment>

    <slash:comments>20</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=38</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    I hesitate to call this algorithm &quot;damn cool&quot;, since it&#039;s something I invented* myself, but I think it &lt;u&gt;is&lt;/u&gt; rather cool, and it fits the theme of my algorithms posts, so here it is anyway.&lt;br /&gt;
&lt;br /&gt;
When it comes to finding anagrams of words, a frequent approach is to use an &lt;a href=&quot;http://en.wikipedia.org/wiki/Anagram_dictionary&quot;&gt;anagram dictionary&lt;/a&gt; - simply put, sort the letters in your word to provide a unique key that all anagrams of a word have in common. Another approach is to generate a letter-frequency histogram for your word. (Both these approaches are more or less equivalent, in fact.) These approaches make the problem of finding exact single-word anagrams for strings very efficient - O(1) if you use a hashtable.&lt;br /&gt;
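&lt;br /&gt;
For reference, here is a minimal sketch of the sorted-letters version of an anagram dictionary. The function names and the tiny word list are made up for illustration.&lt;br /&gt;

```python
from collections import defaultdict


def anagram_key(word):
    """Sorted letters: identical for every anagram of the word."""
    return "".join(sorted(word.lower()))


def build_anagram_dictionary(words):
    """Map each key to the list of words that share it."""
    table = defaultdict(list)
    for word in words:
        table[anagram_key(word)].append(word)
    return table


# A tiny hypothetical word list, standing in for a real dictionary file.
table = build_anagram_dictionary(["listen", "silent", "enlist", "google"])
print(table[anagram_key("tinsel")])  # prints ['listen', 'silent', 'enlist']
```

Each lookup is a single hashtable access on the sorted key, which is where the O(1) figure comes from.&lt;br /&gt;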
&lt;br /&gt;
However, the problem of finding subset anagrams - a word that contains a subset of the letters in a string - is still rather inefficient, requiring either a brute force O(n) search through the dictionary, or looking up every subset of the letters in the sorted input string, which is O(2^l) in the number of letters l in the input string. Finding subset anagrams is significantly more interesting, too, as it has applications in finding multi-word anagrams, as well as being applicable to problem domains such as Scrabble.&lt;br /&gt;
&lt;br /&gt;
However, with a little more effort, and the above observation that we can generate a histogram that uniquely represents a given set of letters, we can generate a tree structure that makes looking up subset anagrams much more efficient. To build the tree, we follow this simple procedure:&lt;br /&gt;
&lt;br /&gt;
Assume we have the following information:&lt;ul&gt;&lt;li&gt;A lexicon or dictionary of words to populate the tree with&lt;/li&gt;&lt;li&gt;An alphabet for words in the lexicon&lt;/li&gt;&lt;li&gt;The tree we are building&lt;/li&gt;&lt;li&gt;A current node&lt;/li&gt;&lt;/ul&gt;For each term in the lexicon:&lt;ol&gt;&lt;li&gt;Generate a letter-frequency histogram for the term.&lt;/li&gt;&lt;li&gt;Set the current node to the root of the tree.&lt;/li&gt;&lt;li&gt;For each symbol in the alphabet:&lt;ol&gt;&lt;li&gt;Get the frequency of the current symbol in the current term. Call it &lt;i&gt;f&lt;/i&gt;&lt;/li&gt;&lt;li&gt;Set the current node to the f&lt;sup&gt;th&lt;/sup&gt; child node of the current node, creating it if it doesn&#039;t exist&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;&lt;li&gt;Append the current term to the list of words on the current (leaf) node&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;
The result of following this simple procedure is a fixed-height tree, 27 nodes deep, with all the words in the leaf nodes, and each internal tier of the tree corresponding to a symbol from the alphabet. Here&#039;s an (abbreviated) example:&lt;br /&gt;
&lt;img width=&#039;464&#039; height=&#039;477&#039; style=&quot;border: 0px; padding-left: 5px; padding-right: 5px;&quot; src=&quot;http://blog.notdot.net/uploads/anagramtree.png&quot; alt=&quot;&quot; /&gt;&lt;br /&gt;
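&lt;br /&gt;
The construction procedure above can be sketched in a few lines of Python. This is an illustrative version using nested dicts, not the implementation measured in this post; the function name and the toy alphabet in the usage example are assumptions, and letters outside the alphabet are simply ignored.&lt;br /&gt;

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase


def build_tree(words, alphabet=ALPHABET):
    """Build an anagram tree as nested dicts keyed by letter frequency.

    Internal nodes are dicts mapping a frequency f to a child node; the
    nodes below the last alphabet symbol are lists of words (the leaves).
    """
    root = {}
    for word in words:
        histogram = Counter(word.lower())
        node = root
        # Walk every symbol except the last, creating dict nodes as needed.
        for symbol in alphabet[:-1]:
            node = node.setdefault(histogram[symbol], {})
        # The frequency of the last symbol selects the leaf list.
        node.setdefault(histogram[alphabet[-1]], []).append(word)
    return root


# Toy three-letter alphabet so that the whole tree fits on one line.
tree = build_tree(["tea", "eat", "ate", "tee"], alphabet="aet")
print(tree)
```

In the toy tree, &quot;tea&quot;, &quot;eat&quot; and &quot;ate&quot; share one leaf because their histograms are identical, while &quot;tee&quot; ends up on a separate path.&lt;br /&gt;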
&lt;br /&gt;
Once the tree is built, we find subset anagrams for an input string as follows:&lt;br /&gt;
&lt;br /&gt;
Assume we have the following information:&lt;ul&gt;&lt;li&gt;The tree we built using the above procedure.&lt;/li&gt;&lt;li&gt;The alphabet we used above.&lt;/li&gt;&lt;li&gt;A frontier set, initially empty.&lt;/li&gt;&lt;/ul&gt;&lt;ol&gt;&lt;li&gt;Initialize the frontier set to contain the root of the tree.&lt;/li&gt;&lt;li&gt;Generate a letter-frequency histogram for the input string.&lt;/li&gt;&lt;li&gt;For each symbol in the alphabet:&lt;ol&gt;&lt;li&gt;Get the frequency of the current symbol in the input string. Call it &lt;i&gt;f&lt;/i&gt;.&lt;/li&gt;&lt;li&gt;For each node in the current frontier set, add the subnodes numbered 0 through f to the  new frontier set.&lt;/li&gt;&lt;/ol&gt;&lt;/li&gt;&lt;li&gt;The frontier set now consists of leaf nodes, containing all the subset anagrams of the input string.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;
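The lookup steps above might be sketched like this. The block repeats a compact builder so that it runs on its own; the nested-dict layout, the function names and the toy alphabet are assumptions for illustration.&lt;br /&gt;

```python
from collections import Counter


def build_tree(words, alphabet):
    """Compact version of the construction procedure (nested dicts)."""
    root = {}
    for word in words:
        histogram = Counter(word)
        node = root
        for symbol in alphabet[:-1]:
            node = node.setdefault(histogram[symbol], {})
        node.setdefault(histogram[alphabet[-1]], []).append(word)
    return root


def subset_anagrams(tree, alphabet, letters):
    """Return every stored word that uses only a subset of `letters`."""
    histogram = Counter(letters)
    frontier = [tree]
    for symbol in alphabet:
        f = histogram[symbol]
        # Keep the children numbered 0 through f of every frontier node.
        frontier = [
            node[count]
            for node in frontier
            for count in range(f + 1)
            if count in node
        ]
    # The frontier now holds leaf lists; flatten them into one result.
    return [word for leaf in frontier for word in leaf]


ALPHABET = "aet"  # toy alphabet, an assumption to keep the example small
tree = build_tree(["tea", "eat", "at", "tee", "a"], ALPHABET)
print(sorted(subset_anagrams(tree, ALPHABET, "tea")))  # ['a', 'at', 'eat', 'tea']
```

Note that &quot;tee&quot; is pruned at the second tier: it needs two occurrences of &quot;e&quot;, and the input string only supplies one, so that branch never enters the frontier.&lt;br /&gt;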
Runtime analysis of this algorithm is rather difficult, for me, at least. Intuitively and in practice, it&#039;s a lot faster than either of the brute-force approaches, but quantifying that in big-O notation is something that&#039;s escaped me. In the worst case, a lookup is no worse than O(n), where n is the number of words in the dictionary - only a constant factor worse than the brute-force approach. In the best case, where the frontier set always has one node, lookup time is proportional to the length of the alphabet, or O(1). The average case depends on how large a subset of the dictionary the input string selects. Quantifying by the size of the output, m, approximately O(m) operations are required. If anyone knows how to determine more solid bounds for runtime, please do let me know in the comments.&lt;br /&gt;
&lt;br /&gt;
One disadvantage of this approach is that there is substantial memory overhead. Using my Python implementation of the algorithm, importing /usr/share/dict/words, which is approximately 2MB on this machine, results in over 300MB of memory consumed. Using the Pickle module to serialize to disk, the output file is over 30MB, and compresses with gzip down to about 7MB. I suspect part of the large memory overhead is due to the minimum size of Python&#039;s dictionaries; I will modify the implementation to use lists and update this post if I can make it more efficient.&lt;br /&gt;
&lt;br /&gt;
Here are a few stats on the tree generated that may be of interest:&lt;br /&gt;
Total words: 234,936&lt;br /&gt;
Leaf nodes: 215,366&lt;br /&gt;
Internal nodes: 1,874,748&lt;br /&gt;
&lt;br /&gt;
From this we can see that the average cardinality of internal nodes is very low - not much more than 1. A breakdown of the number of nodes in each tier helps clarify this:&lt;br /&gt;
&lt;table&gt;&lt;tr&gt;&lt;th&gt;Tier&lt;/th&gt;&lt;th&gt;Number of nodes&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt; 1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt; 7&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;2&lt;/td&gt;&lt;td&gt; 25&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt; 85&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt; 203&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt; 707&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt; 1145&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt; 1886&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt; 3479&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;9&lt;/td&gt;&lt;td&gt; 8156&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;10&lt;/td&gt;&lt;td&gt; 8853&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;11&lt;/td&gt;&lt;td&gt; 10835&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;12&lt;/td&gt;&lt;td&gt; 19632&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;13&lt;/td&gt;&lt;td&gt; 28470&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;14&lt;/td&gt;&lt;td&gt; 47635&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;15&lt;/td&gt;&lt;td&gt; 73424&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;16&lt;/td&gt;&lt;td&gt; 92618&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;17&lt;/td&gt;&lt;td&gt; 94770&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;18&lt;/td&gt;&lt;td&gt; 125018&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;19&lt;/td&gt;&lt;td&gt; 156406&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;20&lt;/td&gt;&lt;td&gt; 182305&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;21&lt;/td&gt;&lt;td&gt; 195484&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;22&lt;/td&gt;&lt;td&gt; 200031&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;23&lt;/td&gt;&lt;td&gt; 203923&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;24&lt;/td&gt;&lt;td&gt; 205649&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;25&lt;/td&gt;&lt;td&gt; 214001&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;br /&gt;
The cardinality of nodes towards the top of the tree is fairly high, but the tree quickly flattens out, and the last four tiers of the tree account for almost half of the total nodes. This suggests one possible space optimisation: Removing the last few tiers of the tree and concatenating their leaf nodes together. When performing lookups, check the selected nodes to ensure they are actually subset anagrams of the input string.&lt;br /&gt;
&lt;br /&gt;
* It&#039;s possible I&#039;m simply rediscovering something that&#039;s well known in the computer science community, or perhaps mentioned in a computer science paper 30 years ago. Significant searching hasn&#039;t turned up anyone using an algorithm like this, or anything else more efficient than the brute-force approaches outlined.&lt;br /&gt;
&lt;br /&gt;
Edit: The source to my initial implementation is &lt;a href=&quot;http://blog.notdot.net/uploads/anagramfinder.pys&quot;&gt;here&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Edit: Converting my Python implementation to use lists reduced memory consumption by roughly half. I&#039;ll post figures for the pickled tree and the source code when I have the opportunity.&lt;br /&gt;
&lt;br /&gt;
Edit: More updates can be found &lt;a href=&quot;http://blog.notdot.net/archives/39-Update-on-Anagram-Trees.html&quot;&gt;here&lt;/a&gt;. 
    </content:encoded>

    <pubDate>Fri, 19 Oct 2007 04:00:50 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/38-guid.html</guid>
    
</item>
<item>
    <title>Damn Cool Algorithms, Part 2: Secure permutations with block ciphers</title>
    <link>http://blog.notdot.net/archives/37-Damn-Cool-Algorithms,-Part-2-Secure-permutations-with-block-ciphers.html</link>
            <category>damn-cool-algorithms</category>
    
    <comments>http://blog.notdot.net/archives/37-Damn-Cool-Algorithms,-Part-2-Secure-permutations-with-block-ciphers.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=37</wfw:comment>

    <slash:comments>3</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=37</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    It&#039;s been too long since I blogged about anything much, and way too long since I posted the first &lt;a href=&quot;http://blog.notdot.net/archives/30-Damn-Cool-Algorithms,-Part-1-BK-Trees.html&quot;&gt;Damn Cool Algorithms&lt;/a&gt; post, which I promised would be a series. So here&#039;s part 2.&lt;br /&gt;
&lt;br /&gt;
To start, I&#039;m assuming you know what a &lt;a href=&quot;http://en.wikipedia.org/wiki/Permutation&quot;&gt;permutation&lt;/a&gt; is - basically a shuffling of a sequence of items in a particular order. A permutation of the range 1-10, for example, is {5,2,1,6,8,4,3,9,7,10}. A secure permutation is one in which an attacker, given any subset of the permutation, cannot determine the order of any other elements. A simple example of this would be to take a cryptographically secure pseudo-random number generator, seed it with a secret key, and use it to shuffle your sequence.&lt;br /&gt;
&lt;br /&gt;
What if you want to generate a really, really big permutation - one so big precomputing and storing it isn&#039;t practical or desirable? Furthermore, what if you want it to be a secure permutation? There&#039;s a really neat trick we can pull with block ciphers that allows us to generate a secure permutation over any range of numbers without first having to precompute it.&lt;br /&gt;
&lt;br /&gt;
A &lt;a href=&quot;http://en.wikipedia.org/wiki/Block_cipher&quot;&gt;block cipher&lt;/a&gt;, for anyone who isn&#039;t familiar with them, is a common cryptographic primitive. It takes a block of plaintext of some fixed length - 64 or 128 bits is common - and encrypts it. Given the same key and the same block of plaintext, it will always generate the same block of ciphertext. Messages larger than a single block are encrypted and decrypted securely using one of a number of &lt;a href=&quot;http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation&quot;&gt;modes of operation&lt;/a&gt;. When using a block cipher for encryption, choice of the mode of operation is critical. For the purposes of generating a secure permutation, however, we&#039;re only going to be encrypting a single block at a time, so we don&#039;t have to worry about modes of operation.&lt;br /&gt;
&lt;br /&gt;
If you look at how a block cipher operates - taking any block of a given length (think of blocks as very large numbers here) and converting it uniquely to another block, such that it can be converted back again - you can see that a block cipher is already a secure permutation. If we progressively encrypt larger numbers (1, 2, 3, and so on), we get out a random-seeming sequence of output numbers that is guaranteed not to repeat as long as we don&#039;t repeat our input. It&#039;s easy to prove this to yourself: If it repeated, then you would have two input numbers with a single output number, and it would thus be impossible to decrypt uniquely. So the same properties that a block cipher requires are the properties that make it useful to us.&lt;br /&gt;
&lt;br /&gt;
All very well, you say, but what if I want a permutation over a range that isn&#039;t a power of two? This is where the clever trick comes in. Take a block cipher that&#039;s got a block length slightly larger than you want. Use it as described above, encrypting progressively higher numbers in a sequence to generate elements in the permutation. Whenever the output of the encryption is outside the range you want for your permutation, just encrypt it again. Repeat until you get a number within the range you want. Again, we&#039;re guaranteed uniqueness by the block cipher, and we&#039;re also guaranteed (by exhaustion) that we will eventually get a number within the desired range.&lt;br /&gt;
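&lt;br /&gt;
Here is a minimal sketch of the re-encryption trick. The &quot;cipher&quot; is a toy Feistel network built on SHA-256, used purely as a stand-in so the example is self-contained; it is an assumption for illustration and NOT a vetted or secure cipher. The function names, key and sizes are likewise made up.&lt;br /&gt;

```python
import hashlib


def toy_block_cipher(value, key, half_bits, rounds=4):
    """Encrypt one block of 2 * half_bits bits with a small Feistel network.

    This is an illustrative stand-in for a real block cipher, NOT a vetted
    or secure design; it is a permutation of range(4 ** half_bits).
    """
    half_size = 2 ** half_bits
    left, right = divmod(value, half_size)
    for round_number in range(rounds):
        digest = hashlib.sha256(
            key + bytes([round_number]) + right.to_bytes(8, "big")
        ).digest()
        round_value = int.from_bytes(digest[:8], "big") % half_size
        left, right = right, left ^ round_value
    return left * half_size + right


def permuted(index, size, key, half_bits):
    """Map index in range(size) to a unique value in range(size)."""
    value = toy_block_cipher(index, key, half_bits)
    # Re-encrypt until the output lands inside the desired range.
    while value >= size:
        value = toy_block_cipher(value, key, half_bits)
    return value


KEY = b"example secret key"  # hypothetical key
SIZE = 1000                  # permutation over 0..999
HALF_BITS = 5                # 10-bit blocks: the cipher range is 1024
shuffled = [permuted(i, SIZE, KEY, HALF_BITS) for i in range(SIZE)]
print(sorted(shuffled) == list(range(SIZE)))  # True: every value appears once
```

Any single element can be computed on demand with permuted(i, ...), without generating or storing the rest of the permutation.&lt;br /&gt;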
&lt;br /&gt;
Obviously, there are some factors that need to be taken into consideration before pursuing this route. You want to select a block cipher that is not much larger than the range you wish to generate a permutation over - preferably the next power of two. The ratio of the cipher&#039;s range to the permutation&#039;s range defines the average amount of work you will have to perform, so if the cipher has a range four times that of your permutation, you&#039;ll need to do an average of four encryptions for each value. Since most block ciphers are 64, 128, or more bits, this can be problematic. For this purpose, I&#039;ve found the &lt;a href=&quot;http://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm&quot;&gt;TEA&lt;/a&gt; cipher to be particularly adaptable. It is easy to create variants that are 32, 64, 128 or more bits long, and from there, the bitshifts in the main loop are easily adjusted to produce a cipher with a length that&#039;s any power of 4, without needing to shorten the key to the point where it&#039;s easily brute-forced.&lt;br /&gt;
&lt;br /&gt;
It&#039;s also worth noting that although this technique is aimed at generating very large secure permutations, it is equally useful for a permutation that doesn&#039;t need to be secure or secret - your secret key simply becomes your random seed for the permutation. There are many situations in which this can be useful - what you essentially have is a mapping function from index to permutation value, so you can calculate the value of any subset of the permutation that you wish.&lt;br /&gt;
&lt;br /&gt;
Finally, bear in mind that due to the factorial explosion of the number of possible permutations, the keyspace of your cipher is almost certainly going to be much smaller than the number of possible permutations. For most purposes this probably does not matter, since the number of possible permutations is too large to enumerate anyway, but if your key is sufficiently short, it allows the possibility of an attacker doing an exhaustive search of your keyspace to find the permutation that generates the subsequence of the permutation he has access to.&lt;br /&gt;
&lt;br /&gt;
Update: Yossi Oren points to &lt;a href=&quot;http://www.cs.ucdavis.edu/~rogaway/papers/subset.pdf&quot;&gt;this excellent paper&lt;/a&gt; in the comments. It covers exactly what I describe here (only much more comprehensively, of course). 
    </content:encoded>

    <pubDate>Sun, 30 Sep 2007 01:34:26 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/37-guid.html</guid>
    
</item>
<item>
    <title>I am teh famous, LOL!</title>
    <link>http://blog.notdot.net/archives/33-I-am-teh-famous,-LOL!.html</link>
            <category>lolcode</category>
    
    <comments>http://blog.notdot.net/archives/33-I-am-teh-famous,-LOL!.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=33</wfw:comment>

    <slash:comments>1</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=33</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    LOLCode.NET was featured at TechED07, in a presentation by Nick Hodge, a &quot;Professional Geek&quot; for Microsoft. He&#039;s posted a video of the presentation on &lt;a href=&quot;http://www.nickhodge.com/blog/archives/2048&quot;&gt;his blog&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
Now I just need to convince them to integrate it into the next edition of VS... 
    </content:encoded>

    <pubDate>Tue, 28 Aug 2007 07:58:33 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/33-guid.html</guid>
    
</item>
<item>
    <title>LOLCode.net - Now your LOLCats can use the CLR!</title>
    <link>http://blog.notdot.net/archives/32-LOLCode.net-Now-your-LOLCats-can-use-the-CLR!.html</link>
            <category>lolcode</category>
    
    <comments>http://blog.notdot.net/archives/32-LOLCode.net-Now-your-LOLCats-can-use-the-CLR!.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=32</wfw:comment>

    <slash:comments>36</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=32</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    For the past few days, I&#039;ve been working on a &lt;a href=&quot;http://code.google.com/p/lolcode-dot-net/&quot; &gt;.NET compiler&lt;/a&gt; for the &lt;a href=&quot;http://www.lolcode.com/&quot; &gt;LOLCode&lt;/a&gt; language.&lt;br /&gt;
&lt;br /&gt;
LOLCode is an emerging esoteric (and hilarious) language based on the dialect used in &lt;a href=&quot;http://icanhascheezburger.com/&quot; &gt;LOLCats&lt;/a&gt; images. It&#039;s been seized upon by a group of people (myself included, now), and is being expanded into a real, workable, Turing-complete esoteric language (though nobody has proven its Turing completeness yet!).&lt;br /&gt;
&lt;br /&gt;
The LOLCode.NET compiler is now working, and as a nearly-free bonus for using the .NET platform, you can even debug it in Visual Studio: &lt;img src=&quot;http://lolcode-dot-net.googlecode.com/svn/trunk/VSDebug1.png&quot; alt=&quot;&quot; /&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;img src=&quot;http://lolcode-dot-net.googlecode.com/svn/trunk/VSDebug2.png&quot; alt=&quot;&quot; /&gt;&lt;br /&gt;
&lt;br /&gt;
I still can&#039;t believe I&#039;m looking at LOLCode and its x86 disassembly in Visual Studio. 
    </content:encoded>

    <pubDate>Sun, 03 Jun 2007 04:04:24 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/32-guid.html</guid>
    
</item>
<item>
    <title>Dynamic code generation in .NET</title>
    <link>http://blog.notdot.net/archives/31-Dynamic-code-generation-in-.NET.html</link>
            <category>coding</category>
    
    <comments>http://blog.notdot.net/archives/31-Dynamic-code-generation-in-.NET.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=31</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=31</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    I&#039;ve written up a &lt;a href=&quot;http://docs.google.com/Doc?id=dwct8ps_8c5rktz&quot; &gt;rather lengthy document&lt;/a&gt; on different methods of dynamically generating code in .NET - two current methods, and one hypothetical one based on a C extension called &quot;`c&quot; (tickc). I think it does a fairly good job of illustrating just how awkward dynamic code generation is in current mainstream languages, and how much simpler and more understandable (and in many cases, more efficient) it could be. 
    </content:encoded>

    <pubDate>Tue, 29 May 2007 02:21:21 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/31-guid.html</guid>
    
</item>
<item>
    <title>Damn Cool Algorithms, Part 1: BK-Trees</title>
    <link>http://blog.notdot.net/archives/30-Damn-Cool-Algorithms,-Part-1-BK-Trees.html</link>
            <category>damn-cool-algorithms</category>
    
    <comments>http://blog.notdot.net/archives/30-Damn-Cool-Algorithms,-Part-1-BK-Trees.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=30</wfw:comment>

    <slash:comments>33</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=30</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    This is the first post in (hopefully) a series of posts on Damn Cool Algorithms - essentially, any algorithm I think is really Damn Cool, particularly if it&#039;s simple but non-obvious.&lt;br /&gt;
&lt;br /&gt;
BK-Trees, or Burkhard-Keller Trees, are a tree-based data structure engineered for quickly finding near-matches to a string, as needed by a spelling checker or a &#039;fuzzy&#039; search for a term. The aim is to return, for example, &quot;seek&quot; and &quot;peek&quot; if I search for &quot;aeek&quot;. What makes BK-Trees so cool is that they take a problem which has no obvious solution besides brute-force search, and present a simple and elegant method for speeding up searches substantially.&lt;br /&gt;
&lt;br /&gt;
BK-Trees were first proposed by Burkhard and Keller in 1973, in their paper &quot;&lt;a href=&quot;http://portal.acm.org/citation.cfm?id=362003.362025&quot; &gt;Some approaches to best match file searching&lt;/a&gt;&quot;. The only copy of this online seems to be in the ACM archive, which is subscription only. Further details, however, are provided in the excellent paper &quot;&lt;a href=&quot;http://citeseer.ist.psu.edu/1593.html&quot; &gt;Fast Approximate String Matching in a Dictionary&lt;/a&gt;&quot;.&lt;br /&gt;
&lt;br /&gt;
Before we can define BK-Trees, we need to define a couple of preliminaries. In order to index and search our dictionary, we need a way to compare strings. The canonical method for this is the &lt;a href=&quot;http://en.wikipedia.org/wiki/Levenshtein_Distance&quot; &gt;Levenshtein Distance&lt;/a&gt;, which takes two strings, and returns a number representing the minimum number of insertions, deletions and replacements required to transform one string into the other. Other string distance functions are also acceptable (for example, one incorporating the concept of transpositions as an atomic operation could be used), as long as they meet the criteria defined below.&lt;br /&gt;
&lt;br /&gt;
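For reference, a minimal Python sketch of the Levenshtein Distance using the standard row-by-row dynamic programming approach. The function name levenshtein is just a label chosen for this example.&lt;br /&gt;

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and replacements turning a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # (initially empty) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters of a into the empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # replace, or keep if equal
            ))
        previous = current
    return previous[-1]


assert levenshtein("book", "rook") == 1
assert levenshtein("book", "nooks") == 2
```
&lt;br /&gt;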
Now we can make a particularly useful observation about the Levenshtein Distance: It forms a &lt;a href=&quot;http://en.wikipedia.org/wiki/Metric_space&quot;&gt;Metric Space&lt;/a&gt;. Put simply, a metric space is any relationship that adheres to three basic criteria:&lt;br /&gt;
&lt;ul&gt;&lt;br /&gt;
&lt;li&gt;d(x,y) = 0 &lt;-&gt; x = y &lt;i&gt;(The distance between x and y is 0 if and only if x = y)&lt;/i&gt;&lt;/li&gt;&lt;br /&gt;
&lt;li&gt;d(x,y) = d(y,x) &lt;i&gt;(The distance from x to y is the same as the distance from y to x)&lt;/i&gt;&lt;/li&gt;&lt;br /&gt;
&lt;li&gt;d(x,y) + d(y,z) &gt;= d(x,z)&lt;/li&gt;&lt;br /&gt;
&lt;/ul&gt;&lt;br /&gt;
The last of these criteria is called the &lt;a href=&quot;http://en.wikipedia.org/wiki/Triangle_inequality&quot;&gt;Triangle Inequality&lt;/a&gt;. The Triangle Inequality states that the path from x to z must be no longer than any path that goes through another intermediate point (the path from x to y to z). Look at a triangle, for example: it&#039;s not possible to draw a triangle such that it&#039;s quicker to get from one point to another by going along two sides than it is by going along the other side.&lt;br /&gt;
&lt;br /&gt;
These three criteria, basic as they are, are all that&#039;s required for something such as the Levenshtein Distance to qualify as a Metric Space. Note that this is far more general than, for example, a Euclidean Space - a Euclidean Space is metric, but many Metric Spaces (such as the Levenshtein Distance) are not Euclidean. Now that we know that the Levenshtein Distance (and other similar string distance functions) embodies a Metric Space, we come to the key observation of Burkhard and Keller.&lt;br /&gt;
&lt;br /&gt;
Assume for a moment we have two parameters: &lt;i&gt;query&lt;/i&gt;, the string we are using in our search, and &lt;i&gt;n&lt;/i&gt;, the maximum distance a string can be from &lt;i&gt;query&lt;/i&gt; and still be returned. Say we take an arbitrary string, &lt;i&gt;test&lt;/i&gt;, and compare it to &lt;i&gt;query&lt;/i&gt;. Call the resultant distance &lt;i&gt;d&lt;/i&gt;. Because we know the triangle inequality holds, all our results must have at most distance &lt;i&gt;d+n&lt;/i&gt; and at least distance &lt;i&gt;d-n&lt;/i&gt; from &lt;i&gt;test&lt;/i&gt;.&lt;br /&gt;
&lt;br /&gt;
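Spelled out, the bound follows from two applications of the triangle inequality plus symmetry. Here q stands for query, t for test and r for any string that should be returned, so d(q,t) = d and d(q,r) is at most n. The symbols q, t and r are shorthand introduced only for this note.&lt;br /&gt;

```latex
\begin{gather}
d(t,r) \le d(t,q) + d(q,r) \le d + n \\
d = d(q,t) \le d(q,r) + d(r,t) \le n + d(t,r)
  \quad\Longrightarrow\quad d(t,r) \ge d - n
\end{gather}
```
&lt;br /&gt;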
From here, the construction of a BK-Tree is simple: Each node has an arbitrary number of children, and each edge has a number corresponding to a Levenshtein distance. All the subnodes on the edge numbered k have a Levenshtein distance of exactly k to the parent node. So, for example, if we have a tree with parent node &quot;book&quot; and two child nodes &quot;rook&quot; and &quot;nooks&quot;, the edge from &quot;book&quot; to &quot;rook&quot; is numbered 1, and the edge from &quot;book&quot; to &quot;nooks&quot; is numbered 2. &lt;br /&gt;
&lt;br /&gt;
To build the tree from a dictionary, take an arbitrary word and make it the root of your tree. Whenever you want to insert a word, take the Levenshtein distance between your word and the root of the tree, and find the edge with number d(newword,root). Recurse, comparing your new word with the child node on that edge, and so on, until there is no child node, at which point you create a new child node and store your new word there. For example, to insert &quot;boon&quot; into the example tree above, we would examine the root, find that d(&quot;book&quot;, &quot;boon&quot;) = 1, and so examine the child on the edge numbered 1, which is the word &quot;rook&quot;. We would then calculate the distance d(&quot;rook&quot;, &quot;boon&quot;), which is 2, and so insert the new word under &quot;rook&quot;, with an edge numbered 2. &lt;br /&gt;
&lt;br /&gt;
To query the tree, take the Levenshtein distance d from your term to the root, and recursively query every child node on an edge numbered between d-n and d+n (inclusive). If the node you are examining is within n of your search term, return it and continue your query. &lt;br /&gt;
&lt;br /&gt;
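The insertion and query procedures can be sketched in Python roughly as follows. The class name BKTree, the tuple-based node layout and the compact recursive distance function are assumptions made for this example rather than anything taken from the paper. Any metric can be passed in as distance.&lt;br /&gt;

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def levenshtein(a: str, b: str) -> int:
    """Compact recursive edit distance; adequate for short dictionary words."""
    if not a or not b:
        return len(a) + len(b)
    return min(
        levenshtein(a[1:], b) + 1,
        levenshtein(a, b[1:]) + 1,
        levenshtein(a[1:], b[1:]) + (a[0] != b[0]),
    )


class BKTree:
    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None  # each node is a pair: (word, {edge number: child node})

    def insert(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already stored
            if d not in node[1]:
                node[1][d] = (word, {})
                return
            node = node[1][d]  # follow the edge numbered d

    def query(self, term: str, n: int) -> list:
        """Return sorted (distance, word) pairs for stored words within n of term."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(term, word)
            if n >= d:
                results.append((d, word))
            # Only edges numbered d-n .. d+n can lead to matches.
            for edge in range(max(1, d - n), d + n + 1):
                if edge in children:
                    stack.append(children[edge])
        return sorted(results)


tree = BKTree()
for w in ["book", "rook", "nooks", "boon"]:
    tree.insert(w)
assert tree.query("book", 1) == [(0, "book"), (1, "boon"), (1, "rook")]
```

In this query the subtree under the edge numbered 2 from the root (holding &quot;nooks&quot;) is never visited, which is exactly the pruning the triangle inequality buys.&lt;br /&gt;
&lt;br /&gt;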
The tree is N-ary and irregular (but generally well-balanced). Tests show that searching with a distance of 1 queries no more than 5-8% of the tree, and searching with two errors queries no more than 17-25% of the tree - a substantial improvement over checking every node! Note that exact searching can also be performed fairly efficiently by simply setting n to 0.&lt;br /&gt;
&lt;br /&gt;
Looking back on this, the post is rather longer and seems more involved than I had anticipated. Hopefully, you will agree after reading it that the insight behind BK-Trees is indeed elegant and remarkably simple. 
    </content:encoded>

    <pubDate>Mon, 02 Apr 2007 03:52:30 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/30-guid.html</guid>
    
</item>
<item>
    <title>Serializing JavaScript objects with circular references</title>
    <link>http://blog.notdot.net/archives/29-Serializing-JavaScript-objects-with-circular-references.html</link>
            <category>coding</category>
    
    <comments>http://blog.notdot.net/archives/29-Serializing-JavaScript-objects-with-circular-references.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=29</wfw:comment>

    <slash:comments>-1</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=29</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    One problem we&#039;ve been dealing with at work for a while is serializing JavaScript objects to a string representation, when objects can contain circular references. The simplest example of this is an object a, which contains one property (call it &#039;foo&#039;) which is a reference to a.&lt;br /&gt;
&lt;br /&gt;
Mozilla provides a toSource() function on the Object class (and hence every object) which returns a serialized representation of that object, allowing it to be reconstituted with eval(). To handle circular references, any object or array can be tagged with a numeric identifier by prefixing it with &quot;#x=&quot; (where x is an integer), and a tagged object can be referenced with the syntax &quot;&amp;x;&quot;. Using this, any object graph can be serialized, and we can even make use of this system to compress the serialized text down further than Mozilla does on its own by using this syntax for all backreferences, instead of just circular ones. The example earlier, of an object referencing itself would serialize as &quot;#1={foo:&amp;1;}&quot;.&lt;br /&gt;
&lt;br /&gt;
All fine and dandy, but unfortunately, IE doesn&#039;t implement the toSource() function. I don&#039;t know about other browsers either - toSource() is part of Netscape&#039;s original JavaScript documentation, but not part of the ECMAScript standard. So where does that leave us? We could use this syntax or something similar and simply do both serialization and deserialization manually in IE, but this leads to some fairly serious performance issues. We need something that&#039;s far less manual, but works in any JS-supporting browser.&lt;br /&gt;
&lt;br /&gt;
Our first approach was simply to replace the #x syntax with standard variables, in the form of a special context array. If we name this array &quot;_&quot;, and initialize it before deserializing, our serialized object now looks like this: &quot;_[1]={foo:_[1]}&quot;. Unfortunately, JavaScript doesn&#039;t assign to _[1] until it&#039;s finished deserializing the value that&#039;s being assigned to it (naturally), and so any uses of _[1] inside that value return &#039;undefined&#039;. Backreferences that aren&#039;t circular are just fine.&lt;br /&gt;
&lt;br /&gt;
Eventually, we concluded that an entirely automatic deserialization that works in all JS-supporting browsers simply isn&#039;t possible. One that requires a minimum of manual intervention is, however: We continue using the array syntax above for assignments, but we replace all backrefs with placeholder objects. After the eval() step, we recurse through the graph finding all the placeholders and replacing them with the appropriate backrefs from the array. The best part of this is that our array of backrefs has been conveniently populated for us by the JavaScript that we used to identify the destinations of the backrefs in the first place. Thus, our example from above becomes: &quot;_[1]={foo:{_ref:1}}&quot;. After initializing &quot;_&quot; and calling eval() on this, we recurse through the tree, find the object &quot;{_ref:1}&quot; and replace it with the value of _[1]. Done! We don&#039;t even have to keep track of where we&#039;ve been to avoid loops, because we just stop recursing whenever we replace a backref. 
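&lt;br /&gt;
&lt;br /&gt;
To show the shape of the placeholder-resolution pass, here is a small sketch. It is a Python analogue rather than the JavaScript we actually use: dicts and lists stand in for JS objects and arrays, the function name resolve is hypothetical, and the table is filled in by hand instead of by eval().&lt;br /&gt;

```python
def resolve(node, table):
    """Swap every {"_ref": k} placeholder below node for table[k], in place."""
    if isinstance(node, dict):
        items = list(node.items())
    elif isinstance(node, list):
        items = list(enumerate(node))
    else:
        return  # primitives have nothing to resolve
    for key, child in items:
        if isinstance(child, dict) and set(child) == {"_ref"}:
            node[key] = table[child["_ref"]]  # backref: substitute, do not recurse
        else:
            resolve(child, table)


# Stand-in for evaluating "_[1]={foo:{_ref:1}}" after initializing "_".
_ = {}
_[1] = {"foo": {"_ref": 1}, "items": [1, 2, {"_ref": 1}]}
resolve(_[1], _)
assert _[1]["foo"] is _[1]        # the circular reference is restored
assert _[1]["items"][2] is _[1]
```

Because a substituted backref is never descended into, the walk only ever sees the acyclic structure that came out of deserialization, so it terminates without a visited set.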
    </content:encoded>

    <pubDate>Thu, 21 Sep 2006 09:51:05 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/29-guid.html</guid>
    
</item>
<item>
    <title>Indexing directory structures</title>
    <link>http://blog.notdot.net/archives/26-Indexing-directory-structures.html</link>
            <category>coding</category>
    
    <comments>http://blog.notdot.net/archives/26-Indexing-directory-structures.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=26</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=26</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    After much messing around with SQL DBs and table structures, I eventually resorted to writing my own indexing scheme and storing the index in memory for my FS-indexing needs.&lt;br /&gt;
Here&#039;s how I went about it:&lt;br /&gt;
&lt;br /&gt;
1) Construct a tree out of the FS you&#039;re indexing. Each node in the tree needs a reference to its parent, but if you&#039;re just using it for this index, it doesn&#039;t actually need references to its children.&lt;br /&gt;
2) Construct a dictionary of lists of nodes to act as the index. Dictionary&lt;string,List&lt;Node&gt;&gt; in generics/templates speak.&lt;br /&gt;
3) Iterate through each item, and extract terms for that component. Terms are only extracted for the current component, not its parents - &quot;c:\foo\bar&quot; would only have &#039;bar&#039; as a term.&lt;br /&gt;
4) Find the relevant item in the index for each term (or add it, if need be), and add the current node to the list of nodes against that item.&lt;br /&gt;
&lt;br /&gt;
Once you&#039;ve constructed the index, searching is as follows:&lt;br /&gt;
1) Extract the terms for your search query in the same manner as used for path components above&lt;br /&gt;
2) Obtain the list from the index for each of these terms. If any of the lists are empty, return immediately with an empty set - no results are possible.&lt;br /&gt;
3) For each item in the returned lists, follow the chain of parents, checking which terms from the original search are present in the item and all its ancestors. If you match all terms, add it to the result set. If you reach the root of the tree without doing so, discard the node.&lt;br /&gt;
&lt;br /&gt;
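Both procedures can be sketched in a few lines of Python. This is an illustration rather than my actual C# code: the Node class, the function names and the rule for extracting terms (lower-cased runs of letters and digits) are all assumptions made for the example.&lt;br /&gt;

```python
import re
from collections import defaultdict


class Node:
    """One path component; only the parent link is needed for this index."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def path(self):
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


def terms(text):
    # Assumed term extraction: lower-cased alphanumeric runs.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(nodes):
    index = defaultdict(list)  # term -> list of nodes whose own name contains it
    for node in nodes:
        for term in terms(node.name):
            index[term].append(node)
    return index


def search(index, query):
    wanted = terms(query)
    lists = [index.get(term, []) for term in wanted]
    if not wanted or not all(lists):
        return []  # some term matches nothing, so no results are possible
    results, seen = [], set()
    for candidates in lists:
        for node in candidates:
            if id(node) in seen:
                continue
            seen.add(id(node))
            missing = set(wanted)
            current = node
            while current is not None and missing:
                missing -= terms(current.name)  # terms found at this level
                current = current.parent
            if not missing:
                results.append(node)
    return results


root = Node("c:")
foo = Node("foo", root)
bar = Node("bar", foo)
index = build_index([root, foo, bar])
assert [n.path() for n in search(index, "foo bar")] == ["c:/foo/bar"]
```
&lt;br /&gt;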
I can&#039;t help thinking it could be more efficient - there&#039;s got to be a way to intersect the sets returned for each term, or to make use of the property that the end results for a search can consist only of items in each term&#039;s list, or their descendants, but I can&#039;t think how to do it right now.&lt;br /&gt;
&lt;br /&gt;
Performance tests: Indexing over 600,000 items, searches for 4 terms return in less than 1/100 of a second. Searches with more terms show only linear increase in time taken, of course. 
    </content:encoded>

    <pubDate>Wed, 30 Aug 2006 01:00:32 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/26-guid.html</guid>
    
</item>
<item>
    <title>Obsession</title>
    <link>http://blog.notdot.net/archives/25-Obsession.html</link>
            <category>coding</category>
    
    <comments>http://blog.notdot.net/archives/25-Obsession.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=25</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=25</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    For some time now, I&#039;ve been working, on and off, on a P2P filesharing system, which I&#039;ve dubbed &#039;Attercop&#039; (named after the term for &#039;spider&#039; used in Old English and LOTR). Simply put, it&#039;s a replacement for DC++, designed mostly for LAN environments. It uses multicast, so no central hub is required, and it improves on a number of the most glaring weaknesses in DC++. I&#039;m writing it in C#.NET.&lt;br /&gt;
&lt;br /&gt;
Like most projects I embark on, I started off hugely enthusiastic, spending as much time as I could spare on it. Normally, I&#039;d continue like that until either I get the project completed, or I burn out on it. Quite frequently, the latter has happened - I&#039;ve burned out and lost interest, with the project unfinished.&lt;br /&gt;
&lt;br /&gt;
However, since I&#039;ve been spending all the time I&#039;ve not been at work with Hayley, I&#039;ve had very little time for it lately - a lack that&#039;s hard to mourn, since I consider time with Hayley much better spent - and I&#039;d begun to lose enthusiasm for the project. I concluded that I was (unfortunately) suffering from the same sort of mid-project disinterest I often get. However, we spent the last weekend at a LAN, during which I coded nearly the whole time, and I find my enthusiasm suddenly renewed. I&#039;m really enthused about it again, and hugely enjoy working on it.&lt;br /&gt;
&lt;br /&gt;
My conclusion, obvious as it seems: Limiting how much time you spend on a project - no matter how enthused you are by it - can help prevent burning out on it. It just takes something really good to pull me away from something I&#039;m this engrossed in. 
    </content:encoded>

    <pubDate>Wed, 23 Aug 2006 05:58:58 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/25-guid.html</guid>
    
</item>
<item>
    <title>Transforming ATOM 1.0 into HTML using XSLT</title>
    <link>http://blog.notdot.net/archives/12-Transforming-ATOM-1.0-into-HTML-using-XSLT.html</link>
            <category>coding</category>
    
    <comments>http://blog.notdot.net/archives/12-Transforming-ATOM-1.0-into-HTML-using-XSLT.html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=12</wfw:comment>

    <slash:comments>2</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=12</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    In response to Myke&#039;s request for a way to generate an HTML page containing the newest items from a bunch of RSS feeds, I made a suggestion: Use Google Reader to aggregate the feeds for you, export them using the &#039;sharing&#039; functionality, and then simply transform the resulting ATOM feed into an HTML page using some XSLT on the server side.&lt;br /&gt;
&lt;br /&gt;
I thought there&#039;d be a ready-made XSLT stylesheet for transforming ATOM out there, but I had absolutely no luck finding one. So, I hacked together a basic one for the purpose. Here it is:&lt;br /&gt;
&lt;div class=&quot;bb-code-title&quot;&gt;CODE:&lt;/div&gt;&lt;div class=&quot;bb-code&quot;&gt;&amp;#60;?xml&amp;#160;version=&quot;1.0&quot;&amp;#160;encoding=&quot;UTF-8&quot;?&amp;#62;&lt;br /&gt;
&amp;#60;xsl&amp;#58;stylesheet&amp;#160;version=&quot;1.0&quot;&amp;#160;&lt;br /&gt;
&amp;#160;&amp;#160;xmlns&amp;#58;xsl=&quot;http&amp;#58;//www.w3.org/1999/XSL/Transform&quot;&lt;br /&gt;
&amp;#160;&amp;#160;xmlns&amp;#58;atom=&quot;http&amp;#58;//www.w3.org/2005/Atom&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#60;xsl&amp;#58;template&amp;#160;match=&quot;/&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;html&amp;#160;xmlns=&quot;http&amp;#58;//www.w3.org/1999/xhtml&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;head&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;title&amp;#62;&amp;#60;xsl&amp;#58;value-of&amp;#160;select=&quot;atom&amp;#58;feed/atom&amp;#58;title&quot;/&amp;#62;&amp;#60;/title&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/head&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;body&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;h1&amp;#62;&amp;#60;xsl&amp;#58;value-of&amp;#160;select=&quot;atom&amp;#58;feed/atom&amp;#58;title&quot;/&amp;#62;&amp;#60;/h1&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;ul&amp;#62;&lt;br /&gt;
&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;xsl&amp;#58;for-each&amp;#160;select=&quot;atom&amp;#58;feed/atom&amp;#58;entry&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;li&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;h2&amp;#62;&amp;#60;xsl&amp;#58;element&amp;#160;name=&quot;a&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;xsl&amp;#58;attribute&amp;#160;name=&quot;href&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;xsl&amp;#58;value-of&amp;#160;select=&quot;atom&amp;#58;link&amp;#91;@rel=&#039;alternate&#039;&amp;#93;/@href&quot;/&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/xsl&amp;#58;attribute&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;xsl&amp;#58;value-of&amp;#160;select=&quot;atom&amp;#58;title&quot;/&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/xsl&amp;#58;element&amp;#62;&amp;#60;/h2&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;xsl&amp;#58;if&amp;#160;test=&quot;atom&amp;#58;published&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;div&amp;#160;class=&quot;published&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;Published&amp;#58;&amp;#160;&amp;#60;xsl&amp;#58;value-of&amp;#160;select=&quot;atom&amp;#58;published&quot;/&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/div&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/xsl&amp;#58;if&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;xsl&amp;#58;if&amp;#160;test=&quot;atom&amp;#58;updated&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;div&amp;#160;class=&quot;updated&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;Updated&amp;#58;&amp;#160;&amp;#60;xsl&amp;#58;value-of&amp;#160;select=&quot;atom&amp;#58;updated&quot;/&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/div&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/xsl&amp;#58;if&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;div&amp;#160;class=&quot;entrydesc&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;xsl&amp;#58;if&amp;#160;test=&quot;atom&amp;#58;summary&amp;#91;@type=&#039;html&#039;&amp;#93;&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;xsl&amp;#58;value-of&amp;#160;select=&quot;atom&amp;#58;summary&quot;&amp;#160;disable-output-escaping=&quot;yes&quot;&amp;#160;/&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/xsl&amp;#58;if&amp;#62;&lt;br /&gt;
&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;xsl&amp;#58;if&amp;#160;test=&quot;atom&amp;#58;summary&amp;#91;@type=&#039;text&#039;&amp;#93;&quot;&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;xsl&amp;#58;value-of&amp;#160;select=&quot;atom&amp;#58;summary&quot;&amp;#160;disable-output-escaping=&quot;no&quot;&amp;#160;/&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/xsl&amp;#58;if&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/div&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/li&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/xsl&amp;#58;for-each&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/ul&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/body&amp;#62;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#60;/html&amp;#62;&lt;br /&gt;
&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#60;/xsl&amp;#58;template&amp;#62;&lt;br /&gt;
&amp;#60;/xsl&amp;#58;stylesheet&amp;#62;&lt;/div&gt;&lt;br /&gt;
I&#039;ve only tested it with Google&#039;s Atom output, but it ought to work with any valid Atom feed. I also haven&#039;t attempted to format the dates into something more user-friendly - I&#039;m sure XSLT has the facility, but I&#039;ll be damned if I can find it right now. Finally, only the title, URL, dates and summary are exposed, but it&#039;s pretty trivial to add more. 
    </content:encoded>

    <pubDate>Fri, 21 Apr 2006 04:52:17 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/12-guid.html</guid>
    
</item>
<item>
    <title>Optimum phone keypad layouts, or, what to do with a boring weekend. </title>
    <link>http://blog.notdot.net/archives/11-Optimum-phone-keypad-layouts,-or,-what-to-do-with-a-boring-weekend..html</link>
            <category>coding</category>
    
    <comments>http://blog.notdot.net/archives/11-Optimum-phone-keypad-layouts,-or,-what-to-do-with-a-boring-weekend..html#comments</comments>
    <wfw:comment>http://blog.notdot.net/wfwcomment.php?cid=11</wfw:comment>

    <slash:comments>-4</slash:comments>
    <wfw:commentRss>http://blog.notdot.net/rss.php?version=2.0&amp;type=comments&amp;cid=11</wfw:commentRss>
    

    <author>nospam@example.com (Arachnid)</author>
    <content:encoded>
    So, I was a little bored, and had an idea. The commonly used (alphabetic) phone keypad layout is not particularly efficient for T9-style entry (one key press per letter) - a lot of words collide on the same digit sequence. What would an ideal keypad layout look like?&lt;br /&gt;
&lt;br /&gt;
So I hunted up a computer-readable English dictionary with frequency scores (on a 0-16 scale), and devised a way to score the efficiency of a layout: group words by the digit sequence they translate into under that layout, multiply together the frequency scores of the words in each group, then sum the products to get the final score (lower is better). So, for example, two colliding words with frequency 8 would score 64 points, while a collision between a word with frequency 1 and one with frequency 15 would score only 15 points. The idea is that collisions between words with greatly different frequencies are much easier to resolve - it&#039;ll almost always be the more common word - whilst collisions between words of similar frequency are generally bad.&lt;br /&gt;
&lt;br /&gt;
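As a minimal sketch of that scoring rule (not the original code): the names key_map, score and DEFAULT_LAYOUT are made up for illustration, the default layout here is the standard letters-only one, and I&#039;m assuming that a digit sequence matching only one word contributes nothing to the total.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import defaultdict
from math import prod

# Assumed standard letters-only keypad: one string of letters per key.
DEFAULT_LAYOUT = ["abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"]


def key_map(layout):
    """Map each letter to the index of the key it sits on."""
    return {letter: i for i, group in enumerate(layout) for letter in group}


def score(layout, freqs):
    """Sum the frequency products of every group of colliding words.

    freqs maps word -> frequency score. Words that share a key sequence
    collide; a sequence matched by a single word adds nothing (assumption).
    """
    keys = key_map(layout)
    groups = defaultdict(list)
    for word, freq in freqs.items():
        groups[tuple(keys[c] for c in word)].append(freq)
    return sum(prod(g) for g in groups.values() if len(g) >= 2)


words = {"good": 8, "home": 8, "cat": 5}
print(score(DEFAULT_LAYOUT, words))  # 64: "good" and "home" collide, "cat" is alone
```
&lt;br /&gt;
On a standard keypad &#039;good&#039; and &#039;home&#039; are both 4663, which is where the 8 x 8 = 64 comes from. The sketch raises a KeyError for any character that isn&#039;t on the layout, so a real word list would need filtering first.&lt;br /&gt;
&lt;br /&gt;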
I then seeded a genetic algorithm with the default keypad layout. Each generation mutates the current layouts by reassigning letters to keys at random, scores the results, selects the few layouts with the lowest scores, and repeats.&lt;br /&gt;
&lt;br /&gt;
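The mutate-score-select loop can be sketched roughly like this. It is an illustration rather than the code I ran: mutate, evolve, toy_fitness and CLASHES are hypothetical names, the population sizes are arbitrary, parents are assumed to compete alongside their children, and toy_fitness is a stand-in for the real dictionary-based score so the example runs on its own.&lt;br /&gt;
&lt;br /&gt;
```python
import random

# Assumed standard letters-only keypad: one string of letters per key.
DEFAULT_LAYOUT = ["abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"]

# Made-up letter pairs that we pretend are bad to have on the same key.
CLASHES = [("g", "h"), ("m", "n"), ("d", "e")]


def toy_fitness(layout):
    """Stand-in scorer: count the clashing pairs that share a key (lower is better)."""
    return sum(any(a in g and b in g for g in layout) for a, b in CLASHES)


def mutate(layout, rng):
    """Return a copy of layout with one random letter moved to a different key."""
    groups = [list(g) for g in layout]
    donors = [i for i, g in enumerate(groups) if len(g) >= 2]  # never empty a key
    src = rng.choice(donors)
    dst = rng.choice([i for i in range(len(groups)) if i != src])
    letter = groups[src].pop(rng.randrange(len(groups[src])))
    groups[dst].append(letter)
    return ["".join(g) for g in groups]


def evolve(seed_layout, fitness, generations=40, children=20, survivors=10, seed=0):
    """Mutate random members of the population, score everything, keep the best few."""
    rng = random.Random(seed)
    population = [seed_layout]
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rng) for _ in range(children)]
        population = sorted(population + offspring, key=fitness)[:survivors]
    return population[0]


best = evolve(DEFAULT_LAYOUT, toy_fitness)
print(best, toy_fitness(best))
```
&lt;br /&gt;
Because the survivors include layouts that are not the single best one, a child can descend from a parent that was itself worse than its own parent, which is what keeps the search from getting stuck too early.&lt;br /&gt;
&lt;br /&gt;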
The default keypad layout scores a whopping 8,853,348 points.&lt;br /&gt;
The best layout I could find scores only 71,063 points. The layout was:&lt;br /&gt;
&lt;br /&gt;
{gr&#039;,aky,cej,lw-,fimxz,hn,ps,qtu,bdov}&lt;br /&gt;
&lt;br /&gt;
Each group represents the letters that go on one key - the order within a group doesn&#039;t matter here.&lt;br /&gt;
Several variations in the placement of the apostrophe and hyphen had the same score.&lt;br /&gt;
&lt;br /&gt;
This was a 40th-generation descendant of the default layout (generation 0). I have a complete &#039;genealogy&#039;, of course. This was the best of several runs I did totally from scratch. The best results seem to occur when I select a reasonably large &#039;breeding population&#039;, not just the few very fittest ones - the genealogy for the winner has several steps where the parent scored better than the child.&lt;br /&gt;
&lt;br /&gt;
I just found it interesting how incredibly inefficient the default layout is when it comes to collisions. A pity nobody will ever use anything else.&lt;br /&gt;
&lt;br /&gt;
The next step, of course, would be to construct another GA that takes this letter grouping and finds the optimum physical arrangement of the keys, so as to minimise finger movement between letters when typing a large corpus. 
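&lt;br /&gt;
&lt;br /&gt;
I haven&#039;t built that, but the fitness function for it might look something like the sketch below. Everything here is an assumption: the name travel, the 3x3 grid of key positions, and the use of straight-line distance between consecutive key presses as the measure of finger movement.&lt;br /&gt;
&lt;br /&gt;
```python
from math import dist

# The letter grouping found above, one string per key.
LAYOUT = ["gr'", "aky", "cej", "lw-", "fimxz", "hn", "ps", "qtu", "bdov"]

# Assumed physical positions: a 3x3 grid of (row, column), one per key, in order.
GRID = [(r, c) for r in range(3) for c in range(3)]


def travel(layout, positions, text):
    """Total straight-line distance moved between consecutive key presses.

    Characters that are not on the layout (spaces, digits) are skipped.
    """
    where = {letter: positions[i] for i, group in enumerate(layout) for letter in group}
    presses = [where[c] for c in text if c in where]
    return sum(dist(a, b) for a, b in zip(presses, presses[1:]))


print(travel(LAYOUT, GRID, "ga"))  # 1.0: the two keys are side by side on the top row
```
&lt;br /&gt;
A GA over this would leave the letter groups alone and shuffle which grid position each key gets, scoring each arrangement by its travel over the corpus.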
    </content:encoded>

    <pubDate>Tue, 18 Apr 2006 19:32:08 +0100</pubDate>
    <guid isPermaLink="false">http://blog.notdot.net/archives/11-guid.html</guid>
    
</item>

</channel>
</rss>