Nick's Blog - because repeating myself sucks
2014-10-23T09:11:00Z tag:blog.notdot.net,2014-10-23:atom.xml
Copyright (c) 2013 Bloggart 1.0
<h2>Introducing Arachnid Labs and the Loki</h2>
<p>tag:blog.notdot.net,2013-03-04:post:111001 | 2013-03-04T14:39:45Z | 2013-01-23T19:45:09Z | Nick Johnson | http://blog.notdot.net/</p>
<p>As many of you will have guessed from the predominance of electronics related posting recently, I've been working a lot with digital electronics of late. I recently made the decision to get serious about it, pick a trading name, and start working on projects that others can enjoy. The end result? Meet Arachnid Labs:</p>
<p><a href="http://www.arachnidlabs.com/"><img src="http://www.arachnidlabs.com/images/arachnidlabs-notext.svg"></a></p>
<p>Arachnid Labs is the name I'll be releasing future projects under, including some that will be available to buy on <a href="https://www.tindie.com/shops/ArachnidLabs/">Tindie</a>. The first of these is a very ambitious project I'm calling Loki.</p>
<p><img src="http://www.arachnidlabs.com/images/loki.svg" style="width: 100px;"> Loki is a microcontroller development board designed around Cypress's PSoC processor. It's extremely flexible and powerful - a big step beyond a typical Arduino-style board. I'm really excited about it, and I'm working hard to get it - and an initial set of expansion boards - ready and tested.</p>
<p><img src="https://lh5.googleusercontent.com/-yPN_q7JmINg/UQLmIGT-7lI/AAAAAAAACbE/xb4uWjqj3lM/w396-h298-n-k/P1010194.JPG"></p>
<p>I'll be blogging about my progress with Loki and other projects regularly on <a href="http://www.arachnidlabs.com/">the Arachnid Labs blog</a>, so please check it out and subscribe if that appeals.</p>
<p>I've also set up accounts for Arachnid Labs on <a href="http://twitter.com/ArachnidLabs">Twitter</a> and <a href="https://plus.google.com/102498818006624473472">Google+</a>, which you can follow for news and updates.</p>
<h2>7400 Competition Winners Announced</h2>
<p>tag:blog.notdot.net,2012-11-09:post:110001 | 2012-11-09T22:08:31Z | 2012-11-09T21:14:46Z | Nick Johnson | http://blog.notdot.net/</p>
<p>The winners are out! See <a href="http://dangerousprototypes.com/2012/11/09/open-7400-logic-competition-winners-2012/">the blog post</a> for details.</p>
<p>My own entry, the <a href="http://blog.notdot.net/2012/10/Build-your-own-FPGA">Discrete FPGA</a> is one of the 15(!) first-place winners!</p>
<h2>7400 Competition Reader's Choice</h2>
<p>tag:blog.notdot.net,2012-11-07:post:109001 | 2012-11-07T07:42:14Z | 2012-11-07T07:42:14Z | Nick Johnson | http://blog.notdot.net/</p>
<p>The 7400 competition has put up a <a href="http://dangerousprototypes.com/2012/11/07/7400-competition-entries-and-readers-choice-2/">Reader's Choice post</a>, soliciting votes for readers' favorite submissions. It's also a great summary of all the excellent submissions this year.</p>
<p>Go check it out! Of course, if you wanted to leave a comment voting for <a href="http://blog.notdot.net/2012/10/Build-your-own-FPGA">my DFPGA project</a>, I certainly wouldn't complain...</p>
<h2>Build your own FPGA</h2>
<p>tag:blog.notdot.net,2012-11-02:post:108001 | 2012-11-02T15:57:34Z | 2012-10-31T20:09:40Z | Nick Johnson | http://blog.notdot.net/</p>
<p>The <a href="http://dangerousprototypes.com/open-7400-logic-competition/">Open 7400 Logic Competition</a> is a crowd-sourced contest with a simple but broad criterion for entry: build something interesting out of discrete logic chips. It's now in its second year, and this time around I was inspired to enter it.</p>
<p>Discrete logic, for anyone who isn't familiar, refers to any of a number of families of ICs that each perform a single, usually fairly straightforward, function. Typical discrete logic ICs include basic logic gates like AND, OR and NAND, Flip-Flops, shift registers, and multiplexers. For smaller components like gates and flipflops, a single IC will usually contain several independent ones. As you can imagine, building anything complex out of discrete logic involves using a <b>lot</b> of parts; these days they're typically used as 'glue' logic rather than as first-class components, having been largely supplanted by a combination of specialised devices, microcontrollers, and FPGAs.</p>
<p>Building a microcontroller or CPU out of discrete logic is a popular hobbyist pursuit, and it serves a useful purpose: building a CPU from scratch teaches you a lot about CPU architecture and tradeoffs; it's an interesting and instructive exercise. So, I wondered, wouldn't building an FPGA out of discrete logic be similarly educational? Hence, my competition entry: an FPGA (or rather, a 'slice' of one) built entirely out of discrete logic chips.</p>
<p><a href="https://lh3.googleusercontent.com/-G0SPfnTtFT0/UIBjBT7gdtI/AAAAAAAAB2s/ZlYg2GZ1n84/s702/IMG_20121018_211334.jpg"><img src="https://lh6.googleusercontent.com/-G0SPfnTtFT0/UIBjBT7gdtI/AAAAAAAAB2s/ZlYg2GZ1n84/w214-h285-n-k/IMG_20121018_211334.jpg" /></a></p>
<h3>Designing an FPGA from 7400s</h3>
<p>The most basic building block of an FPGA is the Cell, or Slice. Typically, a slice has a few inputs, a Lookup Table (or LUT) which can be programmed to evaluate any boolean function over those inputs, and one or more outputs, each of which can be configured to either update immediately when the input updates (asynchronous) or update only on the next clock tick, using a flipflop built into the slice (synchronous). Some FPGA cells have additional capabilities, such as adders implemented in hardware, to save using LUTs for this purpose.</p>
<p>The core of a slice, the Lookup Table, seems nearly magic - taking an array of inputs, it can be programmed to evaluate any boolean function on them and output the result. As the name implies, though, the implementation is very simple, and it's a technique also used to implement microcode and other configurable glue logic. In principle, what you do is this: take a memory IC such as some SRAM or an EEPROM. Wire up the address lines to your inputs, and the data lines to your output. Now, any combination of input states will be interpreted as an address, which the memory will look up and provide on the data outputs. By programming the memory with the state tables for the functions you want to compute, you can configure it to evaluate anything you like.</p>
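<p>To make the idea concrete, here's a behavioural sketch in Python (purely illustrative - <code>make_lut</code> is a name I've made up, not part of any hardware design):</p>

```python
# A lookup table is just a memory: the inputs form an address,
# and the stored bit at that address is the output.
def make_lut(truth_table):
    """truth_table: a list of output bits, indexed by input combination."""
    memory = list(truth_table)
    def lut(*inputs):
        # Pack the input bits into an address, most significant bit first.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return memory[address]
    return lut

# "Program" a 3-input LUT to compute the XOR of its inputs.
xor3 = make_lut([(a ^ b ^ c) for a in (0, 1) for b in (0, 1) for c in (0, 1)])
```

Swap in a different truth table and the same "hardware" evaluates a completely different function - that's all the reprogrammability amounts to.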
<p>Unfortunately, none of the 7400 series memories are manufactured anymore, and while there are plenty of SRAMs and EEPROMs available, the smallest sizes available are significantly larger than what we want for a simple discrete FPGA. Further, in order to be able to both program and read the memory, we'd need a lot of logic to switch between writing to the memory and reading from it (on a 'single port' memory, these use the same pins).</p>
<p>However, a simple solution presents itself: shift registers! A shift register is effectively an 8-bit memory, with serial inputs - convenient for our purposes - and each bit exposed on its own pin. By combining this with an 8-way multiplexer, we have a basic 3-input 1-output LUT. Our LUT can be reprogrammed using the data, clock, and latch lines, and many of them can be chained together and programmed in series. The 3 select inputs on the 8-way mux form the inputs to the LUT, and the mux's output bit is the output. So, in two readily available 7400 series ICs, we have one complete Lookup Table.</p>
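<p>Behaviourally, the shift-register-plus-mux LUT works like this (a Python sketch under my own naming, not a pin-accurate model):</p>

```python
class ShiftRegisterLUT:
    """A 3-input LUT: an 8-bit shift register holds the truth table,
    and an 8-way mux selects one bit using the LUT inputs."""
    def __init__(self):
        self.bits = [0] * 8  # shift register contents

    def shift_in(self, bit):
        # One pulse on the programming clock shifts a configuration bit in;
        # chained LUTs would receive this register's overflow bit.
        self.bits = [bit] + self.bits[:-1]

    def output(self, s2, s1, s0):
        # The mux select lines double as the LUT's logic inputs.
        return self.bits[(s2 << 2) | (s1 << 1) | s0]

# Program a 3-input AND: only input combination 111 (index 7) outputs 1.
lut = ShiftRegisterLUT()
for bit in reversed([0, 0, 0, 0, 0, 0, 0, 1]):
    lut.shift_in(bit)
```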
<p><img src="https://lh3.googleusercontent.com/-xS2n6wHkNXA/UJDWq1Y0rHI/AAAAAAAAB80/YVUyfo8tX_8/s517/lut.png" /></p>
<p>For our FPGA slice, we'll use two of these discrete LUTs, with their inputs ganged together. Why two? Because a combined capability of 3 inputs and 2 outputs is about the smallest you can implement interesting things with. 3 inputs and 2 outputs lets you build a full adder in a single slice; any fewer inputs or outputs and just adding two 1-bit numbers together with carry requires multiple slices, which severely limits our capabilities.</p>
<p>The next component is the flipflops, and the logic for selecting asynchronous or synchronous mode. There's a profusion of flipflops and registers available, from 2 up to 8 in a single IC, and with various control methods, so that's no problem. Choosing between synchronous and asynchronous is a little tougher. The natural choice here is a 2-way multiplexer, but while chips with multiple 2-way multiplexers exist, they all gang the select lines together, meaning you have to choose the same input for all the multiplexers in a chip. Obviously, this isn't really suitable for our application.</p>
<p>Fortunately, a 2-way multiplexer isn't difficult to construct. There are several options, but the most efficient is to use tristate buffers. There are a couple in the 7400 range - the 74*125 and 74*126 - that meet our requirements ideally. Each contains four tri-state buffers, the only difference between the two chips being that one enables its output when the enable line is high, while the other enables its output when it is low. By ganging these together in pairs, we can create multiplexers; one of each IC gets us four independent multiplexers. Two multiplexers, plus our register IC gets us our sync/async select logic. Of course, we need a way to control the multiplexers, so chain in another shift register to provide some state to program them with.</p>
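<p>The trick is easier to see in a behavioural model. Here's a hedged Python sketch (using <code>None</code> to stand in for a floating, high-impedance output):</p>

```python
def tristate(enable, value):
    """A tri-state buffer: drives its input through when enabled,
    otherwise the output floats (modelled as None)."""
    return value if enable else None

def mux2(select, a, b):
    """A 2-way mux from a '125-style buffer (enabled when select is low)
    and a '126-style buffer (enabled when select is high), with their
    outputs tied to the same line."""
    out_a = tristate(not select, a)  # active-low enable
    out_b = tristate(select, b)      # active-high enable
    # Exactly one buffer drives the shared line at any time.
    return out_a if out_a is not None else out_b
```

Since each '125/'126 pair gives one multiplexer, one of each IC yields four independent 2-way multiplexers.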
<p><img src="https://lh5.googleusercontent.com/-IdiXOnfbW-k/UJDW1wUFsLI/AAAAAAAAB9A/IA4_xVm_5y8/s903/slice-core.png" /></p>
<p>Now we've got the core of a basic slice designed, let's look at the second major component of any FPGA: routing. Flexible routing is a key attribute of any useful FPGA; without good routing, you can't get signals where they need to go and you waste precious resources, making your FPGA a lot less useful. Routing, though, uses a huge amount of resources to implement properly. What's the minimum we can provide and still get a useful and interesting result?</p>
<p>Typically, FPGAs position individual slices in a rectangular grid. Buses run between slices in the grid both horizontally and vertically. A slice is able to tap into some subset of the lines at its intersection, and can likewise output to some subset of the lines. Typically, the bus can continue through a slice uninterrupted, or the bus can be 'broken', effectively creating separate buses on either side of the slice. In some cases, buses can also be connected together in other ways, routing between different bus lines or between horizontal and vertical buses without the direct involvement of the slice.</p>
<p>One bit buses are a bit too narrow even for our purposes; a lot of interesting applications are going to require more than that, so let's see what we can make of 2 bit buses, both vertical and horizontal. Many FPGAs include a built-in bias in one direction or another; this saves routing resources by favoring more common uses at the expense of making less common setups more expensive. In our case, we'll make it easier to read from the 'left' and 'top' buses, and easier to write to the 'right' and 'bottom' buses. We can do this by having 2-input multiplexers on each of the left, top and right buses; these multiplexers feed into our LUT's 3 inputs. For output, we can use more tristate buffers to allow one LUT to output to either or both of the right bus lines, while the other outputs to either or both of the bottom bus lines. To read from the bottom, or to drive the left or top lines, one simply has to drive the opposite side, and close the appropriate bus switch.</p>
<p>Speaking of bus switches, we'll go for the simplest configuration: a switch connecting each of the top and bottom lines, and a switch connecting each of the left and right lines, which can be opened or closed individually. The 74*4066 "quad bilateral switch" IC provides a convenient way to do this in a single IC. All of our routing requires state, of course - 3 bits for the input multiplexers, 4 bits for the output enables, and 4 more bits for the bus switches - so we'll use another shift register, and some of the spare bits from the one we added for sync/async selection.</p>
<p><img src="https://lh4.googleusercontent.com/-Zl4Ma8V26xU/UJDW1PQCwxI/AAAAAAAAB88/DAUcFAz2afs/s764/routing.png"></p>
<p>With routing done, we've more or less designed the entirety of a basic FPGA slice in discrete logic. Let's take inventory:</p>
<ul>
<li>4 x 74HC595 Shift Registers, for LUTs and routing/multiplexer state</li>
<li>2 x 74HC251 8-line multiplexer, for LUTs</li>
<li>2 x 74HC125 and 2 x 74HC126 Tristate buffers, for multiplexers and output enables.</li>
<li>1 x 74HC173 4-bit register, for synchronous operation.</li>
<li>1 x 74HC4066 Quad Bilateral Switch, for bus switches.</li>
</ul>
<p>That's a total of 12 discrete logic ICs to implement one moderately capable FPGA slice. Add a few LEDs to give a visual indicator of the status of the bus lines, and some edge connectors to hook them up together, and we have a board that can be ganged together in a rectangular configuration to make a modular, expandable discrete logic FPGA. Pointless, given that it's a fraction of the capability of a moderately priced FPGA or CPLD chip? Probably. Cool? Most definitely.</p>
<h3>Programming</h3>
<p>Of course, it's no good having a DFPGA if there's no way to program it. We could figure out the bitmasks to achieve what we want ourselves, but that's tedious and error prone. Porting VHDL or Verilog to something like this would be tough, and massive overkill given the number of slices we're dealing with. Instead, I opted to implement a simple hardware description language, which I'll call DHDL.</p>
<p>DHDL doesn't attempt to handle layout or optimisation; instead it implements a fairly straightforward compiler to take logic expressions and turn them into slice configuration data. A DHDL file consists of a set of slice definitions, followed by a list of slices to 'invoke', arranged in the same manner as the DFPGA is laid out. Here's an example of a DHDL definition for a 'ripple carry full adder' slice:</p>
<pre>slice adder {
l0 ^ r1 ^ u0 -> r0;
(l0 & r1) | (l0 & u0) | (r1 & u0) -> d0;
}</pre>
<p>Here, l0, r1, etc, refer to bus lines - 'u', 'd', 'l' and 'r' for up, down, left, and right. The two addends are provided on l0 and l1; since the bus switches are closed, they're also available on r0 and r1, which the adder takes advantage of, since we can only select from one left bus line at a time. Carry input enters via the bus line u0. The first expression computes the sum of the two inputs and the carry, outputting it on r0. The second expression computes the carry output, which is transmitted to the next slice down via d0.</p>
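<p>Those two expressions are just the standard full-adder equations; a quick Python check (illustrative only) confirms they reproduce one-bit binary addition:</p>

```python
from itertools import product

# l0 and r1 are the addends (r1 via the closed bus switch); u0 is carry in.
for l0, r1, u0 in product((0, 1), repeat=3):
    r0 = l0 ^ r1 ^ u0                       # sum, output on r0
    d0 = (l0 & r1) | (l0 & u0) | (r1 & u0)  # carry (majority), output on d0
    assert r0 + 2 * d0 == l0 + r1 + u0      # sum + 2*carry == total
```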
<p>DHDL takes care of bus switch configuration for us here: by default, all bus switches are closed (that is, they conduct), but when we output to a bus line, the corresponding bus switch defaults to open. In this situation, that's the correct behaviour, since it allows us to read one of the addends on l0 and output the result on r0; it also ensures we separate the incoming and outgoing carry signals.</p>
<p>In some cases, we might want to configure the buses ourselves. We can use the expression `a </> b` to specify that a bus switch should be open, and the expression `a <-> b` to specify that it should be closed. Here's an example of a storage element that utilizes that:</p>
<pre>slice storage {
(u0 & r1) | (!u0 & l0) sync -> r0;
l0 <-> r0;
}</pre>
<p>This slice uses feedback to store a value, by outputting it on r0 and reading it back from l0. Since outputting to r0 would normally cause the compiler to open the switch between l0 and r0, we explicitly tell it that we want the switch closed, making the feedback possible. This definition also demonstrates how we specify synchronous vs asynchronous behaviour, with the `sync` or `async` keyword before the assignment operator. The default is asynchronous. Thus, this slice will output the stored value on r0 and l0; on the leading edge of a clock cycle where u0 is high, it will store the value of l1/r1 as the new value. Also note that since we're not outputting to d0, the switch between u0 and d0 is closed, meaning we could stack many of these vertically and control them all with an enable input. We've effectively created a flipflop slice.</p>
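<p>A behavioural model makes the feedback easier to follow (again, just an illustrative Python sketch of the slice semantics, not the hardware):</p>

```python
def storage_step(stored, u0, l1):
    """One clock tick of the storage slice. The stored bit circulates on
    r0/l0 through the closed bus switch; l1/r1 carries the candidate new
    value, and u0 acts as the write enable."""
    l0 = stored  # feedback: last tick's r0, read back via l0
    r1 = l1
    return (u0 & r1) | ((1 - u0) & l0)

state = 0
state = storage_step(state, u0=1, l1=1)  # enable high: load a 1
state = storage_step(state, u0=0, l1=0)  # enable low: input is ignored
```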
<p>Let's see what a complete FPGA definition looks like. Here's one for a 4-bit combination lock:</p>
<pre>slice storage {
(u0 & r1) | (!u0 & l0) sync -> r0;
l0 <-> r0;
}
slice compare_carry {
!(l0 ^ r1) & u0 -> d0;
}
slice compare {
!(l0 ^ r1) -> d0;
u0 </> d0;
}
storage compare,
storage compare_carry,
storage compare_carry,
storage compare_carry</pre>
<p>First we define some slices - the storage slice we already saw, and a comparator, which outputs a 1 to d0 iff both the horizontal bus lines are equal and its u0 input was 1. We also define a version of the comparator without a carry, since the topmost slice will not have a carry input.</p>
<p>Operation is like this: To set the code, input the values on the l1 input of each of the leftmost slices, then take the top slice's u0 input high for one clock cycle. To test a combination, input the values on the l1 inputs again, but leave the top slice's u0 input low. The bottom right slice's d0 line indicates if the combination is correct.</p>
<p>Finally, let's try something a little bit more involved: a PWM controller. We'll need a counter, some comparators, and a set/reset circuit:</p>
<pre>slice toggler {
!r1 sync -> r1;
r1 -> d0;
}
slice counter {
r1 ^ u0 sync -> r1;
r1 & u0 -> d0;
}
slice compare {
!(l0 ^ r1) -> d0;
}
slice compare_carry {
!(l0 ^ r1) & u0 -> d0;
}
slice overflow_pass {
u0 -> r0;
}
slice srlatch {
(r0 | u0) & !l0 sync -> r0;
}
toggler compare,
counter compare_carry,
counter compare_carry,
overflow_pass srlatch</pre>
<p>The first two slice definitions, toggler and counter, collectively implement a binary counter. Toggler is the least significant bit, while any number of counter stages can be chained vertically to make an n bit ripple-carry counter - in this case, we've constructed a 3 bit counter. compare and compare_carry should look familiar from the previous sketch; they implement a ripple-carry comparator, in this case comparing the output of the binary counter with the other bus line, which will be set with switches. overflow_pass's job is very simple - it passes the overflow signal from the counter to its right output, making both that and the comparator output available to the final slice, srlatch. As the name implies, this is a simple set/reset latch, with the counter overflow resetting it, and the comparator setting it.</p>
<p>By setting the 3 input bits to reflect the duty cycle required, and pulsing the clock line sufficiently fast, the srlatch slice's r0 output will be PWMed with the appropriate duty cycle - which can be visually observed as the LED on that bus line being dimmed.</p>
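<p>The behaviour is easy to check in software. Here's a simplified Python model of the sketch above (not gate-level - it just mirrors the counter/comparator/latch arrangement):</p>

```python
def pwm_simulate(threshold, cycles=80):
    """Behavioural model of the PWM sketch: a 3-bit counter, a comparator
    that sets an SR latch when the count matches `threshold`, and a
    counter overflow that resets the latch. Returns the duty cycle."""
    count, latch, high_ticks = 0, 0, 0
    for _ in range(cycles):
        if count == threshold:
            latch = 1          # comparator match sets the latch
        high_ticks += latch
        count += 1
        if count == 8:         # 3-bit counter overflow
            count = 0
            latch = 0          # overflow resets the latch
    return high_ticks / cycles
```

With the comparison value set to 4, the latch is high for four of every eight ticks - a 50% duty cycle.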
<h3>Fabrication</h3>
<p>Designing and building this board was an interesting exercise. Due to the number of ICs and wanting to make the PCB as compact as possible, this was by far the toughest board to route that I've designed so far. Since I had time constraints to get the board sent off for fabrication in time for the contest, I ended up using an autorouter for the first time. Eagle's autorouter is remarkably awful, but it turns out there's a much better free alternative, called <a href="http://www.freerouting.net/">Freerouting</a>. Freerouting is a Java-based PCB router; it can import layouts from Eagle, KiCad and others, and exports scripts that can be executed to implement the final routing in your CAD tool. Where Eagle wanted to produce a board with over 150 vias, Freerouting was able to produce one that had fewer than 50, and visual inspection shows it to be fairly sane, too. It's not just for automatic routing, either - it has an excellent manual routing mode, where it'll allow you to nudge existing tracks around without having to rip them up and reroute them every time you need to fit another line in.</p>
<p>For fabrication, I went with the excellent <a href="http://tinyurl.com/hvpcbfaq">Hackvana</a>, who made me up 20 of the boards and had them to me in record time. A jumbo order of parts from Farnell/Element14 saw me sorted for parts, and all that was left was hours and hours of soldering - with over 200 SMT pads on each board, the assembly process takes a while.</p>
<p>Of course, as with any first iteration design, there were problems. A couple of minor design improvements occurred to me almost immediately, which would've increased the board's capabilities somewhat, and the jumpers that let you determine how the serial programming stream connects between boards could be better placed. More problematically, I accidentally tied all the shift registers' reset lines low when they're actually active low - they should be connected to the 5V rail. After some experimentation, however, I came up with a greenwiring solution for this, which you can see in the photos below; it doesn't even add to the construction time by more than a couple of minutes per board. This bug is, of course, fixed in the schematics.</p>
<h3>Demonstration</h3>
<p>How does it look once assembled and working? Very cool. It may not be anywhere near as capable as a real FPGA, but it's also a lot easier to inspect and understand. With LEDs on all the bus lines you can see exactly what the internal state is at any time, which makes debugging a whole lot easier.</p>
<p>Here's one of the boards in the array fully constructed and hooked up; click the photo for more.</p>
<p><a href="https://plus.google.com/photos/116127381267973124425/albums/5804509415428735009"><img src="https://lh5.googleusercontent.com/-hQL4AhEri8w/UJDy8isa5AI/AAAAAAAAB_w/5hsj08XeGq4/w290-h217-n-k/IMG_20121030_213414.jpg" /></a></p>
<p>Of course, it wouldn't be complete without a video of the DFPGA in action...</p>
<iframe width="420" height="315" src="http://www.youtube.com/embed/7r0CuxFMGBQ" frameborder="0" allowfullscreen></iframe>
<h3>Source</h3>
<p>All the design files, along with the DHDL compiler, test suite, and demo definitions are open source under the Apache 2.0 license. You can find them all on Github <a href="https://github.com/arachnid/dfpga">here</a>. If you decide to build your own DFPGA, or find the schematics or code useful in your own project - let me know!</p>
<h3>Future developments</h3>
<p>Remember how I rubbished the use of dedicated memory chips at the beginning, saying that all the ones available now are too big, and too difficult to interface with? Well... that's not quite as accurate as I thought when I was designing things.</p>
<p>It's true that the memory you can get is mostly larger than we need, but that can be an advantage in moderation - it means it's possible to construct much more capable slices. How would you like a slice with 8 inputs and 4 outputs, that can output to any of the bus lines, and has 4 bits of internal state, allowing it to implement a 16-state state machine in each slice? And all with a little over half as many ICs as the design above? It turns out that with a few clever tricks, that ought to be possible - with one catch.</p>
<p>The catch is this: the smallest SRAMs available are 256 kilobit, which is really quite large - so much so that an embedded processor like an Arduino could never program even one of these slices without external memory. We can use EEPROMs instead, which tend to be a bit smaller and could be easily programmed ahead of time, but that still leaves us needing a way to store the other configuration bits, such as the output enables. EEPROM shift registers, unfortunately, don't really seem to exist.</p>
<p>With a little clever optimisation, though, a compact design that loads the output enable state from the EEPROM at power on is possible, albeit somewhat more complicated than the current design. Unfortunately, I suspect the demand for discrete logic FPGAs - even fairly capable ones - is low, so it's unlikely this design will ever see the light of day.</p>
<p>I could be wrong, though. Do you want your own discrete FPGA? Can you think of a practical use for one? Let me know in the comments!</p>
<h2>Penny for your thoughts?</h2>
<p>tag:blog.notdot.net,2012-09-24:post:107001 | 2012-09-24T18:02:35Z | 2012-09-24T18:02:35Z | Nick Johnson | http://blog.notdot.net/</p>
<p>I've been using my spare time lately to build something that's even more trivial and silly than usual. Behold, the advice machine!</p>
<p><img src="https://lh4.googleusercontent.com/-DN1SX-2fYZM/UGCclVQS-bI/AAAAAAAABw8/uG5w5eRXUvY/w351-h263-n-k/P1010126.JPG"></p>
<p>Operation is pretty straightforward. You insert whatever quantity of coins you see fit, then request some advice from the machine. The quality of the advice you get reflects the value of the coins you put in - more or less. More money typically results in more in-depth fortunes, and more interesting ones such as quotations, while a small amount gets you platitudes or terrible jokes.</p>
<p>Here's a video of it in action:</p>
<p><iframe width="560" height="315" src="http://www.youtube.com/embed/vKlPvEEjxMQ" frameborder="0" allowfullscreen></iframe></p>
<p>More photos can be found <a href="https://plus.google.com/photos/116127381267973124425/albums/5791801180487867841?authkey=CMqQ3JPsr8jPbg">here</a>.</p>
<p>All in all, the project was relatively straightforward, after a couple of false starts. The coin acceptor is a standard item commonly found on ebay - this one has model number CH-926 - for around about $20-$30. The device has a moderately complex setup procedure whereby you 'train' it on a set of sample coins, and tell it how many pulses to output for each coin. After that, it will recognize coins similar to the ones it was trained on, outputting the appropriate number of pulses on one of its pins when each is detected.</p>
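<p>Decoding the acceptor's output is then just a matter of counting the pulses in a burst and mapping the count to a denomination. Something along these lines (the pulse counts and coin values here are hypothetical - they depend entirely on how you train the unit):</p>

```python
# Hypothetical training: one pulse count per denomination, values in cents.
PULSE_VALUES = {1: 5, 2: 10, 3: 20, 4: 50, 5: 100, 6: 200}

def coin_value(pulse_count):
    """Map a burst of pulses from the coin acceptor to a value in cents;
    unrecognised counts are treated as a rejected coin."""
    return PULSE_VALUES.get(pulse_count, 0)
```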
<p>The LCD is a standard 16x2 character display, with an I2C backpack <a href="http://jeelabs.com/products/lcd-plug">from Jeelabs</a>. The thermal printer is the same <a href="https://www.sparkfun.com/products/10438">Sparkfun item</a> that has inspired so many (mostly twitter based) hacks.</p>
<p>The whole setup is controlled by a Raspberry Pi. A simple DIY breakout board separates out the various pins - GPIOs for the buttons and the coin acceptor, I2C for the display, and UART serial for the printer. A simple Python program scans the GPIOs for activity (using polling, unfortunately) and drives the display and printer.</p>
<p>Advice is sourced from the Unix fortunes databases, with certain databases corresponding to different amounts, with a bit of fuzz added for extra fun, and semi-random allocation of 'bonus' fortunes. I've categorized some of the databases by perceived value, so $0.05 is more likely to give you a platitude, while $1-$2 might get you a quote from the Tao. This is pretty rough, though, and all up I think it's the weakest part of the build. If anyone has ideas for a better source of advice, or a better way to categorize it, I'm all ears.</p>
<p>Now that it's built, I'm planning on offering it to the local Hackspace as a substitute donation box.</p>
<p>Many thanks to Gavin Smith of the Sydney Hackspace for the original idea.</p>
<h2>Damn Cool Algorithms: Cardinality Estimation</h2>
<p>tag:blog.notdot.net,2012-09-07:post:106001 | 2012-09-07T17:09:24Z | 2012-09-07T13:37:41Z | Nick Johnson | http://blog.notdot.net/</p>
<style type="text/css">
code { display: inline; padding: 5px 0 0 0}
sup { vertical-align: super }
sub { vertical-align: sub}
</style>
<p>Suppose you have a very large dataset - far too large to hold in memory - with duplicate entries. You want to know how many duplicate entries, but your data isn't sorted, and it's big enough that sorting and counting is impractical. How do you estimate how many <em>unique</em> entries the dataset contains? It's easy to see how this could be useful in many applications, such as query planning in a database: the best query plan can depend greatly on not just how many values there are in total, but also on how many <em>unique</em> values there are.</p>
<p>I'd encourage you to give this a bit of thought before reading onwards, because the algorithms we'll discuss today are quite innovative - and while simple, they're far from obvious.</p>
<h3>A simple and intuitive cardinality estimator</h3>
<p>Let's launch straight in with a simple example. Suppose someone generates a dataset with the following procedure:</p>
<ol>
<li>Generate <code>n</code> evenly distributed random numbers</li>
<li>Arbitrarily replicate some of those numbers an unspecified number of times</li>
<li>Shuffle the resulting set of numbers arbitrarily</li>
</ol>
<p>How can we estimate how many unique numbers there are in the resulting dataset? Knowing that the original set of numbers was random and evenly distributed, one very simple possibility occurs: simply find the smallest number in the set. If the maximum possible value is <code>m</code>, and the smallest value we find is <code>x</code>, we can then estimate there to be about <code>m/x</code> unique values in the total set. For instance, if we scan a dataset of numbers between 0 and 1, and find that the smallest value in the set is 0.01, it's reasonable to assume there are roughly 100 unique values in the set; any more and we would expect to see a smaller minimum value. Note that it doesn't matter how many times each value is repeated: it is the nature of aggregates like <code>min</code> that repetitions do not affect the output value.</p>
<p>This procedure has the advantage of being extremely straightforward, but it's also very inaccurate. It's not hard to imagine a set with only a few distinct values containing an unusually small number; likewise, a set with many distinct values could have a smallest value that is larger than we expect. Finally, few datasets are so well behaved as to be neatly random and evenly distributed. Still, this proto-algorithm gives us some insight into one possible approach to get what we want; what we need are further refinements.</p>
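<p>Crude as it is, the proto-algorithm is only a couple of lines of code. Here's a quick illustration (the names and the synthetic dataset are my own invention):</p>

```python
import random

def min_value_estimate(values):
    """Estimate the number of unique values in a dataset drawn uniformly
    from [0, 1): roughly 1 / min(values)."""
    return 1.0 / min(values)

random.seed(42)  # deterministic, purely for the sake of the example
unique = [random.random() for _ in range(1000)]
dataset = unique * 3  # duplicates don't change the minimum...
estimate = min_value_estimate(dataset)  # ...so they don't change the estimate
```

Run it a few times with different seeds and you'll see the variance that motivates the refinements below.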
<h3>Probabilistic counting</h3>
<p>The first set of refinements comes from the paper <a href="http://www.cse.unsw.edu.au/~cs9314/07s1/lectures/Lin_CS9314_References/fm85.pdf">Probabilistic Counting Algorithms for Data Base Applications</a> by Flajolet and Martin, with further refinements in the papers <a href="http://www.ic.unicamp.br/~celio/peer2peer/math/bitmap-algorithms/durand03loglog.pdf">LogLog counting of large cardinalities</a> by Durand-Flajolet, and <a href="http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf">HyperLogLog: The analysis of a near-optimal cardinality estimation algorithm</a> by Flajolet et al. It's interesting to watch the development and improvement of the ideas from paper to paper, but I'm going to take a slightly different approach and demonstrate how to build and improve a solution from the ground up, omitting some of the algorithm from the original paper. Interested readers are advised to read through all three; they contain a lot of mathematical insights I won't go into in detail here.</p>
<p>First, Flajolet and Martin observe that given a good hash function, we can take any arbitrary set of data and turn it into one of the sort we need, with evenly distributed, (pseudo-)random values. With this simple insight, we can apply our earlier procedure to whatever data we want, but they're far from done.</p>
<p>Next, they observe that there are other patterns we can use to estimate the number of unique values, and some of them perform better than recording the minimum value of the hashed elements. The metric Flajolet and Martin pick is counting the number of 0 bits at the beginning of the hashed values. It's easy to see that in random data, a sequence of <code>k</code> zero bits will occur once in every <code>2<sup>k</sup></code> elements, on average; all we need to do is look for these sequences and record the length of the longest sequence to estimate the total number of unique elements. This still isn't a great estimator, though - at best it can give us a power-of-two estimate of the number of elements, and much like the min-value based estimate, it's going to have a huge variance. On the plus side, the state we need to store is very small: to record sequences of leading 0s of up to 32 bits, we only need a 5 bit number.</p>
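<p>As a concrete (and entirely illustrative) sketch, here's that estimator using the first four bytes of an MD5 digest as a stand-in 32-bit hash:</p>

```python
import hashlib

def hash32(value):
    """A stand-in 32-bit hash: the first 4 bytes of an MD5 digest."""
    return int.from_bytes(hashlib.md5(str(value).encode()).digest()[:4], 'big')

def leading_zeroes(num, bits=32):
    """Counts the number of leading 0 bits in a bits-wide integer."""
    count = 0
    for i in range(bits - 1, -1, -1):
        if (num >> i) & 1:
            break
        count += 1
    return count

# The estimate is 2 ** (longest run of leading zeroes seen): always a
# power of two, and with a huge variance from run to run.
estimate = 2 ** max(leading_zeroes(hash32(v)) for v in range(10000))
```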
<p>As a side note, the original Flajolet-Martin paper deviates here and uses a bitmap-based procedure to get a more accurate estimate from a single value. I won't go into this in detail, since it's soon obsoleted by improvements in subsequent papers; interested readers can read the original paper for more details.</p>
<p>So we now have a rather poor estimate of the number of values in the dataset based on bit patterns. How can we improve on it? One straightforward idea is to use multiple independent hash functions. If each hash produces its own set of random outputs, we can record the longest observed sequence of leading 0s from each; at the end we can average our values for a more accurate estimate.</p>
<p>This actually gives us a pretty good result statistically speaking, but hashing is expensive. A better approach is one known as <em>stochastic averaging</em>. Instead of using multiple hash functions, we use just a single hash function, but use part of its output to split values into one of many buckets. Supposing we want 1024 values, we can take the first 10 bits of the hash function as a bucket number, and use the remainder of the hash to count leading 0s. This loses us nothing in terms of accuracy, but saves us a lot of redundant computation of hashes.</p>
<p>Applying what we've learned so far, here's a simple implementation. This is equivalent to the LogLog algorithm in the Durand-Flajolet paper; for convenience and clarity, though, I'm counting trailing (least-significant) 0 bits rather than leading ones; the result is exactly equivalent.</p>
<pre class="prettyprint">def trailing_zeroes(num):
    """Counts the number of trailing 0 bits in num."""
    if num == 0:
        return 32  # Assumes 32 bit integer inputs!
    p = 0
    while (num >> p) & 1 == 0:
        p += 1
    return p

def estimate_cardinality(values, k):
    """Estimates the number of unique elements in the input set values.

    Arguments:
        values: An iterator of hashable elements to estimate the cardinality of.
        k: The number of bits of hash to use as a bucket number; there will be 2**k buckets.
    """
    num_buckets = 2 ** k
    max_zeroes = [0] * num_buckets
    for value in values:
        h = hash(value)
        bucket = h & (num_buckets - 1)  # Mask out the k least significant bits as bucket ID
        bucket_hash = h >> k
        max_zeroes[bucket] = max(max_zeroes[bucket], trailing_zeroes(bucket_hash))
    return 2 ** (float(sum(max_zeroes)) / num_buckets) * num_buckets * 0.79402</pre>
<p>This is all pretty much as we just described: we keep one count of leading (or trailing) zeroes per bucket; at the end we average the counts; if our average is x, our estimate is 2<sup>x</sup>, multiplied by the number of buckets. Not mentioned previously is this magic number <code>0.79402</code>. Statistical analysis shows that our procedure introduces a predictable bias towards larger estimates; this magic constant is derived in the paper by Durand-Flajolet to correct that bias. The actual figure varies with the number of buckets used, but with larger numbers of buckets (at least 64), it converges on the estimate we use in the above algorithm. See the complete paper for <em>lots</em> more information, including the derivation of that number.</p>
<p>This procedure gives us a pretty good estimate - for <code>m</code> buckets, the average error is about <code>1.3/sqrt(m)</code>. Thus with 1024 buckets (1024 * 5 = 5120 bits, or 640 bytes of state), we can expect an average error of about 4%; 5 bits per bucket is enough to estimate cardinalities up to 2<sup>27</sup>, per the paper. That's pretty good for less than a kilobyte of memory!</p>
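The arithmetic behind that claim is easy to check:

```python
import math

m = 1024                              # number of buckets
average_error = 1.3 / math.sqrt(m)    # 1.3 / 32 = 0.040625, i.e. about 4%
state_bits = m * 5                    # 5 bits per bucket = 5120 bits
state_bytes = state_bits // 8         # 640 bytes
```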
<p>Let's try it ourselves on some random data:</p>
<pre class="prettyprint">>>> [100000/estimate_cardinality([random.random() for i in range(100000)], 10) for j in range(10)]
[0.9825616152548807, 0.9905752876839672, 0.979241749110407, 1.050662616357679, 0.937090578752079, 0.9878968276629505, 0.9812323203117748, 1.0456960262467019, 0.9415413413873975, 0.9608567203911741]</pre>
<p>Not bad! Some of the estimates are off by more than the predicted 4%, but all in all they're pretty good. If you're trying this experiment yourself, one caution: Python's builtin <code>hash()</code> hashes integers to themselves. As a result, running something like <code>estimate_cardinality(range(10000), 10)</code> will give wildly divergent results, because <code>hash()</code> isn't behaving like a good hash function should. Using random numbers as in the example above works just fine, however.</p>
<h3>Improving accuracy: SuperLogLog and HyperLogLog</h3>
<p>While we've got an estimate that's already pretty good, it's possible to get a lot better. Durand and Flajolet make the observation that outlying values do a lot to decrease the accuracy of the estimate; by throwing out the largest values before averaging, accuracy can be improved. Specifically, by throwing out the 30% of buckets with the largest values, and averaging only 70% of buckets with the smaller values, accuracy can be improved from <code>1.30/sqrt(m)</code> to only <code>1.05/sqrt(m)</code>! That means that our earlier example, with 640 bytes of state and an average error of 4% now has an average error of about 3.2%, with no additional increase in space required.</p>
<p>Finally, the major contribution of Flajolet et al in the HyperLogLog paper is to use a different type of averaging, taking the <em>harmonic mean</em> instead of the <em>geometric mean</em> we just applied. By doing this, they're able to edge down the error to <code>1.04/sqrt(m)</code>, again with no increase in state required. The complete algorithm is somewhat more complicated, however, as it requires corrections for both small and large cardinalities. Interested readers should - you guessed it - read the entire paper for details.</p>
<h3>Parallelization</h3>
<p>One really neat attribute that all these schemes share is that they're really easy to parallelize. Multiple machines can independently run the algorithm with the same hash function and the same number of buckets; at the end, results can be combined by taking the maximum value of each bucket from each instance of the algorithm. Not only is this trivial to do, but the resulting estimate is exactly identical to the one we'd get running the algorithm on a single machine, and each instance only needs to transfer less than a kilobyte of data to achieve it.</p>
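To illustrate, here's the bucket computation from the earlier algorithm split out so that two instances' states can be merged; the merge is just an element-wise maximum:

```python
def trailing_zeroes(num):
    """Counts the number of trailing 0 bits in num."""
    if num == 0:
        return 32
    p = 0
    while (num >> p) & 1 == 0:
        p += 1
    return p

def compute_buckets(values, k):
    """The per-machine half of the algorithm: just the bucket state."""
    num_buckets = 2 ** k
    max_zeroes = [0] * num_buckets
    for value in values:
        h = hash(value)
        bucket = h & (num_buckets - 1)
        max_zeroes[bucket] = max(max_zeroes[bucket], trailing_zeroes(h >> k))
    return max_zeroes

def merge_buckets(a, b):
    """Combining two machines' state is an element-wise max."""
    return [max(x, y) for x, y in zip(a, b)]

def estimate_from_buckets(max_zeroes):
    m = len(max_zeroes)
    return 2 ** (float(sum(max_zeroes)) / m) * m * 0.79402
```

Because max is associative, merging partial states and then estimating gives bit-for-bit the same answer as processing everything on one machine.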
<h3>Conclusion</h3>
<p>Cardinality estimation algorithms like the ones we've just discussed make it possible to get a very good estimate - within a few percent - of the total number of unique values in a dataset, typically using less than a kilobyte of state. We can do this regardless of the nature of the data, and the work can be distributed over multiple machines with minimum coordination overhead and data transfer. The resulting estimates can be useful for a range of things, such as traffic monitoring (how many unique IPs is a host contacting?) and database query optimization (should we sort and merge, or construct a hashtable of unique values?).</p>
<p>Got an algorithm that you think is Damn Cool? Post it in the comments and perhaps I'll write about it in a future post!</p>
Damn Cool Algorithms: Homomorphic Hashingtag:blog.notdot.net,2012-08-29:post:1050012012-08-29T08:36:27Z2012-08-28T16:05:47ZNick Johnsonhttp://blog.notdot.net/
<style type="text/css">
code { display: inline; padding: 5px 0 0 0}
sup { vertical-align: super }
sub { vertical-align: sub}
</style>
<p>In the last Damn Cool Algorithms post, we learned about <a href="http://blog.notdot.net/2012/01/Damn-Cool-Algorithms-Fountain-Codes">Fountain Codes</a>, a clever probabilistic algorithm that allows you to break a large file up into a virtually infinite number of small chunks, such that you can collect any subset of those chunks - as long as you collect a few more than the volume of the original file - and be able to reconstruct the original file. This is a very cool construction, but as we observed last time, it has one major flaw when it comes to use in situations with untrusted users, such as peer to peer networks: there doesn't seem to be a practical way to verify if a peer is sending you valid blocks until you decode the file, which happens very near the end - far too late to detect and punish abuse.</p>
<p>It's here that Homomorphic Hashes come to our rescue. A homomorphic hash is a construction that's simple in principle: a hash function such that you can compute the hash of a composite block from the hashes of the individual blocks. With a construction like this, we could distribute a list of individual hashes to users, and they could use those to verify incoming blocks as they arrive, solving our problem.</p>
<p>Homomorphic Hashing is described in the paper <a href="http://pdos.csail.mit.edu/papers/otfvec/paper.pdf">On-the-fly verification of rateless erasure codes for efficient content distribution</a> by Krohn et al. It's a clever construction, but rather difficult to understand at first, so in this article, we'll start with a strawman construction of a possible homomorphic hash, then improve upon it until it resembles the one in the paper - at which point you will hopefully have a better idea as to how it works. We'll also discuss the shortcomings and issues of the final hash, as well as how the authors propose to resolve them.</p>
<p>Before we continue, a small disclaimer is needed: I'm a computer scientist, not a mathematician, and my discrete math knowledge is far rustier than I'd like. This paper stretches the boundaries of my understanding, and describing the full theoretical underpinnings of it is something I'm likely to make a hash of. So my goal here is to provide a basic explanation of the principles, sufficient for an intuition of how the construction works, and leave the rest for further exploration by the interested reader.</p>
<h3>A homomorphic hash that isn't</h3>
<p>We can construct a very simple candidate for a homomorphic hash by using one very simple mathematical identity: the observation that <code>g<sup>x0</sup> * g<sup>x1</sup> = g<sup>x0 + x1</sup></code>. So, for instance, <code>2<sup>3</sup> * 2<sup>2</sup> = 2<sup>5</sup></code>. We can make use of this by the following procedure:</p>
<ol>
<li>Pick a random number g</li>
<li>For each element <code>x</code> in our message, take <code>g<sup>x</sup></code>. This is the hash of the given element.</li>
</ol>
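In Python, with small illustrative numbers:

```python
g = 7                                  # an arbitrarily chosen base
message = [3, 2]
hashes = [g ** x for x in message]     # [343, 49]

# Multiplying the hashes gives the 'hash' of the sum: 7**3 * 7**2 == 7**5.
assert hashes[0] * hashes[1] == g ** sum(message)   # both are 16807
```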
<p>Using the identity above, we can see that if we sum several message blocks together, we can compute their hash by multiplying the hashes of the individual blocks, and get the same result as if we 'hash' the sum. Unfortunately, this construction has a couple of obvious issues:</p>
<ul>
<li>Our 'hash' really isn't - the hashes are way longer than the message elements themselves!</li>
<li>Any attacker can compute the original message block by taking the logarithm of the hash for that block. If we had a real hash with collisions, a similar procedure would let them generate a collision easily.</li>
</ul>
<h3>A better hash with modular arithmetic</h3>
<p>Fortunately, there's a way we can fix both problems in one shot: by using modular arithmetic. Modular arithmetic keeps our numbers bounded, which solves our first problem, while also making our attacker's life more difficult: finding a <a href="http://en.wikipedia.org/wiki/Preimage_attack">preimage</a> for one of our hashes now requires solving the <a href="http://en.wikipedia.org/wiki/Discrete_logarithm">discrete log problem</a>, a major unsolved problem in mathematics, and the foundation for several cryptosystems.</p>
<p>Here, unfortunately, is where the theory starts to get a little more complicated - and I start to get a little more vague. Bear with me.</p>
<p>First, we need to pick a modulus for adding blocks together - we'll call it <code>q</code>. For the purposes of this example, let's say we want to add numbers between 0 and 255, so let's pick the smallest prime greater than 255 - which is 257.</p>
<p>We'll also need another modulus under which to perform exponentiation and multiplication. We'll call this <code>p</code>. For reasons relating to <a href="http://en.wikipedia.org/wiki/Fermat%27s_little_theorem">Fermat's Little Theorem</a>, this also needs to be a prime, and further, needs to be chosen such that <code>p - 1</code> is a multiple of <code>q</code> (written <code>q | (p - 1)</code>, or equivalently, <code>p % q == 1</code>). For the purposes of this example, we'll choose 1543, which is <code>257 * 6 + 1</code>.</p>
<p>Using a finite field also puts some constraints on the number, g, that we use for the base of the exponent. Briefly, it has to be 'of order q', meaning that g<sup>q</sup> mod p must equal 1. For our example, we'll use 47, since <code>47<sup>257</sup> % 1543 == 1</code>.</p>
<p>So now we can reformulate our hash to work like this: To hash a message block, we compute <code>g<sup>b</sup> mod p</code> - in our example, <code>47<sup>b</sup> mod 1543</code> - where b is the message block. To combine hashes, we simply multiply them <code>mod p</code>, and to combine message blocks, we add them <code>mod q</code>.</p>
<p>Let's try it out. Suppose our message is the sequence <code>[72, 101, 108, 108, 111]</code> - that's "Hello" in ASCII. We can compute the hash of the first number as <code>47<sup>72</sup> mod 1543</code>, which is 883. Following the same procedure for the other elements gives us our list of hashes: <code>[883, 958, 81, 81, 313]</code>.</p>
<p>We can now see how the properties of the hash play out. The sum of all the elements of the message is 500, which is 243 mod 257. The hash of 243 is <code>47<sup>243</sup> mod 1543</code>, or 376. And the product of our hashes is <code>883 * 958 * 81 * 81 * 313 mod 1543</code> - also 376! Feel free to try this for yourself with other messages and other subsets - they'll always match, as you would expect.</p>
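Here's the whole worked example in Python, using the three-argument form of <code>pow</code> for modular exponentiation:

```python
q, p, g = 257, 1543, 47
message = [72, 101, 108, 108, 111]           # "Hello" in ASCII

hashes = [pow(g, b, p) for b in message]     # [883, 958, 81, 81, 313]

combined = sum(message) % q                  # 500 % 257 == 243
hash_of_sum = pow(g, combined, p)            # 376

product_of_hashes = 1
for h in hashes:
    product_of_hashes = product_of_hashes * h % p   # also 376

assert hash_of_sum == product_of_hashes
```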
<h3>A practical hash</h3>
<p>Of course, our improved hash still has a couple of issues:</p>
<ul>
<li>The domain of our input values is small enough that an attacker could simply try them all out to find collisions. And the domain of our output values is small enough the attacker could attempt to find discrete logarithms by brute force, too.</li>
<li>Although our hashes are shorter than they were without modular arithmetic, they're still longer than the input.</li>
</ul>
<p>The first of these is fairly straightforward to resolve: we can simply pick larger primes for p and q. If we choose ones that are sufficiently large, both enumerating all inputs and brute force logarithm finding will become impractical.</p>
<p>The second problem is a little trickier, but not hugely so; we just have to reorganize our message a bit. Instead of breaking the message down into elements between 0 and q, and treating each of those as a block, we can break the message into arrays of elements between 0 and q. For instance, suppose we have a message that is 1024 bytes long. Instead of breaking it down into 1024 blocks of 1 byte each, let's break it down into, say, 64 blocks of 16 bytes. We then modify our hashing scheme a little bit to accommodate this:</p>
<p>Instead of picking a single random number as the base of our exponent, g, we pick 16 of them, <code>g<sub>0</sub></code> through <code>g<sub>15</sub></code>. To hash a block, we take each number <code>g<sub>i</sub></code> and raise it to the power of the corresponding sub-block. The resulting output is the same length as when we were hashing only a single block per hash, but we're taking 16 elements as input instead of a single one. When adding blocks together, we add all the corresponding sub-blocks individually. All the properties we had earlier still hold. Better, we've given ourselves another tuneable parameter: the number of sub-blocks per block. This will be invaluable in getting the right tradeoff between security, granularity of blocks, and protocol overhead.</p>
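Here's a sketch of the per-sub-block scheme using the toy parameters from earlier. One note on an assumption: the way I derive the bases - raising random values to the power <code>(p - 1) / q</code> - is a standard trick for producing elements of order q, not something prescribed by the paper:

```python
import random

q, p = 257, 1543          # toy parameters from the running example
NUM_SUB_BLOCKS = 16

# Derive bases of order q: for prime p with q | p - 1, r**((p - 1) // q) mod p
# has order dividing q; discard the degenerate case where it comes out as 1.
_rng = random.Random(42)
gs = []
while len(gs) < NUM_SUB_BLOCKS:
    candidate = pow(_rng.randrange(2, p), (p - 1) // q, p)
    if candidate != 1:
        gs.append(candidate)

def hash_block(block):
    """One hash for a block of NUM_SUB_BLOCKS sub-elements, each in [0, q)."""
    h = 1
    for g_i, b_i in zip(gs, block):
        h = h * pow(g_i, b_i, p) % p
    return h

def add_blocks(a, b):
    """Combine two blocks by adding corresponding sub-blocks mod q."""
    return [(x + y) % q for x, y in zip(a, b)]
```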
<h3>Practical applications</h3>
<p>What we've arrived at now is pretty much the construction described in the paper, and hopefully you can see how it would be applied to a system utilizing fountain codes. Simply pick two primes of about the right size - the paper recommends 257 bits for q and 1024 bits for p - figure out how big you want each block to be - and hence how many sub-blocks per block - and figure out a way for everyone to agree on the random numbers for g - such as by using a random number generator with a well defined seed value.</p>
<p>The construction we have now, although useful, is still not perfect, and has a couple more issues we should address. First of these is one you may have noticed yourself already: our input values pack neatly into bytes - integers between 0 and 255 in our example - but after summing them in a finite field, the domain has grown, and we can no longer pack them back into the same number of bits. There are two solutions to this: the tidy one and the ugly one.</p>
<p>The tidy one is what you'd expect: Since each value has grown by one bit, chop off the leading bit and transmit it along with the rest of the block. This allows you to transmit your block reasonably sanely and with minimal expansion in size, but is a bit messy to implement and seems - at least to me - inelegant.</p>
<p>The ugly solution is this: Pick the smallest prime number larger than your chosen power of 2 for q, and simply ignore or discard overflows. At first glance this seems like a terrible solution, but consider: the smallest prime larger than <code>2<sup>256</sup></code> is <code>2<sup>256</sup> + 297</code>. The chance that a random number in that range is larger than <code>2<sup>256</sup></code> is approximately 1 in <code>3.9 * 10<sup>74</sup></code>, or approximately one in <code>2<sup>247</sup></code>. This is way smaller than the probability of, say, two randomly generated texts having the same SHA-1 hash.</p>
<p>Thus, I think there's a reasonable argument for picking a prime using that method, then simply ignoring the possibility of overflows. Or, if you want to be paranoid, you can check for them, and throw out any encoded blocks that cause overflows - there won't be many of them, to say the least.</p>
<h3>Performance and how to improve it</h3>
<p>Another thing you may be wondering about this scheme is just how well it performs. Unfortunately, the short answer is "not well". Using the example parameters in the paper, for each sub-block we're raising a 1024 bit number to the power of a 257 bit number; even on modern hardware this is not fast. We're doing this for every 256 bits of the file, so to hash an entire 1 gigabyte file, for instance, we have to compute over 33 million modular exponentiations. This is an algorithm that really puts to the test the assumption that it's always worth spending CPU time to save bandwidth.</p>
<p>The paper offers two solutions to this problem; one for the content creator and one for the distributors.</p>
<p>For the content creator, the authors demonstrate that there is a way to generate the random constants g<sub>i</sub>, used as the bases of the exponents, from a secret value. With this secret value, the content creator can generate the hashes for their files much more quickly than without it. However, anyone with the secret value can also trivially generate hash collisions, so in such a scheme, the publisher must be careful not to disclose the value to anyone, and only distribute the computed constants g<sub>i</sub>. Further, the set of constants itself isn't small - with the example parameters, a full set of constants weighs in at about the size of 4 data blocks. Thus, you need a good way to distribute the per-publisher constants in addition to the data itself. Anyone interested in this scheme should consult section C of the paper, titled "Per-Publisher Homomorphic Hashing".</p>
<p>For distributors, the authors offer a probabilistic check that works on batches of blocks, described in section D, "Computational Efficiency Improvements". Another easier to understand variant is this: Instead of verifying blocks individually as they arrive, accumulate blocks in a batch. When you have enough blocks, sum them all together, and calculate an expected hash by taking the product of the expected hashes of the individual blocks. Compute the composite block's hash. If it verifies, all the individual blocks are valid! If it doesn't, divide and conquer: split your batch in half and check each, winnowing out valid blocks until you're left with any invalid ones.</p>
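Here's a sketch of that divide-and-conquer variant, using the toy single-element hash from earlier (so a "block" is just one number); the function names are mine. Like any batched check, it's probabilistic: a carefully matched pair of corruptions could in principle cancel out, with probability on the order of 1/p:

```python
from functools import reduce

q, p, g = 257, 1543, 47   # toy parameters from the running example

def block_hash(b):
    return pow(g, b, p)

def verify_batch(blocks, hashes):
    """Returns the indices of blocks that fail verification.

    One exponentiation usually clears the whole batch; only a failure
    triggers recursion into the halves.
    """
    combined = reduce(lambda a, b: (a + b) % q, blocks)
    expected = reduce(lambda a, b: a * b % p, hashes)
    if block_hash(combined) == expected:
        return []                      # everything in this batch is valid
    if len(blocks) == 1:
        return [0]                     # winnowed down to an invalid block
    mid = len(blocks) // 2
    return (verify_batch(blocks[:mid], hashes[:mid]) +
            [mid + i for i in verify_batch(blocks[mid:], hashes[mid:])])
```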
<p>The nice thing about either of these procedures is that they allow you to trade off verification work with your vulnerability window. You can even dedicate a certain amount of CPU time to verification, and simply batch up incoming blocks until the current computation finishes, ensuring you're always verifying the last batch as you receive the next.</p>
<h3>Conclusion</h3>
<p>Homomorphic Hashing provides a neat solution to the problem of verifying data from untrusted peers when using a fountain coding system, but it's not without its own drawbacks. It's complicated to implement and computationally expensive to compute, and requires careful tuning of the parameters to minimise the volume of the hash data without compromising security. Used correctly in conjunction with fountain codes, however, Homomorphic Hashing could be used to create an impressively fast and efficient content distribution network.</p>
<p>As a side-note, I'm intending to resume more regular blogging with more Damn Cool Algorithms posts. Have an algorithm you think is Damn Cool and would like to hear more about? Post it in the comments!</p>
An interesting gotcha in Java unittestingtag:blog.notdot.net,2012-07-03:post:1040012012-07-03T04:36:43Z2012-07-03T04:36:43ZNick Johnsonhttp://blog.notdot.net/
<p>Those of you who have more than a passing familiarity with Java will be aware of Java's semantics for object equality using '==': it returns true iff the two arguments are the same object. There's no operator overloading for the equals operator, even by built in classes; classes that want to do value equality tests are expected to override the built in 'equals()' method.</p>
<p>This leads to a common pitfall with Strings. The following compiles fine, with no warnings:</p>
<pre class="prettyprint">public static boolean isFoo(String s) {
    return s == "foo";
}</pre>
<p>This won't work as intended, of course, because it will only return true if the passed in string is the very same object as our string literal "foo". Let's write a unittest to make sure everything works as expected:</p>
<pre class="prettyprint">@Test
public void testIsFoo() {
    assertTrue(Blah.isFoo("foo"));
    assertFalse(Blah.isFoo("bar"));
}</pre>
<p>But wait! This test passes just fine! Something is very wrong here.</p>
<p>What's happening is that the Java compiler is <a href="http://en.wikipedia.org/wiki/String_interning">interning</a> string literals, so that multiple uses of the same string constant point to the same actual string object at runtime. This saves memory, but it's making it tough for us to test properly. You might think you can fix the test like this:</p>
<pre class="prettyprint">@Test
public void testIsFoo() {
    assertTrue(Blah.isFoo("fo" + "o"));
    assertFalse(Blah.isFoo("bar"));
}</pre>
<p>Unfortunately, this test passes too - the Java compiler is smart enough to evaluate the string concatenation at compile time, and again interns the literals to point at the same string.</p>
<p>The best way to make sure you're testing with distinct strings is to construct a new string from your string literal, thus ensuring it <em>has</em> to be an entirely new object, like this:</p>
<pre class="prettyprint">@Test
public void testIsFoo() {
    assertTrue(Blah.isFoo(new String("foo")));
    assertFalse(Blah.isFoo(new String("bar")));
}</pre>
<p>Solved! I've always said you have to be careful around interns.</p>
Endings and beginningstag:blog.notdot.net,2012-05-15:post:1030012012-05-15T03:01:27Z2012-05-15T03:01:27ZNick Johnsonhttp://blog.notdot.net/
<p>As of July 8th, I will no longer be working at Google.</p>
<p>The last four-and-a-half years at Google have been an incredible experience. First working as an SRE - Google's oncall / infrastructure role - and then as a Developer Programs Engineer, I've learned an incredible amount, worked with some amazing people, and done some exciting things. I've been with the Google App Engine team - first in a 20% capacity, and then full time - since before the product launched, and I think I've had a significant impact on the product and the community that's formed around it. I've greatly enjoyed interacting with everyone in the community and lending a hand where I can.</p>
<p>Ultimately, though, I've decided it's time for me to move on and seek out new challenges. With that in mind, I'm going to be working at <a href="https://smartsparrow.com">Smart Sparrow</a>, an exciting Sydney startup who are making waves in the education area. They're doing some really impressive stuff with E-Learning, and I'm excited to be joining them and helping shape things. Although I've worked at small companies before, this will be my first time at a for-real startup, and I'm looking forward to the experience.</p>
<p>In my absence, App Engine support will continue to be handled by my extremely capable colleagues. I still like the product and the community, so I won't be disappearing entirely - I expect I'll continue to answer questions on Stack Overflow and post the occasional App Engine blogpost here for the foreseeable future.</p>
Shameless plugtag:blog.notdot.net,2012-01-31:post:1020012012-01-31T02:56:00Z2012-01-31T02:56:00ZNick Johnsonhttp://blog.notdot.net/
<p>Just a quick reminder that I've started my tour of NZ, and I'll be blogging about it at <a href="http://laidbacktouring.blogspot.co.nz/">http://laidbacktouring.blogspot.co.nz/</a> - so check it out and subscribe, if you're so inclined.</p>
<p>This is the last time I plug the blog here, I promise*.</p>
<p>* I reserve the right to renege on this promise. It _is_ my blog, after all.</p>