Damn Cool Algorithms: Homomorphic Hashing

In the last Damn Cool Algorithms post, we learned about Fountain Codes, a clever probabilistic algorithm that allows you break a large file up into a virtually infinite number of small chunks, such that you can collect any subset of those chunks - as long as you collect a few more than the volume of the original file - and be able to reconstruct the original file. This is a very cool construction, but as we observed last time, it has one major flaw when it comes to use in situations with untrusted users, such as peer to peer networks: there doesn't seem to be a practical way to verify if a peer is sending you valid blocks until you decode the file, which happens very near the end - far too late to detect and punish abuse.

It's here that Homomorphic Hashes come to our rescue. A homomorphic hash is a construction that's simple in principle: a hash function such that you can compute the hash of a composite block from the hashes of the individual blocks. With a construction like this, we could distribute ...

Damn Cool Algorithms: Fountain Codes

That's right, it's time for another episode of the frustratingly infrequent Damn Cool Algorithms series! If you're not familiar with it, you might want to check out some of the previous posts.

Today's subject is Fountain Codes, otherwise known as "rateless codes". A fountain code is a way to take some data - a file, for example - and transform it into an effectively unlimited number of encoded chunks, such that you can reassemble the original file given any subset of those chunks, as long as you have a little more than the size of the original file. In other words, it lets you create a 'fountain' of encoded data; a receiver can reassemble the file by catching enough 'droplets', regardless of which ones they get and which ones they miss.

What makes this so remarkable is that it allows you to send a file over a lossy connection - such as, say, the internet - in a way that doesn't rely on you knowing the rate of packet loss, and doesn't require the receivers to communicate anything back to you about which packets they missed. You can see how this would be useful in a number of ...