Nearly all DHT implementations vulnerable to 'merge' bug.
Posted by Nick Johnson | Filed under tech, coding
As DHT implementations proliferate and harmonise, the prospect of multiple widely-deployed applications using the same or compatible DHT implementations is increasingly becoming a reality. There is a large and growing number of DHT libraries out there, such as Entangled and FreePastry, being used by an increasing number of applications.
However, most of these implementations are vulnerable to a simple but subtle bug: they have no way of distinguishing one DHT network from another. Although each application's DHT network or networks start off separate, if, by chance or deliberate action, a node from a different but compatible DHT is introduced, the 'self healing' property of DHTs will ensure that, sooner or later, the two networks merge into a single DHT.
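To make the merge mechanism concrete, here is a toy simulation. It is only a sketch under loud assumptions: each node keeps a full contact set rather than the bounded routing table a real DHT would use, and "self healing" is reduced to nodes swapping contact lists with a randomly chosen peer. The names (`make_network`, `gossip_round`, the node labels) are invented for illustration and do not come from any real library.

```python
import random

def make_network(names):
    """Build a fully connected network: every node knows every other node."""
    return {name: set(names) - {name} for name in names}

def gossip_round(tables, rng):
    """Each node contacts one known peer and learns that peer's contacts."""
    for node in list(tables):
        peer = rng.choice(sorted(tables[node]))
        tables[node] |= tables[peer] - {node}
        tables[peer].add(node)  # the contacted peer also learns the caller

rng = random.Random(0)

# Two separate applications, each with its own (compatible) DHT.
tables = {
    **make_network(["a1", "a2", "a3", "a4"]),
    **make_network(["b1", "b2", "b3", "b4"]),
}

# A single stray contact: node a1 is told about node b1.
tables["a1"].add("b1")

rounds = 0
for rounds in range(1, 1001):  # hard cap so the loop always terminates
    gossip_round(tables, rng)
    if all(len(contacts) == len(tables) - 1 for contacts in tables.values()):
        break

merged = all(len(contacts) == len(tables) - 1 for contacts in tables.values())
print(f"merged={merged} after {rounds} gossip rounds")
```

One introduced contact is enough: nothing in the exchange lets a node tell a 'foreign' contact from one of its own, so the two contact sets grow together.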
This is not a problem for single-purpose DHT implementations such as those used by BitTorrent or Overnet, since they generally establish a single DHT with all compatible clients participating in any case. Nor is this a problem for networks designed with heterogeneous applications in mind, such as CSpace. However, this still leaves a number of DHT libraries that don't fall into either of these categories.
If DHTs from two distinct applications using compatible implementations become merged, the outcome depends to a large extent on how those applications make use of the DHT. If they use compatible parameters and only make use of the basic DHT store/retrieve/find-node functionality, the impact may be minimal, though differences in persistence time and maximum size for stored elements can still lead to issues. If the DHTs have different parameters, such as the number of nodes to replicate inserted data to, the result may be subtly broken - a DHT that no longer fulfils the guarantees the software expects from it.
If the two applications expect very different things from their DHTs, though, such as a different set of supported RPCs, the result could be a complete breakdown in the usefulness of the DHT. If your application expects that the vast majority of participating nodes will support its custom doFoo RPC (not an unreasonable expectation when you control the application), and suddenly half the nodes don't, the application's behaviour will probably be severely degraded, at best.** This amounts to an entirely accidental Denial of Service attack on both applications involved.
The solution is relatively simple, of course - add an application identifier or 'magic number' to the DHT protocol, and require each application to specify an identifier unique to it. The DHT library can then discard any packet whose ID it doesn't recognise.
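As a rough illustration of that fix, the sketch below prefixes every packet with an application identifier and drops anything that doesn't match. The wire format (a 4-byte big-endian ID ahead of the payload), the function names and the example IDs are all assumptions made up for this example. They are not taken from any of the libraries mentioned here.

```python
import struct

# Hypothetical wire format: a 4-byte big-endian application ID, then the payload.
HEADER = struct.Struct("!I")

def encode_packet(app_id: int, payload: bytes) -> bytes:
    """Prefix the payload with the sending application's identifier."""
    return HEADER.pack(app_id) + payload

def decode_packet(expected_app_id: int, packet: bytes):
    """Return the payload if the packet carries our ID, otherwise None."""
    if len(packet) < HEADER.size:
        return None  # too short to even contain an ID
    (app_id,) = HEADER.unpack_from(packet)
    if app_id != expected_app_id:
        return None  # packet from a different DHT network: discard it
    return packet[HEADER.size:]

# Made-up identifiers for two unrelated applications.
MY_APP_ID = 0x464F4F31
OTHER_APP_ID = 0x42415231

ours = encode_packet(MY_APP_ID, b"find_node")
theirs = encode_packet(OTHER_APP_ID, b"find_node")

print(decode_packet(MY_APP_ID, ours))
print(decode_packet(MY_APP_ID, theirs))
```

Because foreign packets are discarded before any routing-table update happens, a stray node from another network never becomes a contact, so there is nothing for the 'self healing' behaviour to spread. Making the identifier a required argument when a DHT instance is created, rather than an optional setting, is what would keep applications from skipping it.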
In a quick survey of current DHT implementations, at least the following seem to be vulnerable:
FreePastry is notable in that it does provide for a 'magic number' facility. However, this is buried deep in the protocol stack, and is not required in order to create a FreePastry DHT instance.
I'm not aware of this bug occurring yet, but with increasing use of DHTs, it seems to me all but inevitable that it will occur at some point. Further, restoring normal behaviour after it has occurred is likely to be extremely difficult, probably requiring starting an entirely new DHT instance and moving all existing clients over to it.
** Please forgive my overuse of generalities. As I'm speaking of a general bug shared across many libraries and implementations, it's hard to be specific about the impact.