This is a very rough introduction to the basic ideas of NEBLO and its API. Many details are omitted, and many concepts are defined at a very informal level. For more academic documents see the Docs.
Overlay networks are the foundation of peer to peer systems. They provide a flexible and dynamic network abstraction, by super-imposing a virtual graph over the physical internet and replacing IP addresses by overlay addresses (usually 160-bit keys).
When a peer joins the overlay, it becomes the owner of a contiguous interval of overlay addresses. Ideally, any overlay address is owned by one peer, and possibly also by other ones for better resiliency.
Each peer becomes a node in the overlay graph, and must only ``know'' those peers which are directly connected to it by an edge of the graph. A mechanism is provided, by which a peer can learn about its neighbours in the graph. An edge from peer A to peer B does not mean that A is ``close'' to B on the internet; it just means that A knows B, and that a connection may have been established from A to B.
The overlay provides a routing algorithm to allow messages to crawl the overlay graph towards a given destination. The destination of each message is an overlay address, not an IP address. A message, sent by a given peer, is forwarded from one peer to another along a path in the graph, until it reaches a peer which is owner of the destination address.
The routing algorithm is the set of rules that each peer autonomously applies to every message (either incoming or locally originated), in order to decide whether the message is to be kept locally (local peer is owner of the destination address) or forwarded, and in the latter case which peer is better to forward to.
A message can have different meanings, depending on the particular peer-to-peer application; in a storage/retrieval application, for instance, messages carry download/upload requests for documents, to be served by the destination peer. In general, a message is to trigger some application-dependent action on the destination peer.
Overlay networks come in two flavours. Unstructured overlays have an irregular graph, that spontaneously develops and grows at runtime as a sort of social network among peers, and in which the contents find their placement following loosely predetermined rule. Structured overlays provide more rigid rules which only allow graphs of regular and predictable shape (chordal rings, trees, butterflies, etc.); the contents are constrained to specific layouts, depending on their name as well as the graph shape itself. The Freenet anonymous storage is based on an unstructured overlay. The file sharing protocol E-Donkey uses a structured overlay called Overnet, which in turn is an implementation of the Kademlia structured overlay. Another structured overlay, well known in the academic literature, is Chord.
Because of their regular shape, structured overlays have efficient routing algorithms that converge in few hops compared to the size of the graph. Unstructured overlays are far less efficient under this respect: it is very difficult to find a short path in an irregular graph, it is like driving a car in a jungle. Unstructured overlays, however, provide a better anonymity, because their irregular shape makes it difficult for a coalition of malicious peers to correlate overlay addresses to real IP addresses of the hosts involved in the overlay. After all, everybody knows that a jungle is a good place where to hide oneself from the enemy.
So, it seems that routing efficiency and user anonymity are mutually exclusive goals. The main motivation of NEBLO is to practically demonstrate that efficiency and anonymity can indeed coexist in the same overlay network. By providing a suitable API, rather than implementing its own high-level services, such an overlay can serve as efficient and anonymous substrate for a vaste range of peer to peer applications.
The trick: imprecise routing tables
In a structured overlay, each peer usually links a small number of other peers. If peer A has a link to peer B, then A knows the IP address of B as well as the overlay addresses owned by B itself; the overlay addresses are either discovered directly, or inferred by way of a mathematical formula related to the graph. Peer B is said to be ``close'' to A if the overlay addresses of B are ``close'' to the ones of A under some metrics, otherwise it is said to be ``distant''. Each peer has some links to ``close'' peers and some other links to ``distant'' peers; they are chosen such that the resulting graph has a regular and predictable shape (Fig. 1). It follows that each peer knows the IP address and the overlay addresses of any other peers it links to.
The availability of ``close'' as well as ``distant'' links makes it possible to route messages efficiently, following a (more or less) steepest-descent path (Fig. 2).
But this efficiency comes at a price. The accurate knowledge each peer maintains about other peers can be maliciously exploited to discover which IP address is responsible for which overlay address. For instance, if the overlay counts N peers and each of them ``knows'' O(log(N)) other peers (Fig. 1), a coalition of O(N/log(N)) malicious peers can discover the internet location of an arbitrary overlay address. This breaks anonymity, especially recipient anonymity (privacy of the place where each message is headed). A coalition of O(N/log(N)) peers becomes a small minority when N becomes large. For instance, if N = 1000000 then a coalition as small as 1/20 of the entire population can map the whole ring, and smaller coalitions can however map parts of it.
In unstructured overlays, however, no peer is required to know the overlay addresses of any other peers. A link from peer A to peer B is labelled by an overlay address K, meaning that A should forward a message M to B whenever the destination address of M is ``close'' to K under some metrics. The address K is not necessarily related to the overlay addresses owned by B; B is just likely to be ``closer'' to K than A. The routing is thus imprecise: it is not guaranteed to be fast, nor even to converge at all. This way, however, it is much more complicated to understand which IP address is responsible for which overlay address, and recipient anonymity is better preserved.
NEBLO attempts a synthesis of both approaches, with the goal of preserving both efficiency and anonymity. The basic idea (Fig. 3) is that the overlay graph should remain in regular and predictable shape, but no peer should be allowed to keep accurate information about all the peers it knows about. The information related to ``distant'' peers should be forced to be inaccurate, to some extent. Inaccuracy on the long distance means less efficiency in the first few hops, but no dramatic impact on the routing paths, which are proved to at most double their length.