Chapter 1
Peer-to-peer Networks
Working as peers is better
than always serving a client
– Riccardo Pecori
A P2P communication structure is made of multiple autonomous devices that
interact as equals acting both as client and server, so there is no a centralized server
or at least its usage is limited.
A P2P network is a quite complex system that represents a synthesis of several
technological components, and one of them is the overlay network, that is a subgroup
of the network composed by hosts running the same P2P client software. This virtual
network overlaps the underlying network (e.g.: the Internet) with its own logical con-
nections between nodes, maybe physically very far in the real world (see Figure 1.1).
The logical links map onto maybe several physical links of a transport network.
P2P networks can be not structured or structured. In the first case arbitrary logical
links exist among peers that can be all involved in sharing operations; in the latter
case each node in the P2P network has its own identifier and connections according
to a specific, firm algorithm that renders storage and retrieval operations simpler and
faster.
In the history at least four [4] generations of P2P networks have been developed:
4 Chapter 1. Peer-to-peer Networks
Figure 1.1: Overlay and underlay networks.
1. Centralized P2P networks like the by now dead Napster. They exploit a central
server to provide indexes of distributed resources, and then the communication
is peer-to-peer.
2. Pure P2P like Gnutella. In a pure P2P network each node implements functions
of both client and server, and either peer can initiate a communication session
at any moment. The research takes place flooding the network, and then the
communication is only end-to-end between the initiator peer and the responder
peer.
3. Hybrid P2P like JXTA or Skype. In this case there is a combination between
the previous two cases; there are superpeers acting as servers towards a reduced
part of the network. This improves bandwidth and grants signaling operations
1.0. Peer-to-peer Networks 5
saving.
4. Distributed Hash Tables (DHT) based P2P networks, which are a building
block of many nowadays peer-to-peer applications like KAD. DHT mecha-
nisms provide structured P2P networks and guaranteed data retrieval, moder-
ate lookup times, automatic load balancing and self-organizing data storage
and lookup system.
The first three are unstructured P2P networks, whereas the last one is usually de-
fined as a structured network. The research work of the thesis was focused mainly
onto structured p2p networks with particular care to DHT-based networks and their
resource look-up mechanisms. P2P platforms that adopted a DHT structure are for ex-
ample: eMule (Kademlia [5]), BitTorrent (Kademlia), CFS (Chord [6]), OceanStore
(Tapestry [7]), etc.
6 Chapter 1. Peer-to-peer Networks
1.1 DHT Networks
The general structure of DHT-based platforms provides a mapping (the hash table)
between keys and values, where values are stored in a distributed manner across the
p2p networks. The mapping is maintained in a distributed routing table in which
nodes are addressed according to particular IDs. Both IDs/keys and values are the
result of applying a hash function to some form of identification and the entire or a
piece of the actual value.
Responsibility for maintaining the mapping from names to values is distributed
among all nodes, in such a way that a change in a set of participants causes a minimal
probability of disruption. This allows DHTs to scale to extremely large numbers of
participants and to handle continual joins, leaves, and failures of nodes.
The lookup procedure, the core part of this type of networks, allows to find a
certain value position in the network and so to store or retrieve the corresponding
value that, in turn, can be a file, the Address Of Record of a SIP user, etc.
IDs distribution in the network and the lookup procedures vary a lot according to
the particular DHT algorithm chosen, e.g. Chord distributes IDs in a ring, CAN [8]
in rectangles, Kademlia in a tree-like manner, etc.
We now briefly describe two of the most used DHT, namely Chord and Kademlia,
as these will be the starting points from which our research kicked off.
1.1.1 Chord
Chord is a lookup protocol based on consistent hashing that maps keys to nodes
responsible for them in a circular virtual-ring structure. Every node and key are as-
signed a unique m-bit identifier using a hash function, e.g.: SHA-1, applied to the
node’s IP address, or the key’s value.
Chord uses consistent hashing [9] to assign keys to its nodes as it tends to balance
load, since each node receives roughly the same number of keys, and it requires
relatively little movement of keys when nodes join and leave the system.
Chord views the identifier space as a circle formed by no more than 2
m
nodes
(where m= 160) with identifiers/keys ranging from 0 to 2
m
1 [6].
1.1. DHT Networks 7
Each node of a Chord network maintains two data structures:
• successor list;
• finger table.
The first one is necessary to keep the correct organization of the network; in fact
it contains peers immediately succeeding the node’s key in the identifier circle in a
clockwise direction.
The second one is needed to accelerate the process of lookup of a resource.
A node with the smallest ID that is greater than or equal to i represents the suc-
cessor of a key (or node identifier) i. The successor node of a key is responsible for
that key.
Figure 1.2 shows an example of a Chord ID ring. For example node 14 is successor
of node 8 and of key 10, so it is responsible for key 10; node 32 is the successor of
node 21 and keys 30 and 24 of which it is the responsible, etc.
When a node n leaves a network, all the keys it is responsible for should be
reassigned to its successor. In the case when a node n joins a network, certain keys
previously assigned to n’s successor pass to n.
Once a new node joins the network, the node occupies an appropriate position in the
identifier circle, and the predecessors of the newly joined peer update their successor
lists.
A finger table is a routing table which contains IP addresses of peers halfway
around the ID space from the node, a quarter-of-the-way, an eighth-of-the-way and
so forth. Given N nodes in the Chord network, the size of a Chord routing table is
log
2
N. If a node is looking for a resource with key k, it forwards the query to a node
in its finger table with the highest ID not exceeding k reaching it in at most O(log
2
N)
steps.
The lookup in Chord can be implemented in both iterative and recursive modes;
in the first case the control remains always to the initial requesting node who has,
at every step, to send new request messages, in the second case the control passes,
8 Chapter 1. Peer-to-peer Networks
Figure 1.2: Chord ID assignement [6].
at every step, to the node at that moment closer to the searched resource who is in
charge of sending the next requests to the next closer node.
1.1.2 Kademlia
Kademlia [5] networks are arranged basing upon a XOR metric, as the distance be-
tween two kademlia IDs is defined as: d(x;y)= x y. As in Chord IDs of both nodes
and resources (keys) are usually made of 160 bits. Data are replicated in the k (the
recommended value for k is 20) closest nodes to a key where key/value pair is stored.
Kademlia, differently from Chord, has a binary tree-like data structure and nodes are
considered as leafs on this tree (see Figure 1.3). Routing processes are implemented
in prefix-matching mode.
1.1. DHT Networks 9
Figure 1.3: Kademlia tree structure [5].
Kademlia nodes store contact information to route query messages, keeping lists
of <IP address; UDP port; Node ID> triples for nodes of distance from itself be-
tween 2
i
and 2
i+1
(in base 10) and for i ranging from 0 to 160 excluded. These lists
are known as k-buckets, where k is both the length of the lists and the abovementioned
system parameter that is usually chosen such that any given k nodes are very unlikely
to fail within an hour of each other. Given the binary structure these k-buckets contain
at least one node.
Kademlia routing is iterative, i.e., the node querying for a resource receives back,
from the queried one, a list of nodes to ask at the next step until the node responsible
for the searched resource is found; moreover, given the binary structure, a lookup
procedure usually takes at most O(log
2
N) steps where N is the number of nodes in
the network.
In classical Kademlia both in storing and in retrieving resources the requesting
peer runs a parallel asynchronous queries for a given key at the same time, a is
another system parameter usually set to 3. Thea peers are chosen by the requesting
peer according to a certain policy among its k peers nearest to the searched resource
10 Chapter 1. Peer-to-peer Networks
that are stored in a temporary list to be updated at each iterative step. Each of the
a peers contacted responds to the querying process initiator with its own k nodes
nearest to the resource key to be find. The initiator peer can have as a response at
most a k nodes; among these it selects for the next step only the k nearest to the
researched resource and inserts in the temporary list the peers not already present.
For the next step it iteratively selects in this temporary list, always in accordance to
a certain policy, a to query, and so on. This iterative process ends up when at step
n, among the at most a k responses, the initiator is unable to find at least one peer
nearer to the resource than any other already present in the temporary list updated at
step n-1. As a side effect of a lookup process, both for nodes or for keys, k-buckets of
the contacted nodes are populated and updated.
Differently from Chord, Kademlia uses only iterative lookup procedures peform-
ing them in parallel mode witha as parallelism degree.