Kademlia Highly Unbalanced Routing Table

The Kademlia paper, in the last paragraph of section 2.4, says that to properly handle highly unbalanced trees ...

Kademlia nodes keep all valid contacts in a subtree of size at least k nodes, even if this requires splitting buckets in which the node's own ID does not reside.

However, an earlier section of the paper seems to say that if a k-bucket already holds k entries, then for any further addition to that k-bucket you must either evict the least recently seen node (after first pinging it to check whether it is alive) or else cache the new contact until a slot becomes available in that k-bucket.

The paper seems to contradict itself on these two points.

Under what conditions is it necessary to split a k-bucket, and why? It seems impractical to keep "all valid contacts" in the routing table, since the routing table would then grow very large very fast. The example in the paper is a tree with many nodes whose IDs start with 001 and one node whose ID starts with 000. Would the node starting with 000 have to keep splitting its k-bucket for 001 in order to hold every valid node starting with 001? In a 160-bit address space, could it end up storing 2^157 nodes in the 000 node's routing table?

The wording in the quoted block is also very confusing ...

"in a subtree" - in which subtree of the routing table?

"at least k nodes" - what metric do we use to determine the size of the subtree? Does "nodes" here refer to Kademlia nodes, to k-buckets, or to something else?

1 answer

However, an earlier section of the paper seems to say that if a k-bucket already holds k entries, then for any further addition to that k-bucket you must either evict the least recently seen node (after first pinging it to check whether it is alive) or else cache the new contact until a slot becomes available in that k-bucket.

That is how a bucket is maintained when there is a contact to insert but the bucket is not eligible for splitting.
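The full-bucket handling described in the quote can be sketched roughly as follows. This is only an illustration under stated assumptions, not any particular implementation: `handle_full_bucket`, the `is_alive` callback (standing in for a ping) and the `replacements` list are hypothetical names, and the bucket is assumed to be ordered from least to most recently seen.

```python
from collections import deque


def handle_full_bucket(bucket, replacements, new_contact, is_alive):
    """Try to add new_contact to a full bucket that may not be split.

    bucket       -- deque ordered from least recently seen (left)
                    to most recently seen (right)
    replacements -- list used as a replacement cache (assumed structure)
    is_alive     -- callable standing in for a ping RPC (hypothetical)

    Returns True if new_contact ended up in the bucket.
    """
    oldest = bucket[0]
    if is_alive(oldest):
        # The old contact answered: it becomes the most recently seen
        # entry, and the new contact waits in the replacement cache.
        bucket.rotate(-1)
        replacements.append(new_contact)
        return False
    # The old contact is dead: evict it and append the new one.
    bucket.popleft()
    bucket.append(new_contact)
    return True
```

A caller would later promote entries from `replacements` whenever a slot in the bucket frees up.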

Under what conditions is it necessary to split a k-bucket and why?

As a first approximation: split a bucket whenever a new node cannot be inserted and the ID range covered by the bucket contains your own node ID.

This is necessary to maintain full awareness of your own neighborhood while having only a vague awareness of remote regions of the keyspace. That is, for locality.
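A minimal sketch of this strict rule, assuming each bucket covers a half-open range of integer IDs. The `Bucket` class, the constant `K = 8` and `should_split_strict` are illustrative names and values, not taken from the paper or from any library.

```python
from dataclasses import dataclass, field

K = 8  # bucket size; an assumed value chosen for illustration


@dataclass
class Bucket:
    lo: int  # inclusive lower bound of the covered ID range
    hi: int  # exclusive upper bound of the covered ID range
    contacts: list = field(default_factory=list)

    def covers(self, node_id: int) -> bool:
        return self.lo <= node_id < self.hi

    def is_full(self) -> bool:
        return len(self.contacts) >= K


def should_split_strict(bucket: Bucket, own_id: int) -> bool:
    """Strict rule: split only a full bucket whose range holds our own ID."""
    return bucket.is_full() and bucket.covers(own_id)
```

With this rule alone, only the bucket on the path to your own ID ever splits, so the table keeps fine detail near you and coarse detail elsewhere.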

To cover the case of an unbalanced tree - which can happen if node IDs are not (pseudo-)random or, at least in leaf buckets, because of statistical flukes when they are randomly assigned - the approach has to be relaxed as follows:

When

  • you attempt to insert a new contact C into the routing table
  • the bucket that C belongs to is full
  • C is closer to your node ID than the K-th closest node in your routing table, where K is the bucket size

then split the bucket.
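The three conditions above can be sketched as a single check. This assumes integer node IDs compared by XOR distance, and that `all_contacts` holds every contact in the routing table, including those in the full bucket; `should_split_relaxed` and its parameters are hypothetical names used only for illustration.

```python
def should_split_relaxed(own_id, candidate, bucket_contacts, all_contacts, k):
    """Relaxed rule: split a full bucket if the candidate is closer to us
    than the k-th closest contact currently in the whole routing table.

    all_contacts is assumed to include bucket_contacts, so it holds at
    least k entries whenever the bucket is full.
    """
    if len(bucket_contacts) < k:
        return False  # bucket not full: a plain insert is enough
    # XOR distance from our own ID to every known contact, ascending.
    distances = sorted(contact ^ own_id for contact in all_contacts)
    kth_closest = distances[k - 1]
    return (candidate ^ own_id) < kth_closest
```

For example, with `own_id = 0`, `k = 2`, a full bucket `[6, 7]` and one distant contact `16`, a candidate with ID `4` is closer than the second-closest known contact (`7`), so the bucket would be split even though it does not cover ID `0`.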

In practice this has to be modified a little further: relaxed splitting should be used only for contacts learned from responses, while contacts learned from unsolicited requests should only trigger strict splitting. Otherwise you could end up with a weirdly distorted routing table if relaxed splitting happens early during startup, when the table is not yet populated.
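One way to express that policy is a small decision function. This is a sketch under assumptions: the three boolean inputs are presumed to be computed elsewhere, and `may_split` and `from_response` are hypothetical names.

```python
def may_split(bucket_full, covers_own_id, closer_than_kth, from_response):
    """Decide whether a full bucket may be split for a new contact.

    covers_own_id   -- strict condition: the bucket's range holds our ID
    closer_than_kth -- relaxed condition: the contact is closer to us
                       than the K-th closest contact in the table
    from_response   -- True if the contact came from a reply to one of
                       our own queries, False for unsolicited requests
    """
    if not bucket_full:
        return False  # nothing to split, just insert
    if covers_own_id:
        return True  # strict splitting applies to every source
    # Relaxed splitting is reserved for contacts seen in responses.
    return from_response and closer_than_kth
```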
