The biggest design problem that I see here is the closure requirement: we need to return all the elements within the distance N of this vector for an arbitrary N. The amount of data is sparse: "billions" are of the order of 2 ^ 33, but we have 512 bits of information, so there is only 1 record for 2 ^ (512-33) possibilities. For randomly distributed keys, the expected distance between any two nodes is 256; the expected nearest neighbor distance is about 180.
This makes me expect that your search will depend on non-random data clusters and that recognition of this clustering will contribute to your search. This will be a somewhat painful step in pre-processing the source data, but it should be useful.
My general approach to this is to first identify these clusters in some usually fast way. Start with a hash function that returns a very general distance metric. For example, for any vector, calculate the distances to each of the many orthogonal support vectors. For 16 bits, you can take the following set (specified in hexadecimal format): 0000, 00FF, 0F0F, 3333, 5555, a serial βseed of alternating bits.β Return this hash as a simple set of 4-bit distances, only 20 bits (there is a real saving for long vectors, since one of the sizes is 2 ^ (2 ^ N)).
Now this hash tuple allows you to get a rough estimate of the distance for hamming, so you can group vectors more easily: similar vectors should have similar hash values.
Inside each cluster, find the central element, and then characterize each cluster element by its distance from this center. For speed, give each node a list of nearest neighbors with distances, all of which are inside the cluster. This gives you a graph for each cluster.
Similarly, connect all the centers of the clusters, giving straight edges to the nearest centers of the cluster. If your data is reasonably searchable, then we can guarantee that for any two nodes A, B with cluster centers Ac and Bc we will have d (A, Ac) + d (B, Bc) <d (A, B). Each cluster is a topological neighborhood.
The request process is slightly faster. For the target vector V, find the hash value. Find the centers of the clusters that are close enough to the hat value, which may correspond to something in their vicinity ([actual distance] - [query range] - [cluster radius]). This will immediately remove entire clusters and may give you a whole group of "hits". For each marginal cluster (some, but not all nodes qualify) you need to find a node that works with something close to brute force (beginning in the middle of the range of viable distances from the center of the cluster) and then search the width of each node neighbor.
I expect this to give you something comparable to optimal performance. It also decently decides on additions and deletions if they are not frequent enough to change cluster membership for other nodes.
The set of vectors is simple. Write out the bit patterns for the 16-bit case:
0000 0000 0000 0000 16 0s 0000 0000 1111 1111 8 0s, 8 1s 0000 1111 0000 1111 4 0s, 4 1s, repeat 0011 0011 0011 0011 2 0s, 2 1s, repeat 0101 0101 0101 0101 1 0s, 1 1s, repeat