Internet-Draft The R5N Distributed Hash Table August 2022
Schanzenbach, et al. Expires 20 February 2023 [Page]
Workgroup:
Independent Stream
Internet-Draft:
draft-schanzen-r5n-00
Published:
Intended Status:
Informational
Expires:
Authors:
M. Schanzenbach
GNUnet e.V.
C. Grothoff
Berner Fachhochschule
B. Fix
GNUnet e.V.

The R5N Distributed Hash Table

Abstract

This document contains the R5N DHT technical specification.

This document defines the normative wire format of resource records, resolution processes, cryptographic routines and security considerations for use by implementers.

This specification was developed outside the IETF and does not have IETF consensus. It is published here to guide implementation of R5N and to ensure interoperability among implementations.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 20 February 2023.

Table of Contents

1. Introduction

Distributed Hash Tables (DHTs) are a key data structure for the construction of decentralized applications. DHTs are important because they generally provide a robust and efficient means to distribute the storage and retrieval of key-value pairs.

While [RFC6940] already provides a peer-to-peer (P2P) signaling protocol with extensible routing and topology mechanisms, it also relies on strict admission control through the use of either centralized enrollment servers or pre-shared keys. Some decentralized applications require a more open system that enables ad-hoc participation and other means to prevent common attacks on P2P overlays.

This document contains the technical specification of the R5N DHT [R5N], a secure DHT routing algorithm and data structure for decentralized applications. R5N is an open P2P overlay routing mechanism which supports ad-hoc permissionless participation and support for topologies in restricted-route environments. R5N also includes advanced features like tracing paths messages take through the network, response filters and on-path application-specific data validation.

This document defines the normative wire format of peer-to-peer messages, routing algorithms, cryptographic routines and security considerations for use by implementors.

1.1. Requirements Notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1.2. Structure of This Document

Section 2 gives an overview of the terminology used in this document. Section 3 then describes the overall architecture and the scope of this specification. Section 4 describes the application API, which clarifies the semantics provided by R5N for applications and thus is crucial as it motivates the rest of the design. Section 5 describes the underlay interface. This is the abstraction that applications must provide to R5N and thus a prerequisite for using the DHT. Before a DHT is usable, it must be bootstrapped. Bootstrapping is described in Section 6. Bloom filters, a key data structure used in various places, are introduced in Section 7 The central operation of a DHT is routing, which is detailed in Section 8. The processing of the various network messages is described in Section 9. Handling of Blocks, including validation and storage are described in Section 10. General security considerations are detailed in Section 11. IANA and GANA registry updates are listed in Section 12 and Section 13. The document concludes with test vectors in Section 14 and appendices with references.

2. Terminology

Peer:
A host that is participating in the overlay. Peers are responsible for holding some portion of the data that has been stored in the overlay, and they are responsible for routing messages on behalf of other hosts as needed by the Routing Algorithm.
Peer ID:
The Peer ID is the public key which is used to authenticate a peer in the underlay. The Peer ID is the public key of the corresponding Ed25519[ed25519] peer private key.
Peer Address:
The Peer Address is the identifier used on the Overlay to address a peer. It is a SHA-512 hash of the Peer ID.
Key
512-bit identifier of a location in the DHT. Multiple Blocks can be stored under the same key. Peer Addresses are valid keys.
Neighbor:
A neighbor is a peer which is directly able to communicate with our peer via the Underlay Interface.
Block:
Variable-size unit of payload stored in the DHT under a Key. Commonly also called a "value" when talking about a DHT as a "key-value store".
Block-Type:
A unique 32-bit value identifying the data format of a Block. Block-Types are either private or allocated by GANA (see Section 13).
Block Storage
The Block Storage component is used to persist and manage Block data by peers. It includes logic for enforcing storage quotas, caching strategies and data validation.
Responsible Peer:
The peer N that is responsible for a specific key K, as defined by the SelectClosestPeer(K, P) algorithm (see Section 8.
Applications
Applications are components which directly use the DHT overlay interfaces. Possible applications include the GNU Name System [I-D.draft-schanzen-gns] and the CADET transport system [cadet].
Application API
The application API exposes the core operations of the DHT overlay to applications. This includes storing blocks in the DHT and retrieving blocks from the DHT.
Message Processing
The Message Processing component processes requests from and generates responses to applications and the underlay network.
Routing
The Routing component includes the routing table as well as routing and peer selection logic. It facilitates the R5N routing algorithm with required data structures and algorithms.
Underlay Interface
The Underlay Interface is an abstraction layer on top of the supported links of a peer. Peers may be linked by a variety of different transports, including "classical" protocols such as TCP, UDP and TLS or advanced protocols such as GNUnet, I2P or Tor.

3. Architecture

R5N is an overlay network with a pluggable transport layer. The following figure shows the R5N architecture.

             |  +-----------------+  +-------+
Applications |  | GNU Name System |  | CADET |  ...
             |  +-----------------+  +-------+
-------------+------------------------------------ Application API
             |  ^
             |  |   +---------------+
             |  |   | Block Storage |
             |  |   +---------------+
             |  |    ^
R5N          |  v    v
             | +--------------------+    +---------+
             | | Message Processing |<-->| Routing |
             | +--------------------+    +---------+
             |  ^                          ^
             |  v                          v
-------------+------------------------------------ Underlay Interface
             | +--------+  +--------+
             | |GNUnet  |  |IP      |  ...
Connectivity | |Underlay|  |Underlay|
             | |Link    |  |Link    |
             | +--------+  +--------+

Figure 1: The R5N Architecture.

Specifics about the protocols of the underlays providing connectivity or the applications using the DHT are out of the scope of this document. However, we note that peers implementing disjoint sets of underlay protocols may experience difficulties communicating (unless other peers bridge the respective underlays). Similarly, peers that do not support a particular application will not be able to validate application-specific payloads and may thus be tricked into storing or forwarding corrupt blocks.

4. Application API

An implementation of this specification commonly exposes the two API procedures "GET" and "PUT". The following are non-normative examples of such APIs and their behaviour are detailed in order to give implementers a fuller picture of the protocol.

4.1. The GET procedure

A basic GET procedure may be exposed as:

GET(Query-Key, Block-Type) -> Results as List

The procedure typically takes at least two arguments to initiate a lookup:

QueryKey:
is the 512-bit key to look for in the DHT.
Block-Type:
is the type of block to look for, possibly "any".

The GET procedure may allow a set of optional parameters in order to control or modify the query:

Replication-Level:
is an integer which controls how many nearest peers the request should reach.
Flags:
is a 16-bit vector which indicates certain processing requirements for messages. Any combination of flags as defined in Section 9.1.1 may be specified.
eXtended-Query (XQuery):
is medatadata which may be required depending on the respective Block-Type. A Block-Type must define if the XQuery can or must be used and what the specific format of its contents should be. Extended queries are in general used to implement domain-specific filters. These might be particularly useful in combination with FindApproximate to add a well-defined filter by an application-specific distance. Regardless, the DHT does not define any particular semantics for an XQuery. See also Section 10.
Result-Filter:
is data for a Block-type-specific filter which allows applications to indicate results which are not relevant anymore to the caller (see Section 9.4.2).

The GET procedure should be implemented as an asynchronous operation that returns individual results as they are found in the DHT. It should terminate only once the application explicitly cancels the operation. A single result commonly consists of:

Block-Type:
is the desired type of block in the result.
Block-Data:
is the application-specific block payload. Contents are specific to the Block-Type.
Block-Expiration:
is the expiration time of the block. After this time, the result should no longer be used.
Key:
is the key under which the block was stored. This may be different from the key that was queried if the flag FindApproximate was set.
GET-Path:
is a signed path of the IDs of peers which the query traversed through the network. The DHT will try to make the path available if the RecordRoute flag was set by the application calling the PUT procedure. The reported path may have been silently truncated from the beginning.
PUT-Path:
is a signed path of the IDs of peers which the result message traversed. The DHT will try to make the path available if the RecordRoute flag was set for the GET procedure. The reported path may have been silently truncated from the beginning. As the block was cached by the node at the end of this path, this path is more likely to be stale compared to the GET-Path.

4.2. The PUT procedure

A PUT procedure may be exposed as:

PUT(Key, Block-Type, Block-Expiration, Block-Data)

The procedure typically takes at least four parameters:

Key:
is the key under which to store the block.
Block-Type:
is the type of the block to store.
Block-Expiration:
specifies when the block should expire.
Block-Data:
is the application-specific payload of the block to store.

The PUT procedure may allow a set of optional parameters in order to control or modify the query:

Replication-Level:
is an integer which controls how many nearest peers the request should reach.
Flags:
is a bit-vector which indicates certain processing requirements for messages. Any combination of flags as defined in Section 9.1.1 may be specified.

The PUT procedure does not necessarily yield any information.

5. Underlay

In the network underlay, a peer is addressable by traditional means out of scope of this document. For example, the peer may have a TCP/IP address, or a HTTPS endpoint. While the specific addressing options and mechanisms are out of scope for this document, it is necessary to define a universal addressing format in order to facilitate the distribution of connectivity information to other peers in the DHT overlay. This format is the "HELLO" Block (described in Section 10.2), which contains URIs. The scheme of each URI indicates which underlay understands the respective address given in the rest of the URI.

It is expected that the underlay provides basic mechanisms to manage peer connectivity and addressing. The required functionalities can be represented by the following API:

TRY_CONNECT(N, A)
A function which allows the local peer to attempt the establishment of a connection to another peer N using an address A. When the connection attempt is successful, information on the new peer is offered through the PEER_CONNECTED signal.
HOLD(P)
A function which tells the underlay to keep a hold on to a connection to a peer P. Underlays are usually limited in the number of active connections. With this function the DHT can indicate to the underlay which connections should preferably be preserved.
DROP(P)
A function which tells the underlay to drop the connection to a peer P. This function is only there for symmetry and used during the peer's shutdown to release all of the remaining HOLDs. As R5N always prefers the longest-lived connections, it would never drop an active connection that it has called HOLD() on before. Nevertheless, underlay implementations should not rely on this always being true. A call to DROP() also does not imply that the underlay must close the connection: it merely removes the preference to preserve the connection that was established by HOLD().
SEND(P, M)
A function that allows the local peer to send a protocol message M to a peer P.
L2NSE = ESTIMATE_NETWORK_SIZE()
A procedure that provides an estimate of the network size. The result, L2NSE, must be the base-2 logarithm of the estimated number of peers in the network. It is used by the routing algorithm. If the underlay does not support a protocol for network size estimation (such as cite paper NSE) the value MAY be provided as a configuration parameter to the implementation.

The above procedures are meant to be actively executed by the implementation as part of the peer-to-peer protocol. In addition, the underlay is expected to emit the following signals (usually implemented as callbacks) based on network events observed by the underlay implementation:

PEER_CONNECTED -> P
is a signal that allows the DHT to react to a newly connected peer P. Such an event triggers, for example, updates in the routing table and gossiping of HELLOs to that peer.
PEER_DISCONNECTED -> P
is a signal that allows the DHT to react to a recently disconnected peer. Such an event triggers, for example, updates in the routing table.
ADDRESS_ADDED -> A
The underlay signals indicates that an address A was added for our local peer and that henceforth the peer may be reachable under this address. This information is used to advertise connectivity information about the local peer to other peers. A must be a URI suitable for inclusion in a HELLO payload Section 10.2.
ADDRESS_DELETED -> A
This underlay signals indicates that an address A was removed from the set of addresses the local peer is possibly reachable under. Addresses must have been added before they may be deleted. This information is used to no longer advertise this address to other peers.
RECEIVE -> (P, M)
This signal informs the local peer that a protocol message M was received from a peer P.

These signals then drive updates of the routing table, local storage and message transmission.

6. Bootstrapping

Initially, the implementation depends upon either the Underlay providing at least one initial connection to a peer (signalled through PEER_CONNECTED), or the application/end-user providing at least one working HELLO to the DHT for bootstrapping. While details on how the first connection is established MAY depend on the specific implementation, this SHOULD usually be done by an out-of-band exchange of the information from a HELLO block. For this, section Section 6.1 specifies a URL format for encoding HELLO blocks as text strings which SHOULD be supported by implementations.

Regardless of how the initial connections are established, the peers resulting from these initial connections are subsequently stored in the routing table component Section 8.1.

Furthermore, the Underlay SHOULD provide the implementation with one or more addresses signalled through ADDRESS_ADDED. Zero addresses MAY be provided if a peer can only establish outgoing connections and is otherwise unreachable. The implementation periodically advertises all active addresses in a HELLO block Section 10.2.

In order to fill its routing table by finding close peers in the network, an implementation MUST now periodically send peer discovery messages Section 8.2.

The frequency of advertisement and peer discovery messages MAY be adapted according to network conditions, the set of already connected neighbors, the workload of the system and other factors which are at the discretion of the developer.

Any implementation encountering a HELLO GET request MUST respond with its own HELLO block except if that block is filtered by the request's result filter (see Section 9.4.2). Implementations MAY respond with additional valid HELLO blocks of other peers with keys closest to the key of the GET request. A HELLO block is "valid" if it is non-expired and is not excluded by the result filter. "closest" is defined by considering the distances between the request's key and the peer addresses of all of the valid HELLO blocks known at the local node.

6.1. HELLO URLs

The general format of a HELLO URL uses "gnunet://" as the scheme, followed by "hello/" for the name of the GNUnet subsystem, followed by "/"-separated values with the GNS Base32 encoding (FIXME: described here or reference GNS draft?) of the Peer ID, a Base32-encoded EdDSA signature, and an expiration time in seconds since the UNIX Epoch in decimal format. After this a "?" begins a list of key-value pairs where the key is the URI scheme of one of the peer's addresses and the value is the URL-escaped payload of the address URI without the "://".

For example, consider the following URL:

gnunet://hello/RH1M20EPK834M6MHZ72\
G3CMBSF3ECKNY4W0T9VAQP9Z7SZEM6Y3G/\
NGRTAH6RA04X467CGCH7M7CEXR5F9CV5HT\
ZFK0G9BWETY3CCE2QWGVT4WA7JN5M9HMWG\
60A00R71F1PJP8N5628EKGHHBAGA7M8JW3\
0/1647134480?udp=127.0.0.1%3A2086

FIXME: signature is invalid, should
maybe generate proper test vector.

Figure 2

It specifies that the peer with the ID "RH1M...6Y3G" is reachable via "udp" at 127.0.0.1 on port 2086 until 1647134480 seconds after the Epoch. Note that "udp" here is underspecified and just used as a simple example. In practice, the key (addr-name) MUST refer to a scheme supported by a DHT Underlay.

The general syntax of HELLO URLs specified using Augmented Backus-Naur Form (ABNF) of [RFC5234] is:

hello-URL = "gnunet://hello/" meta [ "?" addrs ]
meta = pid "/" sig "/" exp
pid = *bchar
sig = *bchar
exp = *DIGIT
addrs = addr *( "&" addr )
addr = addr-name "=" addr-value
addr-name = scheme
addr-value = *pchar
bchar = *(ALPHA / DIGIT)

Figure 3

'scheme' is defined in [RFC3986] in Section 3.1. 'pchar' is defined in [RFC3986], Appendix A.

7. Bloom Filters

R5N uses Bloom filters in several places. This section gives some general background on Bloom filters and defines functions on this data structure shared by the various use-cases in R5N.

A Bloom filter (BF) is a space-efficient probabilistic datastructure to test if an element is part of a set of elements. Elements are identified by an element ID. Since a BF is a probabilistic datastructure, it is possible to have false-positives: when asked if an element is in the set, the answer from a BF is either "no" or "maybe".

Bloom filters are defined as a string of L bits always initially empty, consisting only of zeroes. There are two functions which can be invoked on the Bloom filter: BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element which is to be added to the Bloom filter or queried against the set. The mapping function M is defined as follows:

The element e is hashed using SHA-512. The resulting byte string is interpreted as a string of 16 32-bit integers in network byte order.

When adding an element to the Bloom filter bf using BF-SET(bf,e), each integer n of the mapping M(e) is interpreted as a bit offset n mod L within bf and set to 1.

When testing if an element may be in the Bloom filter bf using BF-TEST(bf,e), each bit offset n mod L within bf MUST have been set to 1. Otherwise, the element is not considered to be in the Bloom filter.

8. Routing

In order to select peers which are suitable destinations for routing messages, R5N uses a hybrid approach: Given an estimated network size N, the peer selection for the first N hops is random. After the initial N hops, peer selection follows an XOR-based peer distance calculation.

To enable routing, any R5N implementation must keep information about its current set of neighbors. Upon receiving a connection notification from the Underlay through PEER_CONNECTED, information on the new neighbor MUST be added, and similarly when a disconnect is indicated by the Underlay through PEER_DISCONNECTED the peer MUST be removed.

In order to achieve O(log n) routing performance, the data structure for managing neighbors and their metadata MUST be implemented using the k-buckets concept of [Kademlia] as defined in Section 8.1. Maintenance of the routing table (after bootstrapping) is described in Section 8.2.

Unlike [Kademlia], routing decisions in R5N are also influenced by a Bloom filter in the message that prevents routing loops. This data structure is discussed in Section 8.3. Section 8.4 describes the key functions provided on top of these data structures.

8.1. Routing Table

The routing table consists of an array of k-buckets. Each k-bucket contains a list of neighbors. The i-th k-bucket stores neighbors whose peer IDs are between distance 2^i and 2^(i+1) from the local peer. System constraints will typically force an implementation to impose some upper limit on the number of neighbors kept per k-bucket.

Implementations SHOULD try to keep at least 5 entries per k-bucket. Embedded systems that cannot manage this number of connections MAY use connection-level signalling to indicate that they are merely a client utilizing a DHT and not able to participate in routing. DHT peers receiving such connections MUST NOT include connections to such restricted systems in their k-buckets, thereby effectively excluding them when making routing decisions.

If a system hits constraints with respect to the number of active connections, an implementation MUST evict peers from those k-buckets with the largest number of neighbors. The eviction strategy MUST be to drop the shortest-lived connections first.

8.2. Peer Discovery

To build its routing table, a peer will send out requests asking for blocks of type HELLO using its own location as the key, but filtering all of its neighbors via the Bloom filter described in Section 9.4.2. These requests MUST use the FindApproximate and DemultiplexEverywhere flags. FindApproximate will ensure that other peers will reply with keys they merely consider close-enough, while DemultiplexEverywhere will cause each peer on the path to respond, which is likely to yield HELLOs of peers that are useful somewhere in the routing table.

To facilitate peer discovery, each peer MUST broadcast its own HELLO message to all peers in the routing table periodically. The specific frequency MAY depend on available bandwidth but SHOULD be a fraction of the expiration period. Whenever a peer receives such a HELLO message from another peer, it must cache it as long as that peer is in its routing table (or until the HELLO expires) and serve it in response to Peer Discovery requests. Details about the format of the HELLO message are given in Section 9.2.1.

8.3. Peer Bloom Filter

As DHT GetMessages and PutMessages traverse a random path through the network for the first N hops, it is essential that routing loops are avoided. In R5N, a Bloom filter is used as part of the routing metadata in messages. The Bloom filter is updates at each hop with the hops peer identity. For the next hop selection in both the random and the deterministic case, any peer which is in the Bloom filter for the respective message is not included in the peer selection process.

The peer Bloom filter is used to prevent circular routes. Any peer which is forwarding GetMessages or PutMessages (Section 9) adds its own peer ID to the peer Bloom filter. This allows other peers to (probabilistically) exclude already traversed peers when searching for the next hops in the routing table.

The peer Bloom filter is always a 128 byte field. The elements hashed into the Bloom filter are the 32 byte peer IDs. We note that the peer Bloom filter may exclude peers due to false-postive matches. This is acceptable as routing should nevertheless terminate (with high probability) in close vicinity of the key.

8.4. Routing Functions

Using the data structures described so far, the R5N routing component provides the following functions for message processing (Section 9):

GetDistance(A, B) -> Distance as Integer
This function calculates the binary XOR between A and B. The resulting distance is interpreted as an integer where the leftmost bit is the most significant bit.
SelectClosestpeer(K, B) -> N
This function selects the neighbor N from our routing table with the shortest XOR-distance to the key K. This means that for all other peers N' in the routing table GetDistance(N, K) < GetDistance(N',K). Peers with a positive test against the peer Bloom filter B are not considered.
SelectRandompeer(B) -> N
This function selects a random peer N from all neighbors. Peers with a positive test in the peer Bloom filter B are not considered.
Selectpeer(K, H, B) -> N
This function selects a neighbor N depending on the number of hops H parameter. If H < NETWORK_SIZE_ESTIMATE this function MUST return SelectRandompeer(B) and SelectClosestpeer(K, B) otherwise.
IsClosestpeer(N, K, B) -> true | false
This function checks if N is the closest peer for K (cf. SelectClosestpeer(K)). Peers with a positive test in the Bloom filter B are not considered.
ComputeOutDegree(REPL_LVL, HOPCOUNT, L2NSE) -> Number

This function computes the number of neighbors that a message should be forwarded to. The arguments are the desired replication level (REPL_LVL), the HOPCOUNT of the message so far, and the base-2 logarithm of the current network size estimate (L2NSE) as provided by the underlay. The result is the non-negative number of next hops to select. The following figure gives the pseudocode for computing the number of neighbors the peer should attempt to forward the message to.

function ComputeOutDegree(REPL_LVL, HOPCOUNT, L2NSE)
BEGIN
  if (HOPCOUNT > L2NSE * 4)
    return 0;
  if (HOPCOUNT > L2NSE * 2)
    return 1;
  if (0 = REPL_LEVL)
    REPL_LEVL = 1
  if (REPL_LEVEL > 16)
    REPL_LEVEL = 16
  RM1 = REPL_LEVEL - 1
  return 1 + (RM1 / (L2NSE + RM1 * HOPCOUNT))
Figure 4: Computing the number of next hops.

The above calculation may yield values that are not discrete. Hence, the result MUST be rounded probabilistically to the nearest discrete value, using the fraction as the probability for rounding up.

8.5. Pending Table

R5N performs stateful routing where the messages only carry the query hash and do not encode the ultimate source or destination of the request. Routing a request towards the key is doing hop-by-hop using the routing table and the query hash. The pending table is used to route responses back to the originator. In the pending table each peer primarily associates a query hash with the associated originator of the request. The pending table MUST store entries for the last MAX_RECENT requests the peer has encountered. To ensure that the peer does not run out of memory, information about older requests is discarded. The value of MAX_RECENT MAY be configurable and SHOULD be at least 128k.

For each entry in the pending table, the DHT MUST track not only the query key and the origin, but also the extended query, requested block type and flags, and the result filter. If the query did not provide a result filter, a fresh result filter MUST still be created to filter duplicate replies. Details of how a result filter works depend on the type, as described in Section 10.1.

When a second query from the same origin for the same query hash is received, the DHT MUST attempt to merge the new request with the state for the old request. If this is not possible, the existing result filter MUST be discarded and replaced with the result filter of the incoming message.

We note that for local applications, a fixed limit on the number of concurrent requests may be problematic. Hence, it is RECOMMENDED that implementations track requests from local applications separately and preserve the information until the application explicitly stops the request.

9. Message Processing

The implementation MUST listen for RECEIVE(P, M) signals from the Underlay and respond to the respective messages sent by the peer P. In the following, the wire formats of the messages and the required processing are detailed. Where required, the local peer's ID is referred to as SELF.

9.1. Message components

This section describes some data structures and fields shared by various message types.

9.1.1. Flags

Flags is a 16-bit vector representing binary options. Each flag is represented by a bit in the field starting from 0 as the rightmost bit to 15 as the leftmost bit.

0: DemultiplexEverywhere
This bit indicates that each peer along the way should process the request. If the bit is not set, intermediate peers only route the message and only peers which consider themselves closest to the key look for answers in their local storage for GetMessages and cache the block in their local storage for PutMessages and ResultMessages.
1: RecordRoute
This bit indicates to keep track of the path that the message takes in the P2P network.
2: FindApproximate
This bit allows results where the key does not match exactly.
3: Truncated
This is a special flag which is set if a peer truncated the path and thus the first hop on the path is given without a signature to enable checking of the next signature. MUST never be set in a query.
4-15: Reserved
The remaining bits are reserved for future use and MUST be set to 0 when initiating an operation. If non-zero bits are received, implementations MUST preserve these bits when forwarding messages.

9.1.2. Path Element

A Path Element represents a hop in the path a message has taken through the network. The wire format of a Path Element is illustrated in Figure 5.

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  SIGNATURE                    |
|                  (64 byte)                    |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  PEER ID                      |
|                  (32 byte)                    |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 5: The Wire Format of a Path Element.

where:

SIGNATURE
is a 64 byte EdDSA signature using the current hop's private key affirming the previous and next hops.
PEER ID
is the EdDSA public key of the peer on the path.

An ordered list of Path Elements may be appended to any routed PutMessages or ResultMessages. The signature of a Path Element is created by the current hop after it made its routing decision identifiying the successor peer. The wire format of an example path from Peers A over B and C as it would be received by D in a PutMessage or ResultMessage is illustrated in Figure 6. A ResultMessage would indicate in the PATH_LEN field a length of 3. A PutMessage would indicate a length of 3 as the sum of PUTPATH_L and GETPATH_L fields.

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  SIGNATURE A                  |
|                  (64 byte)                    |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  PEER A                       |
|                  (32 byte)                    |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  SIGNATURE B                  |
|                  (64 byte)                    |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  PEER B                       |
|                  (32 byte)                    |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  SIGNATURE C                  |
|                  (64 byte)                    |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  PEER C                       |
|                  (32 byte)                    |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  SIGNATURE D                  |
|                  (64 byte)                    |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 6: Example of a path from Peer A to Peer D.

A path may be truncated in which case the signature of the truncated Path Element is omitted leaving only the Peer ID required for the verification of the subsequent Path Element signature. Such a truncated path is indicated with the respective flag (Section 9.1.1). The Peer ID of the last Path Element is omitted as it must be that of the sender of the PutMesssage or ResultMessage. The wire format of a truncated example path from Peers B over C to D is illustrated in Figure 7. The wire format of an example path from Peers B over C as it would be received by D in a PutMessage or ResultMessage is illustrated in Figure 7. A ResultMessage would indicate in the PATH_LEN field a length of 1. A PutMessage would indicate a length of 1 as the sum of PUTPATH_L and GETPATH_L fields.

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  PEER B                       |
|                  (32 byte)                    |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  SIGNATURE C                  |
|                  (64 byte)                    |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  PEER C                       |
|                  (32 byte)                    |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  SIGNATURE D                  |
|                  (64 byte)                    |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 7: Example of a truncated path from Peer B to Peer D.

The SIGNATURE field in a Path Element covers a 64-bit contextualization header, the the block expiration, a hash of the block payload, as well as the predecessor peer ID and the peer ID of the successor that the peer making the signature is routing the message to. Thus, the signature made by SELF basically says that SELF received the block payload from PEER PREDECESSOR and has forwarded it to PEER SUCCESSOR. The wire format is illustrated in Figure 8.

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|         SIZE          |       PURPOSE         |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                   EXPIRATION                  |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  BLOCK HASH                   |
|                  (64 byte)                    |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  PEER PREDECESSOR             |
|                  (32 byte)                    |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  PEER SUCCESSOR               |
|                  (32 byte)                    |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 8: The Wire Format of the Path Element for Signing.
SIZE
A 32-bit value containing the length of the signed data in bytes in network byte order. The length of the signed data MUST be 144 bytes.
PURPOSE
A 32-bit signature purpose flag. This field MUST be 6 (in network byte order).
EXPIRATION
denotes the absolute 64-bit expiration date of the block. In microseconds since midnight (0 hour), January 1, 1970 UTC in network byte order.
BLOCK HASH
a SHA-512 hash over the block payload.
PEER PREDECESSOR
the Peer ID of the previous hop. If the signing peer initiated the PUT, this field is set to all zeroes.
PEER SUCCESSOR
the Peer ID of the next hop (not of the signer).

9.2. HelloMessage

HelloMessages are used to inform neighbors of a peer about the sender's available addresses. The recipients use these messages to inform their respective Underlays about ways to sustain the connections and to generate HELLO blocks (see Section 10.2) to answer peer discovery queries from other peers. In contrast to a HELLO block, a HelloMessage does not contain the ID of the peer the address information is about: in a HelloMessage, the address information is always about the sender. Since the Underlay is responsible to determine the peer ID of the sender for all messages, it would thus be redundant to also include the peer ID in the HelloMessage.

9.2.1. Wire Format

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|  MSIZE    |   MTYPE   | RESERVED  | URL_CTR   |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                    SIGNATURE                  /
/                   (64 byte)                   |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                    EXPIRATION                 |
+-----+-----+-----+-----+-----+-----+-----+-----+
/ ADDRESSES (variable length)                   /
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 9: The HelloMessage Wire Format.

where:

MSIZE
denotes the size of this message in network byte order.
MTYPE
is the 16-bit message type. It must be set to the value 157 in network byte order.
RESERVED
is a 16-bit field that must be zero.
URL_CTR
is a 16-bit number that gives the total number of addresses encoded in the ADDRESSES field. In network byte order.
SIGNATURE
is a 64 byte EdDSA signature using the sender's private key affirming the information contained in the message. The signature is signing exactly the same data that is being signed in a HELLO block as described in Section 10.2.
EXPIRATION
denotes the absolute 64-bit expiration date of the content. The value specified is microseconds since midnight (0 hour), January 1, 1970, but must be a multiple of one million (so that it can be represented in seconds in a HELLO URL). Stored in network byte order.
ADDRESSES
A sequence of exactly URL_CTR 0-terminated URIs in UTF-8 encoding representing addresses of this peer. Each URI must begin with a non-empty URI scheme terminated by "://" and followed by some non-empty Underlay- and scheme-specific address encoding.

9.2.2. Processing

Upon receiving a HelloMessage from a peer P an implementation MUST process it step by step as follows:

  1. If P is not in its routing table, the message is discarded.
  2. The signature is verified, including a check that the expiration time is in the future. If the signature is invalid, the message is discarded.
  3. The HELLO information is cached in the routing table until it expires, the peer is removed from the routing table, or the information is replaced by another message from the peer.

The address information about P should then also be made available to each respective Underlays to enable the Underlay to maintain optimal connectivity to the neighbor.

9.3. PutMessage

PutMessages are used to store information at other peers in the DHT.

9.3.1. Wire Format

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|  MSIZE    |   MTYPE   |         BTYPE         |
+-----+-----+-----+-----+-----+-----+-----+-----+
|   FLAGS   | HOPCOUNT  | REPL_LVL  | PATH_LEN  |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                    EXPIRATION                 |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                   PEER_BF                     /
/                 (128 byte)                    |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  BLOCK_KEY                    /
/                 (64 byte)                     |
+-----+-----+-----+-----+-----+-----+-----+-----+
/       TRUNCATED ORIGIN (0 or 32 bytes)        /
+-----+-----+-----+-----+-----+-----+-----+-----+
/              PUTPATH (variable length)        /
+-----+-----+-----+-----+-----+-----+-----+-----+
/      LAST HOP SIGNATURE (0 or 64 bytes)       /
+-----+-----+-----+-----+-----+-----+-----+-----+
/              BLOCK (variable length)          /
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 10: The PutMessage Wire Format.

where:

MSIZE
denotes the size of this message in network byte order.
MTYPE
is the 16-bit message type. It must be set to the value 146 in network byte order.
BTYPE
is a 32-bit block type. The block type indicates the content type of the payload. In network byte order.
FLAGS
is a 16-bit vector with binary options (see Section 9.1.1).
HOPCOUNT
is a 16-bit number indicating how many hops this message has traversed to far. In network byte order.
REPL_LVL
is a 16-bit number indicating the desired replication level of the data. In network byte order.
PATH_LEN
is a 16-bit number indicating the length of the PUT path recorded in PUTPATH. As PUTPATH is optional, this value may be zero. In network byte order.
EXPIRATION
denotes the absolute 64-bit expiration date of the content. In microseconds since midnight (0 hour), January 1, 1970 in network byte order.
PEER_BF
A peer Bloom filter to stop circular routes (see Section 8.3).
BLOCK_KEY
The key under which the PutMessage wants to store content under.
TRUNCATED ORIGIN
is only provided if the TRUNCATED flag is set in FLAGS. If present, this is the public key of the peer just before the first entry on the PUTPATH and the first peer on the PUTPATH is not the actual origin of the message. Thus, to verify the first signature on the PUTPATH, this public key must be used. Note that due to the truncation, this last hop cannot be verified to exist.
PUTPATH
the variable-length PUT path. The path consists of a list of PATH_LEN Path Elements.
LAST HOP SIGNATURE
is only provided if the RECORD ROUTE flag is set in FLAGS. If present, this is an EdDSA signature of the sender of this message (using the same format as the signatures in PUTPATH) affirming that the sender forwarded the message from the predecessor (all zeros if PATH_LEN is 0, otherwise the last peer in PUTPATH) to the target peer.
BLOCK
the variable-length block payload. The contents are determined by the BTYPE field. The length is determined by MSIZE minus the size of all of the other fields.

9.3.2. Processing

Upon receiving a PutMessage from a peer P an implementation MUST process it step by step as follows:

  1. The EXPIRATION field is evaluated. If the message is expired, it MUST be discarded.
  2. If the BTYPE is not supported by the implementation, no validation of the block payload is performed and processing continues at (5). Else, the block MUST be validated as defined in (3) and (4).
  3. The message is evaluated using the block validation functions matching the BTYPE. First, the client attempts to derive the key using the respective DeriveBlockKey procedure as described in Section 10.1. If a key can be derived and does not match, the message MUST be discarded.
  4. Next, the ValidateBlockStoreRequest procedure for the BTYPE as described in Section 10.1 is used to validate the block payload. If the block payload is invalid, the message MUST be discarded.
  5. The peer address of the sender peer P SHOULD be in PEER_BF. If not, the implementation MAY log an error, but MUST continue.
  6. If the RecordRoute flag is set in FLAGS, the local peer address MUST be appended to the PUTPATH of the message. If the flag is not set, the PATH_LEN MUST be set to zero.
  7. If the PATH_LEN is non-zero, the local peer SHOULD verify the signatures from the PUTPATH. Verification MAY involve checking all signatures or any random subset of the signatures. It is RECOMMENDED that peers adapt their behavior to available computational resources so as to not make signature verification a bottleneck. If an invalid signature is found, the PUTPATH MUST be truncated to only include the elements following the invalid signature.
  8. If the local peer is the closest peer (cf. IsClosestpeer(SELF, BLOCK_KEY)) or the DemultiplexEverywhere flag ist set, the message MUST be stored locally in the block storage.
  9. If the MTYPE of the message indicates a HELLO block, the peer MUST be considered for the local routing table if the respective k-bucket is not yet full. In this case, the local peer MUST try to establish a connection to the peer indicated in the HELLO block using the address information from the HELLO block. If a connection is established, the peer is added to the respective k-bucket of the routing table. Note that the k-bucket MUST be determined by the key computed using DeriveBlockKey and not by the QUERY_HASH.
  10. Given the value in REPL_LVL, HOPCOUNT and the result of IsClosestpeer(SELF, BLOCK_KEY) the number of peers to forward to MUST be calculated using ComputeOutDegree(). The implementation SHOULD select up to this number of peers to forward the message to. The implementation MAY forward to fewer or no peers in order to handle resource constraints such as limited bandwidth. Finally, the local peer address MUST be added to the PEER_BF before forwarding the message. For all peers with peer address P selected to forward the message to, SEND(P, PutMessage') is called. Here, PutMessage' is the original message with updated fields. In particular, HOPCOUNT MUST be incremented by 1.

9.4. GetMessage

GetMessages are used to request information from other peers in the DHT.

9.4.1. Wire Format

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|  MSIZE    |   MTYPE   |         BTYPE         |
+-----+-----+-----+-----+-----+-----+-----+-----+
|   FLAGS   |  HOPCOUNT | REPL_LVL  |  RF_SIZE  |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                 PEER_BF                       /
/                 (128 byte)                    |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                 QUERY_HASH                    /
/                 (64 byte)                     |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                 RESULT_FILTER                 /
/                 (variable length)             /
+-----+-----+-----+-----+-----+-----+-----+-----+
/                 XQUERY (variable length)      /
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 11: The GetMessage Wire Format.

where:

MSIZE
denotes the size of this message in network byte order.
MTYPE
is the 16-bit message type. It must be set to the value 147 in network byte order.
BTYPE
is a 32-bit block type field. The block type indicates the content type of the payload. In network byte order.
FLAGS
is a 16-bit vector with binary options (see Section 9.1.1).
HOPCOUNT
is a 16-bit number indicating how many hops this message has traversed to far. In network byte order.
REPL_LVL
is a 16-bit number indicating the desired replication level of the data. In network byte order.
RF_SIZE
is a 16-bit number indicating the length of the result filter RESULT_FILTER. In network byte order.
PEER_BF
A peer Bloom filter to stop circular routes (see Section 8.3).
QUERY_HASH
The query used to indicate what the key is under which the originator is looking for blocks with this request. The block type may use a different evaluation logic to determine applicable result blocks.
RESULT_FILTER
the variable-length result filter, described in Section 9.4.2.
XQUERY
the variable-length extended query. Optional.

9.4.2. Result Filter

The result filter is used to indicate to other peers which results are not of interest when processing a GetMessage (Section 9.4). Any peer which is processing GetMessages and has a result which matches the query key MUST check the result filter and only send a reply message if the result does not test positive under the result filter. Before forwarding the GetMessage, the result filter MUST be updated to filter out all results already returned by the local peer.

How a result filter is implemented depends on the block type as described in Section 10.1. Result filters may be probabilistic data structures. Thus, it is possible that a desireable result is filtered by a result filter because of a false-positive test.

How exactly a block result is added to a result filter MUST be specified as part of the definition of a block type.

9.4.3. Processing

Upon receiving a GetMessage from a peer an implementation MUST process it step by step as follows:

  1. The QUERY_KEY and XQUERY fields are validated against the requested BTYPE as defined by its respective ValidateBlockQuery procedure. If validation function yields REQUEST_INVALID, the message MUST be discarded. If the BTYPE is not supported, the message MUST be forwarded.
  2. The peer address of the sender peer P SHOULD be in the PEER_BF Bloom filter. If not, the implementation MAY log an error, but MUST continue.
  3. If the local peer is the closest peer (cf. IsClosestpeer (SELF, QueryHash)) or the DemultiplexEverywhere flag is set, a reply MUST be produced (if one is available) using the following steps:

    a)
    If BTYPE indicates a request for a HELLO block, the peer MUST consult the HELLOs it has cached for the peers in its routing table instead of the local block storage (while continuing to respect flags like DemultiplexEverywhere and FindApproximate).
    b)
    If FLAGS indicate a FindApproximate request, the peer SHOULD try to respond with the closest block it has that is not filtered by the RESULT_BF.
    c)
    Else, the peer MUST respond if it has a valid block that matches the key exactly and that is not filtered by the RESULT_BF.

    Any such resulting block MUST be encapsulated in a ResultMessage and SHOULD be transmitted to the neighbor from which the request was received. Implementations MAY drop messages if they are resource-constrained. However, ResultMessages SHOULD be given the highest priority among competing transmissions.

    If the BTYPE is supported and ValidateBlockReply for the given query has yielded a status of FILTER_LAST, processing MUST end and not continue with forwarding of the request to other peers.

  4. Given the value in REPL_LVL, the number of peers to forward to MUST be calculated using ComputeOutDegree(). If there is at least one peer to forward to, the implementation SHOULD select up to this number of peers to forward the message to. The implementation MAY forward to fewer or no peers in order to handle resource constraints such as bandwidth. The peer Bloom filter PEER_BF MUST be updated with the local peer address SELF. For all peers with peer address P' chosen to forward the message to, SEND(P', GetMessage') is called. Here, GetMessage' is the original message with updated fields. In particular, HOPCOUNT MUST be incremented by 1.

9.5. ResultMessage

ResultMessages are used to return information to other peers in the DHT.

9.5.1. Wire Format

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|  MSIZE    |   MTYPE   |        BTYPE          |
+-----+-----+-----+-----+-----+-----+-----+-----+
|  RESERVED |   FLAGS   | PUTPATH_L | GETPATH_L |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                   EXPIRATION                  |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                  QUERY_HASH                   /
/                 (64 byte)                     |
+-----+-----+-----+-----+-----+-----+-----+-----+
/       TRUNCATED ORIGIN (0 or 32 bytes)        /
+-----+-----+-----+-----+-----+-----+-----+-----+
/                  PUTPATH                      /
/                 (variable length)             /
+-----+-----+-----+-----+-----+-----+-----+-----+
/                  GETPATH                      /
/                 (variable length)             /
+-----+-----+-----+-----+-----+-----+-----+-----+
/      LAST HOP SIGNATURE (0 or 64 bytes)       /
+-----+-----+-----+-----+-----+-----+-----+-----+
/                  BLOCK                        /
/                 (variable length)             /
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 12: The ResultMessage Wire Format

where:

MSIZE
denotes the size of this message in network byte order.
MTYPE
is the 16-bit message type. It must be set to the value 148 in network byte order.
BTYPE
is a 32-bit block type field. The block type indicates the content type of the payload. In network byte order.
RESERVED
is a 16-bit value. Implementations MUST set this value to zero when originating a result message. Implementations MUST forward this value unchanged even if it is non-zero.
FLAGS
is a 16-bit vector with binary options (see Section 9.1.1).
PUTPATH_L
is a 16-bit number indicating the length of the PUT path recorded in PUTPATH. As PUTPATH is optional, this value may be zero even if the message has traversed several peers. In network byte order.
GETPATH_L
is a 16-bit number indicating the length of the GET path recorded in GETPATH. As GETPATH is optional, this value may be zero even if the message has traversed several peers. In network byte order.
EXPIRATION
denotes the absolute 64-bit expiration date of the content. In microseconds since midnight (0 hour), January 1, 1970 in network byte order.
QUERY_HASH
the query hash corresponding to the GetMessage which caused this reply message to be sent.
TRUNCATED ORIGIN
is only provided if the TRUNCATED flag is set in FLAGS. If present, this is the public key of the peer just before the first entry on the PUTPATH and the first peer on the PUTPATH is not the actual origin of the message. Thus, to verify the first signature on the PUTPATH, this public key must be used. Note that due to the truncation, this last hop cannot be verified to exist.
PUTPATH
the variable-length PUT path. The path consists of a list of PUTPATH_L Path Elements.
GETPATH
the variable-length PUT path. The path consists of a list of GETPATH_L Path Elements.
LAST HOP SIGNATURE
is only provided if the RECORD ROUTE flag is set in FLAGS. If present, this is an EdDSA signature of the sender of this message (using the same format as the signatures in PUTPATH) affirming that the sender forwarded the message from the predecessor (all zeros if PATH_LEN is 0, otherwise the last peer in PUTPATH) to the target peer.
BLOCK
the variable-length resource record data payload. The contents are defined by the respective type of the resource record.

9.5.2. Processing

Upon receiving a ResultMessage from a connected peer an implementation MUST process it step by step as follows:

  1. First, the EXPIRATION field is evaluated. If the message is expired, it MUST be discarded.
  2. If the BTYPE is supported, then the BLOCK MUST be validated against the requested BTYPE. To do this, the peer checks that the block is valid using ValidateBlockStoreRequest. If the result is BLOCK_INVALID, the message MUST be discarded.
  3. If the PUTPATH_L or the GETPATH_L are non-zero, the local peer SHOULD verify the signatures from the PUTPATH and the GETPATH. Verification MAY involve checking all signatures or any random subset of the signatures. It is RECOMMENDED that peers adapt their behavior to available computational resources so as to not make signature verification a bottleneck. If an invalid signature is found, the path MUST be truncated to only include the elements following the invalid signature. In particular, any invalid signature on the GETPATH will cause PUTPATH_L to be set to 0.
  4. The peer also attempts to compute the key using DeriveBlockKey. This may result in NONE. The result is used later. Note that even if a key was computed, it does not have to match the QUERY_HASH.
  5. If the MTYPE of the message indicates a HELLO block, the peer MUST be considered for the local routing table if the respective k-bucket is not yet full. In this case, the local peer MUST try to establish a connection to the peer indicated in the HELLO block using the address information from the HELLO block. If a connection is established, the peer is added to the respective k-bucket of the routing table. Note that the k-bucket MUST be determined by the key computed using DeriveBlockKey and not by the QUERY_HASH.
  6. If the QUERY_HASH of this ResultMessage is found in the list of pending local or remote queries, then for each matching pending query:

    a)
    If the FindApproximate flag was not set in the query and the BTYPE allowed the implementation to compute the key from the block, the computed key must exactly match the QUERY_HASH, otherwise the result does not match the pending query and processing continues with the next pending query.
    b)
    If the BTYPE is supported, result block MUST be validated against the specific query using the respective FilterBlockResult function. This function MUST update the result filter if a result is returned to the originator of the query.
    c)
    If the BTYPE is not supported, filtering of exact duplicate replies MUST still be performed before forwarding the reply. Such duplicate filtering MAY be implemented probabilistically, for example using a Bloom filter. The result of this duplicate filtering is always either FILTER_MORE or FILTER_DUPLICATE.
    d)
    If the RecordRoute flag is set in FLAGS, the local peer address MUST be appended to the GETPATH of the message and the respective signature MUST be set using the query origin as the PEER SUCCESSOR and the response origin as the PEER PREDECESSOR. If the flag is not set, the GETPATH_L and PUTPATH_L MUST be set to zero when forwarding the result.

    If the result is either FILTER_MORE or FILTER_LAST, the result is forwarded to the origin of the query. If the result was FILTER_LAST, the query is removed from the list of pending queries.

  7. Finally, the implementation MAY choose to cache data from ResultMessages.

10. Blocks

This section describes various considerations R5N implementations must consider with respect to blocks. Specifically, implementations SHOULD be able to validate and persist blocks. Implementations MAY not support validation for all types of blocks. On some devices, storing blocks MAY also be impossible due to lack of storage capacity.

Applications can and should define their own block types. The block type determines the format and handling of the block payload by peers in PutMessages and ResultMessages. Block types MUST be registered with GANA (see Section 13.1).

10.1. Block Operations

Block validation may be necessary for all types of DHT messages. To enable these validations, any block type specification MUST define the following functions:

ValidateBlockQuery(Key, XQuery) -> RequestEvaluationResult

is used to evaluate the request for a block as part of GetMessage processing. Here, the block payload is unkown, but if possible the XQuery and Key SHOULD be verified. Possible values for the RequestEvaluationResult are:

REQUEST_VALID
Query is valid.
REQUEST_INVALID
Query format does not match block type. For example, a mandatory XQuery was not provided, or of the size of the XQuery is not appropriate for the block type.
DeriveBlockKey(Block) -> Key | NONE
is used to synthesize the block key from the block payload as part of PutMessage and ResultMessage processing. The special return value of NONE implies that this block type does not permit deriving the key from the block. A Key may be returned for a block that is ill-formed.
ValidateBlockStoreRequest(Block) -> BlockEvaluationResult

is used to evaluate a block payload as part of PutMessage and ResultMessage processing. Possible values for the BlockEvaluationResult are:

BLOCK_VALID
Block is valid.
BLOCK_INVALID
Block payload does not match the block type.
SetupResultFilter(FilterSize, Mutator) -> RF
is used to setup an empty result filter. The arguments are the set of results that must be filtered at the initiator, and a MUTATOR value which MAY be used to deterministically re-randomize probabilistic data structures. The specification MUST also include the wire format for BF.
FilterResult(Block, Key, RF, XQuery) -> (FilterEvaluationResult, RF')

is used to filter results against specific queries. This function does not check the validity of Block itself or that it matches the given key, as this must have been checked earlier. Thus, locally stored blocks from previously observed ResultMessages and PutMessages use this function to perform filtering based on the request parameters of a particular GET operation. Possible values for the FilterEvaluationResult are:

FILTER_MORE
Valid result, and there may be more.
FILTER_LAST
Last possible valid result.
FILTER_DUPLICATE
Valid result, but duplicate (was filtered by the result filter).
FILTER_IRRELEVANT
Block does not satisfy the constraints imposed by the XQuery.

If the main evaluation result is FILTER_MORE, the function also returns and updated result filter where the block is added to the set of filtered replies. An implementation is not expected to actually differenciate between the FILTER_DUPLICATE and FILTER_IRRELEVANT return values: in both cases the block is ignored for this query.

10.2. HELLO Blocks

For bootstrapping and peer discovery, the DHT implementation uses its own block type called "HELLO". HELLO blocks are the only type of block that MUST be supported by every R5N implementation. A block with this block type contains the peer ID of the peer that published the HELLO together with a set of addresses of this peer. The key of a HELLO block is the SHA-512 of the peer ID and thus the peer's address in the DHT.

The HELLO block type wire format is illustrated in Figure 13. A query for block of type HELLO MUST NOT include extended query data (XQuery). Any implementation encountering a request for a HELLO with non-empty XQuery data MUST consider the request invalid and ignore it.

0   8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|                     PEER-ID                   |
|                    (32 byte)                  |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                    SIGNATURE                  |
|                    (64 byte)                  |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                   EXPIRATION                  |
+-----+-----+-----+-----+-----+-----+-----+-----+
/                   ADDRESSES                   /
/               (variable length)               /
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 13: The HELLO Block Format.
PEER-ID
is the Peer-ID of the peer which has generated this HELLO.
EXPIRATION
denotes the absolute 64-bit expiration date of the content. The value specified is microseconds since midnight (0 hour), January 1, 1970, but must be a multiple of one million (so that it can be represented in seconds in a HELLO URL). Stored in network byte order.
ADDRESSES

is a list of UTF-8 [RFC3629] URIs [RFC3986] which can be used as addresses to contact the peer. The strings MUST be 0-terminated. The set of URIs MAY be empty. An example of an addressing scheme used throughout this document is "r5n+ip+tcp", which refers to a standard TCP/IP socket connection. The "hier"-part of the URI must provide a suitable address for the given addressing scheme. The following is a non-normative example of address strings:

r5n+ip+udp://1.2.3.4:6789 \
gnunet+tcp://12.3.4.5/ \
Figure 14: Example Address URIs.
SIGNATURE

is the signature of the HELLO. It covers a 64-bit pseudo header conceptually prefixed to the block. The pseudo header includes the expiration, signature purpose and a hash over the addresses. The wire format is illustrated in Figure 15.

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|         SIZE          |       PURPOSE         |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                   EXPIRATION                  |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                   H_ADDRS                     |
|                  (64 byte)                    |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
|                                               |
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 15: The Wire Format of the HELLO for Signing.
SIZE
A 32-bit value containing the length of the signed data in bytes in network byte order. The length of the signed data MUST be 80 bytes.
PURPOSE
A 32-bit signature purpose flag. This field MUST be 7 (in network byte order).
EXPIRATION
denotes the absolute 64-bit expiration date of the HELLO. In microseconds since midnight (0 hour), January 1, 1970 in network byte order.
H_ADDRS
a SHA-512 hash over the addresses in the HELLO. H_ADDRS is generated over the ADDRESSES field as provided in the HELLO block using SHA-512 [RFC4634].

The HELLO block functions MUST be implemented as follows:

ValidateBlockQuery(Key, XQuery) -> RequestEvaluationResult
To validate a block query for a HELLO is to simply check that the XQuery is empty. If it is empty, REQUEST_VALID ist returned. Otherwise, REQUEST_INVALID.
DeriveBlockKey(Block) -> Key | NONE
To derive a block key for a HELLO is to simply hash the peer ID from the HELLO. The result of this function is always: FIXME what hash
ValidateBlockStoreRequest(Block) -> BlockEvaluationResult
To validate a block store request is to verify the EdDSA SIGNATURE over the hashed ADDRESSES against the public key from the peer ID field. If the signature is valid BLOCK_VALID is returned. Otherwise BLOCK_INVALID.
SetupResultFilter(FilterSize, Mutator) -> RF

The RESULT_FILTER for HELLO blocks is implemented using a Bloom filter.

0     8     16    24    32    40    48    56
+-----+-----+-----+-----+-----+-----+-----+-----+
|        MUTATOR        |  HELLO_BF             /
+-----+-----+-----+-----+  (variable length)    /
/                                               /
+-----+-----+-----+-----+-----+-----+-----+-----+
Figure 16: The HELLO Block Result Filter.

where:

MUTATOR
The 32-bit mutator for the result filter.
HELLO_BF
The K-value for the HELLO_BF Bloom filter is always 16. The size S of the Bloom filter in bytes depends on the number of elements F known to be filtered at the initiator. If F is zero, the size S is just 8 (bytes). Otherwise, S is set to the minimum of 215 and the lowest power of 2 that is strictly larger than K*F/4 (in bytes). The wire format of HELLO_BF is the resulting byte array. In particular, K is never transmitted.

R5N HELLO blocks use a MUTATOR value to additionally "randomize" the computation of the bloom filter while remaining deterministic across peers. The 32-bit MUTATOR value is set by the peer initiating the GET request, and changed every time the GET request is repeated by the initiator. Peers forwarding GET requests MUST not change the mutator value included in the RESULT_FILTER as they might not be able to recalculate the result filter with a different MUTATOR value.

Consequently, repeated requests have statistically independent probabilities of creating false-positives in a result filter. Thus, even if for one request a result filter may exclude a result as a false-positive match, subsequent requests are likely to not have the same false-positives.

HELLO result filters can be merged if the Bloom filters have the same size and MUTATOR by setting all bits to 1 that are set in either Bloom filter. This is done whenever a peer receives a query with the same MUTATOR, predecessor and Bloom filter size.

FilterResult(Block, Key, RF, XQuery) -> (FilterEvaluationResult, RF')
To filter results of HELLO blocks using the Bloom filter, the H_ADDRS field (as computed using SHA-512 over the ADDRESSES) and XORed with the SHA-512 hash of the MUTATOR (in network byte order). The resulting value is then used when hashing into the Bloom filter as described in Section 7. Consequently, HELLOs with completely identical sets of addresses will be filtered and FILTER_DUPLICATE is returned. Any small variation in the set of addresses will cause the block to no longer be filtered (with high probability) and FILTER_MORE is returned.

10.3. Persistence

An implementation SHOULD provide a local persistence mechanism for blocks. Embedded systems that lack storage capability MAY use connection-level signalling to indicate that they are merely a client utilizing a DHT and are not able to participate with storage. The local storage MUST provide the following functionality:

Store(Key, Block)
Stores a block under the specified key. If an block with identical payload exists already under the same key, the meta data should be set to the maximum expiration time of both blocks and use the corresponding PUTPATH (and if applicable TRUNCATED ORIGIN) of that version of the block.
Lookup(Key) -> List of Blocks
Retrieves blocks stored under the specified key.
LookupApproximate(Key) -> List of Blocks
Retrieves the blocks stored under the specified key and any blocks under keys close to the specified key, in order of decreasing proximity.

10.3.1. Approximate Search Considerations

Over time a peer may accumulate a significant number of blocks which are stored locally in the persistence layer. Due to the expected high number of blocks, the method to retrieve blocks close to the specified lookup key in the LookupApproximate API must be implemented with care with respect to efficiency.

It is RECOMMENDED to limit the number of results from the LookupApproximate procedure to a result size which is easily manageable by the local system.

In order to efficiently find a suitable result set, the implementation SHOULD follow the following procedure:

  1. Sort all blocks by the block key in ascending (decending) order. The block keys are interpreted as integer.
  2. Alternatingly select a block with a key larger and smaller from the sortings. The resulting set is sorted by XOR distance. The selection process continues until the upper bound for the result set is reached and both sortings do not yield any closer blocks.

An implementation MAY decide to use a custom algorithm in order to find the closest blocks in the local storage. But, especially for more primitive approaches, such as only comparing XOR distances for all blocks in the storage, the procedure may become ineffective for large storages.

10.3.2. Caching Strategy Considerations

An implementation MUST implement an eviction strategy for blocks stored in the block storage layer.

In order to ensure the freshness of blocks, an implementation MUST evict expired blocks in favor of new blocks.

An implementation MAY preserve blocks which are often requested. This approach can be expensive as it requires the implementation to keep track of how often a block is requested.

An implementation MAY preserve blocks which are close to the local peer ID.

An implementation MAY provide configurable storage quotas and adapt its eviction strategy based on the current storage size or other constrained resources.

11. Security Considerations

If an upper bound to the maximum number of neighbors in a k-bucket is reached, the implementation MUST prefer to preserve the oldest working connections instead of new connections. This makes Sybil attacks less effective as an adversary would have to invest more resources over time to mount an effective attack.

The ComputeOutDegree function limits the REPL_LVL to a maximum of 16. This imposes an upper limit on bandwidth amplification an attacker may achieve for a given network size and topology.

11.1. Approximate Result Filtering

When a FindApproximate request is encountered, a peer will try to respond with the closest block it has that is not filtered by the result bloom filter. Implementations MUST ensure that the cost of evaluating any such query is reasonably small. For example, implementations MAY consider to avoid an exhaustive search of their database. Not doing so can lead to denial of service attacks as there could be cases where too many local results are filtered by the result filter.

12. IANA Considerations

IANA maintains a registry called the "Uniform Resource Identifier (URI) Schemes" registry.

12.1. GNUnet URI Scheme Registration

IANA maintains the "Uniform Resource Identifier (URI) Schemes" registry. The registry should be updated to include an entry for the 'gnunet' URI scheme. IANA is requested to update that entry to reference this document when published as an RFC.

12.2. R5N URI Scheme Registration

IANA maintains the "Uniform Resource Identifier (URI) Schemes" registry. The registry should be updated to include an entry for the 'r5n+udp+ip' URI scheme. IANA is requested to update that entry to reference this document when published as an RFC.

13. GANA Considerations

13.1. Block Type Registry

GANA [GANA] is requested to create a "DHT Block Types" registry. The registry shall record for each entry:

  • Name: The name of the block type (case-insensitive ASCII string, restricted to alphanumeric characters
  • Number: 32-bit
  • Comment: Optionally, a brief English text describing the purpose of the block type (in UTF-8)
  • Contact: Optionally, the contact information of a person to contact for further information
  • References: Optionally, references describing the record type (such as an RFC)

The registration policy for this sub-registry is "First Come First Served", as described in [RFC8126]. GANA is requested to populate this registry as follows:

Number| Name   | Contact | References | Description
------+--------+---------+------------+-------------------------
0       ANY      N/A       [This.I-D]   Reserved
7       HELLO    N/A       [This.I-D]   Type of a block that contains
                                        a HELLO for a peer
11      GNS      N/A       GNS          Block for storing record data
Figure 17: The Block Type Registry.

13.2. GNUnet URI schema Subregistry

GANA [GANA] is requested to create a "gnunet://" sub-registry. The registry shall record for each entry:

  • Name: The name of the subsystem (case-insensitive ASCII string, restricted to alphanumeric characters
  • Comment: Optionally, a brief English text describing the purpose of the subsystem (in UTF-8)
  • Contact: Optionally, the contact information of a person to contact for further information
  • References: Optionally, references describing the syntax of the URL (such as an RFC or LSD)

The registration policy for this sub-registry is "First Come First Served", as described in [RFC8126]. GANA is requested to populate this registry as follows:

Name           | Contact | References | Description
---------------+---------+------------+-------------------------
HELLO            N/A       [This.I-D]   How to contact a peer.
ADDRESS          N/A       N/A          Network address.
Figure 18: GNUnet scheme Subregistry.

13.3. GNUnet Signature Purpose Registry

GANA is requested to amend the "GNUnet Signature Purpose" registry as follows:

Purpose | Name            | References | Description
--------+-----------------+------------+---------------
6         DHT PATH Element  [This.I-D]   DHT message routing data
7         HELLO Payload     [This.I-D]   Peer contact information

Figure 19: The Signature Purpose Registry Entries.

13.4. GNUnet Message Type Registry

GANA is requested to amend the "GNUnet Message Type" registry as follows:

Type    | Name            | References | Description
--------+-----------------+------------+---------------
146       DHT PUT          [This.I-D]    Store information in DHT
147       DHT GET          [This.I-D]    Request information from DHT
148       DHT RESULT       [This.I-D]    Return information from DHT
157       HELLO Message    [This.I-D]    Peer contact information

Figure 20: The Message Type Registry Entries.

14. Test Vectors

15. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <https://www.rfc-editor.org/info/rfc2119>.
[RFC3629]
Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, , <https://www.rfc-editor.org/info/rfc3629>.
[RFC3986]
Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, DOI 10.17487/RFC3986, , <https://www.rfc-editor.org/info/rfc3986>.
[RFC4634]
Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms (SHA and HMAC-SHA)", RFC 4634, DOI 10.17487/RFC4634, , <https://www.rfc-editor.org/info/rfc4634>.
[RFC5234]
Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, , <https://www.rfc-editor.org/info/rfc5234>.
[RFC6940]
Jennings, C., Lowekamp, B., Ed., Rescorla, E., Baset, S., and H. Schulzrinne, "REsource LOcation And Discovery (RELOAD) Base Protocol", RFC 6940, DOI 10.17487/RFC6940, , <https://www.rfc-editor.org/info/rfc6940>.
[RFC8126]
Cotton, M., Leiba, B., and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, , <https://www.rfc-editor.org/info/rfc8126>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, , <https://www.rfc-editor.org/info/rfc8174>.
[ed25519]
Bernstein, D., Duif, N., Lange, T., Schwabe, P., and B. Yang, "High-Speed High-Security Signatures", , <http://link.springer.com/chapter/10.1007/978-3-642-23951-9_9>.
[GANA]
GNUnet e.V., "GNUnet Assigned Numbers Authority (GANA)", , <https://gana.gnunet.org/>.

16. Informative References

[R5N]
Evans, N. S. and C. Grothoff, "R5N: Randomized recursive routing for restricted-route networks", , <https://doi.org/10.1109/ICNSS.2011.6060022>.
[Kademlia]
Maymounkov, P. and D. Mazieres, "Kademlia: A peer-to-peer information system based on the xor metric.", , <http://css.csail.mit.edu/6.824/2014/papers/kademlia.pdf>.
[cadet]
Polot, B. and C. Grothoff, "CADET: Confidential ad-hoc decentralized end-to-end transport", , <https://doi.org/10.1109/MedHocNet.2014.6849107>.
[I-D.draft-schanzen-gns]
Schanzenbach, M., Grothoff, C., and B. Fix, "The GNU Name System", , <https://datatracker.ietf.org/doc/draft-schanzen-gns/>.

Authors' Addresses

Martin Schanzenbach
GNUnet e.V.
Boltzmannstrasse 3
85748 Garching
Germany
Christian Grothoff
Berner Fachhochschule
Hoeheweg 80
CH-2501 Biel/Bienne
Switzerland
Bernd Fix
GNUnet e.V.
Boltzmannstrasse 3
85748 Garching
Germany