rfc9616v2.txt | rfc9616.txt | |||
---|---|---|---|---|
skipping to change at line 73 ¶ | skipping to change at line 73 ¶ | |||
7. IANA Considerations | 7. IANA Considerations | |||
8. Security Considerations | 8. Security Considerations | |||
9. References | 9. References | |||
9.1. Normative References | 9.1. Normative References | |||
9.2. Informative References | 9.2. Informative References | |||
Acknowledgements | Acknowledgements | |||
Authors' Addresses | Authors' Addresses | |||
1. Introduction | 1. Introduction | |||
"The Babel Routing Protocol" [RFC8966] does not mandate a specific | The Babel routing protocol [RFC8966] does not mandate a specific | |||
algorithm for computing metrics; existing implementations use a | algorithm for computing metrics; existing implementations use a | |||
packet-loss-based metric on wireless links and a simple hop-count | packet-loss-based metric on wireless links and a simple hop-count | |||
metric on all other types of links. While this strategy works | metric on all other types of links. While this strategy works | |||
reasonably well in many networks, it fails to select reasonable | reasonably well in many networks, it fails to select reasonable | |||
routes in some topologies involving tunnels or VPNs. | routes in some topologies involving tunnels or VPNs. | |||
+------------+ | +------------+ | |||
| A (Paris) +---------------+ | | A (Paris) +---------------+ | |||
+------------+ \ | +------------+ \ | |||
/ \ | / \ | |||
skipping to change at line 99 ¶ | skipping to change at line 99 ¶ | |||
\ / | \ / | |||
\ / | \ / | |||
\ / | \ / | |||
+------------+ / | +------------+ / | |||
| D (Paris) +---------------+ | | D (Paris) +---------------+ | |||
+------------+ | +------------+ | |||
Figure 1: Four Routers in a Diamond Topology | Figure 1: Four Routers in a Diamond Topology | |||
For example, consider the topology described in Figure 1, with three | For example, consider the topology described in Figure 1, with three | |||
routers (A, B, and D) located in Paris and a fourth router (C) | routers A, B, and D located in Paris and a fourth router C located in | |||
located in Tokyo, connected through tunnels in a diamond topology. | Tokyo, connected through tunnels in a diamond topology. When routing | |||
When routing traffic from A to D, it is obviously preferable to use | traffic from A to D, it is obviously preferable to use the local | |||
the local route through B as this is likely to provide better service | route through B as this is likely to provide better service quality | |||
quality and lower monetary cost than the distant route through C. | and lower monetary cost than the distant route through C. However, | |||
However, the existing implementations of Babel consider both routes | the existing implementations of Babel consider both routes as having | |||
as having the same metric; therefore, they will route the traffic | the same metric; therefore, they will route the traffic through C in | |||
through C in roughly half the cases. | roughly half the cases. | |||
In the first part of this document (Section 3), we specify an | In the first part of this document (Section 3), we specify an | |||
extension to the Babel routing protocol that produces a sequence of | extension to the Babel routing protocol that produces a sequence of | |||
accurate measurements of the round-trip time (RTT) between two Babel | accurate measurements of the round-trip time (RTT) between two Babel | |||
neighbours. These measurements are not directly usable as an input | neighbours. These measurements are not directly usable as an input | |||
to Babel's route-selection procedure since they tend to be noisy and | to Babel's route selection procedure since they tend to be noisy and | |||
to cause a negative feedback loop, which might give rise to frequent | to cause a negative feedback loop, which might give rise to frequent | |||
oscillations. In the second part (Section 4), we define an algorithm | oscillations. In the second part (Section 4), we define an algorithm | |||
that maps the sequence of RTT samples to a link cost that can be used | that maps the sequence of RTT samples to a link cost that can be used | |||
for route selection. | for route selection. | |||
1.1. Applicability | 1.1. Applicability | |||
The extension defined in Section 3 provides a sequence of accurate, | The extension defined in Section 3 provides a sequence of accurate | |||
but potentially noisy, RTT samples. Since the RTT is a symmetric | but potentially noisy RTT samples. Since the RTT is a symmetric | |||
measure of delay, this protocol is only applicable in environments | measure of delay, this protocol is only applicable in environments | |||
where the symmetric delay is a good predictor of whether a link | where the symmetric delay is a good predictor of whether a link | |||
should be taken by routing traffic, which might not necessarily be | should be taken by routing traffic, which might not necessarily be | |||
the case in networks built over exotic link technologies. | the case in networks built over exotic link technologies. | |||
The extension makes minimal requirements on the nodes. In | The extension makes minimal requirements on the nodes. In | |||
particular, it does not assume synchronised clocks; it only requires | particular, it does not assume synchronised clocks, and only requires | |||
that clock drift be negligible during the time interval between two | that clock drift be negligible during the time interval between two | |||
Hello TLVs. Since that is on the order of a few seconds, this | Hello TLVs. Since that is on the order of a few seconds, this | |||
requirement is met even with cheap crystal oscillators, such as the | requirement is met even with cheap crystal oscillators, such as the | |||
ones used in consumer electronics. | ones used in consumer electronics. | |||
The algorithm defined in Section 4 is based on a number of | The algorithm defined in Section 4 depends on a number of assumptions | |||
assumptions about the network. The assumption with the most severe | about the network. The assumption with the most severe consequences | |||
consequences is that all links below a certain RTT (rtt-min in | is that all links below a certain RTT (rtt-min in Section 4.2) can be | |||
Section 4.2) can be grouped in a single category of "good" links. | grouped in a single category of "good" links. While this is the case | |||
While this is the case in wide-area overlay networks, it makes the | in wide-area overlay networks, it makes the algorithm inapplicable in | |||
algorithm inapplicable in networks where distinguishing between low- | networks where distinguishing between low-latency links is important. | |||
latency links is important. | ||||
There are other assumptions, but they are less likely to limit the | There are other assumptions, but they are less likely to limit the | |||
algorithm's applicability. The algorithm assumes that all links | algorithm's applicability. The algorithm assumes that all links | |||
above a certain RTT (rtt-max in Section 4.2) are equally bad, and | above a certain RTT (rtt-max in Section 4.2) are equally bad, and | |||
they will only be used as a last resort. In addition, in order to | they will only be used as a last resort. In addition, in order to | |||
avoid oscillations, the algorithm is designed to react slowly to RTT | avoid oscillations, the algorithm is designed to react slowly to RTT | |||
variations, thus causing suboptimal routing for seconds or even | variations, thus causing suboptimal routing for seconds or even | |||
minutes after an RTT change. While this is a desirable property in | minutes after an RTT change; while this is a desirable property in | |||
fixed networks, as it avoid excessive route oscillations, it might be | fixed networks, as it avoid excessive route oscillations, it might be | |||
an issue with networks with high rates of node mobility. | an issue with networks with high rates of node mobility. | |||
2. Specification of Requirements | 2. Specification of Requirements | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
skipping to change at line 184 ¶ | skipping to change at line 183 ¶ | |||
* the Origin Timestamp, a 32-bit timestamp (modulo 2^32) according | * the Origin Timestamp, a 32-bit timestamp (modulo 2^32) according | |||
to the neighbour's clock; | to the neighbour's clock; | |||
* the Receive Timestamp, a 32-bit timestamp (modulo 2^32) according | * the Receive Timestamp, a 32-bit timestamp (modulo 2^32) according | |||
to the local clock. | to the local clock. | |||
Both values are initially undefined. | Both values are initially undefined. | |||
3.2. Protocol Operation | 3.2. Protocol Operation | |||
The RTT to a neighbour is estimated using an algorithm by Mills | The RTT to a neighbour is estimated using an algorithm due to Mills | |||
[RFC891], originally developed for the HELLO routing protocol and | [RFC891], originally developed for the HELLO routing protocol and | |||
later used in NTP [RFC5905]. | later used in NTP [RFC5905]. | |||
A Babel speaker periodically sends Hello messages to its neighbours | A Babel speaker periodically sends Hello messages to its neighbours | |||
(Section 3.4.1 of [RFC8966]). Additionally, it occasionally sends a | (Section 3.4.1 of [RFC8966]). Additionally, it occasionally sends a | |||
set of IHU ("I Heard You") messages, at most one per neighbour | set of IHU ("I Heard You") messages, at most one per neighbour | |||
(Section 3.4.2 of [RFC8966]). | (Section 3.4.2 of [RFC8966]). | |||
A B | A B | |||
| | | | | | |||
skipping to change at line 219 ¶ | skipping to change at line 218 ¶ | |||
| / | | | / | | |||
| / | Hello(t2') | | / | Hello(t2') | |||
| / | IHU(t1, t1') | | / | IHU(t1, t1') | |||
|/ | | |/ | | |||
t2 + | | t2 + | | |||
| | | | | | |||
v v | v v | |||
Figure 2: Mills' Algorithm | Figure 2: Mills' Algorithm | |||
In order to enable the computation of RTTs, node A MUST include, in | In order to enable the computation of RTTs, a node A MUST include, in | |||
every Hello that it sends, a timestamp t1 (according to A's local | every Hello that it sends, a timestamp t1 (according to A's local | |||
clock), as illustrated in Figure 2. When node B receives A's | clock), as illustrated in Figure 2. When a node B receives A's | |||
timestamped Hello, it computes the time t1' at which the Hello was | timestamped Hello, it computes the time t1' at which the Hello was | |||
received (according to B's local clock). It then MUST record the | received (according to B's local clock). It then MUST record the | |||
value t1 in the Origin Timestamp field of the Neighbour Table entry | value t1 in the Origin Timestamp field of the Neighbour Table entry | |||
corresponding to A and the value t1' in the Receive Timestamp field | corresponding to A and the value t1' in the Receive Timestamp field | |||
of the Neighbour Table entry. | of the Neighbour Table entry. | |||
When B sends an IHU to A, it checks whether both timestamps are | When B sends an IHU to A, it checks whether both timestamps are | |||
defined in the Neighbour Table. If that is the case, then it MUST | defined in the Neighbour Table. If that is the case, then it MUST | |||
ensure that its IHU TLV is sent in a packet that also contains a | ensure that its IHU TLV is sent in a packet that also contains a | |||
timestamped Hello TLV (either a normally scheduled Hello or an | timestamped Hello TLV (either a normally scheduled Hello or an | |||
unscheduled Hello, see Section 3.4.1 of [RFC8966]). It MUST include | unscheduled Hello, see Section 3.4.1 of [RFC8966]). It MUST include | |||
in the IHU both the Origin Timestamp and the Receive Timestamp stored | in the IHU both the Origin Timestamp and the Receive Timestamp stored | |||
in the Neighbour Table. | in the Neighbour Table. | |||
Upon receiving B's packet, A computes the time t2 (according to its | Upon receiving B's packet, A computes the time t2 (according to its | |||
local clock) at which it was received. Then, A MUST verify that it | local clock) at which it was received. Node A MUST then verify that | |||
contains both a Hello TLV with timestamp t2' and an IHU TLV with two | it contains both a Hello TLV with timestamp t2' and an IHU TLV with | |||
timestamps: t1 and t1'. If that is the case, A computes the value: | two timestamps t1 and t1'. If that is the case, A computes the | |||
value: | ||||
RTT = (t2 - t1) - (t2' - t1') | RTT = (t2 - t1) - (t2' - t1') | |||
(where all computations are done modulo 2^32), which is a measurement | (where all computations are done modulo 2^32), which is a measurement | |||
of the RTT between A and B. (A then stores the values t2' and t2 in | of the RTT between A and B. (A then stores the values t2' and t2 in | |||
its Neighbour Table, as B did before.) | its Neighbour Table, as B did before.) | |||
This algorithm has a number of desirable properties: | This algorithm has a number of desirable properties: | |||
1. The algorithm is symmetric. A and B use the same procedures for | 1. The algorithm is symmetric. A and B use the same procedures for | |||
timestamping packets and computing RTT samples; both nodes | timestamping packets and computing RTT samples: both nodes | |||
produce one RTT sample for each received (Hello, IHU) pair. | produce one RTT sample for each received (Hello, IHU) pair. | |||
2. Since there is no requirement that t1' and t2' be equal, the | 2. Since there is no requirement that t1' and t2' be equal, the | |||
protocol is asynchronous: the only change to Babel's message | protocol is asynchronous: the only change to Babel's message | |||
scheduling is the requirement that a packet containing an IHU | scheduling is the requirement that a packet containing an IHU | |||
also contain a Hello. | also contain a Hello. | |||
3. Since the algorithm only ever computes differences of timestamps | 3. Since the algorithm only ever computes differences of timestamps | |||
according to a single clock, it does not require synchronised | according to a single clock, it does not require synchronised | |||
clocks. | clocks. | |||
skipping to change at line 337 ¶ | skipping to change at line 337 ¶ | |||
The protocol is designed to survive the clock being reset when a node | The protocol is designed to survive the clock being reset when a node | |||
reboots; on POSIX systems, this makes it possible to use the | reboots; on POSIX systems, this makes it possible to use the | |||
CLOCK_MONOTONIC clock for computing timestamps. If CLOCK_MONOTONIC | CLOCK_MONOTONIC clock for computing timestamps. If CLOCK_MONOTONIC | |||
is not available, CLOCK_REALTIME may be used, since the protocol is | is not available, CLOCK_REALTIME may be used, since the protocol is | |||
able to survive the clock being occasionally stepped. | able to survive the clock being occasionally stepped. | |||
4. RTT-Based Route Selection | 4. RTT-Based Route Selection | |||
The protocol described above yields a series of RTT samples. While | The protocol described above yields a series of RTT samples. While | |||
these samples are fairly accurate, they are not directly usable as an | these samples are fairly accurate, they are not directly usable as an | |||
input to the route-selection procedure, for at least three reasons: | input to the route selection procedure, for at least three reasons: | |||
1. In the presence of bursty traffic, routers experience transient | 1. In the presence of bursty traffic, routers experience transient | |||
congestion, which causes occasional spikes in the measured RTT. | congestion, which causes occasional spikes in the measured RTT. | |||
Thus, the RTT signal may be noisy and require smoothing before it | Thus, the RTT signal may be noisy and require smoothing before it | |||
can be used for route selection. | can be used for route selection. | |||
2. Using the RTT signal for route selection gives rise to a negative | 2. Using the RTT signal for route selection gives rise to a negative | |||
feedback loop. When a route has a low RTT, it is deemed to be | feedback loop: when a route has a low RTT, it is deemed to be | |||
more desirable: this causes it to be used for more data traffic, | more desirable: this causes it to be used for more data traffic, | |||
which may lead to congestion, which in turn increases the RTT. | which may lead to congestion, which in turn increases the RTT. | |||
Without some form of hysteresis, using RTT for route selection | Without some form of hysteresis, using RTT for route selection | |||
would lead to oscillations between parallel routes, which would | would lead to oscillations between parallel routes, which would | |||
lead to packet reordering and negatively affect upper-layer | lead to packet reordering and negatively affect upper-layer | |||
protocols (such as TCP). | protocols (such as TCP). | |||
3. Even in the absence of congestion, the RTT tends to exhibit some | 3. Even in the absence of congestion, the RTT tends to exhibit some | |||
variation. If the RTTs of two parallel routes oscillate around a | variation. If the RTTs of two parallel routes oscillate around a | |||
common value, using the RTT as input to route selection will | common value, using the RTT as input to route selection will | |||
cause frequent routing oscillations, which, again, indicates the | cause frequent routing oscillations, which, again, indicates the | |||
need for some form of hysteresis. | need for some form of hysteresis. | |||
In this section, we describe an algorithm that integrates smoothing | In this section, we describe an algorithm that integrates smoothing | |||
and hysteresis. It has also been shown to behave well both in | and hysteresis. It has been shown to behave well both in simulation | |||
simulation and experimentally over the Internet [DELAY-BASED] and is | and experimentally over the Internet [DELAY-BASED] and is RECOMMENDED | |||
RECOMMENDED when RTT information is being used for route selection. | when RTT information is being used for route selection. The | |||
The algorithm is structured as follows: | algorithm is structured as follows: | |||
* the RTT values are first smoothed in order to avoid instabilities | * the RTT values are first smoothed in order to avoid instabilities | |||
due to outliers (Section 4.1); | due to outliers (Section 4.1); | |||
* the resulting smoothed samples are mapped to a cost using a | * the resulting smoothed samples are mapped to a cost using a | |||
bounded, non-linear mapping, which avoids instabilities at the | bounded, non-linear mapping, which avoids instabilities at the | |||
lower and upper end of the RTT range (Section 4.2); | lower and upper end of the RTT range (Section 4.2); | |||
* a hysteresis filter is applied in order to limit the amount of | * a hysteresis filter is applied in order to limit the amount of | |||
oscillation in the middle of the RTT range (Section 4.3). | oscillation in the middle of the RTT range (Section 4.3). | |||
skipping to change at line 406 ¶ | skipping to change at line 406 ¶ | |||
0.836 is the RECOMMENDED default. | 0.836 is the RECOMMENDED default. | |||
4.2. Cost Computation | 4.2. Cost Computation | |||
The smoothed RTT value obtained in the previous step needs to be | The smoothed RTT value obtained in the previous step needs to be | |||
mapped to a link cost, suitable for input to the metric computation | mapped to a link cost, suitable for input to the metric computation | |||
procedure (Section 3.5.2 of [RFC8966]). Obviously, the mapping | procedure (Section 3.5.2 of [RFC8966]). Obviously, the mapping | |||
should be monotonic (larger RTTs imply larger costs). In addition, | should be monotonic (larger RTTs imply larger costs). In addition, | |||
the mapping should be constant beyond a certain value (all very bad | the mapping should be constant beyond a certain value (all very bad | |||
links are equally bad) so that congested links do not contribute to | links are equally bad) so that congested links do not contribute to | |||
routing instability. The mapping should also be constant around 0; | routing instability. The mapping should also be constant around 0, | |||
this is so small oscillations in the RTT of low-RTT links do not | so that small oscillations in the RTT of low-RTT links do not | |||
contribute to routing instability. | contribute to routing instability. | |||
cost | cost | |||
^ | ^ | |||
| | | | |||
| | | | |||
| C + max-rtt-penalty | | C + max-rtt-penalty | |||
| +--------------------------- | | +--------------------------- | |||
| /. | | /. | |||
| / . | | / . | |||
skipping to change at line 474 ¶ | skipping to change at line 474 ¶ | |||
[RFC8966], implementations that do not understand this extension will | [RFC8966], implementations that do not understand this extension will | |||
silently ignore the sub-TLVs while parsing the rest of the TLVs that | silently ignore the sub-TLVs while parsing the rest of the TLVs that | |||
they contain. In effect, this extension supports building hybrid | they contain. In effect, this extension supports building hybrid | |||
networks consisting of extended and unextended routers; while such | networks consisting of extended and unextended routers; while such | |||
networks might suffer from sub-optimal routing, they will not suffer | networks might suffer from sub-optimal routing, they will not suffer | |||
from blackholes or routing loops. | from blackholes or routing loops. | |||
If a sub-TLV defined in this extension is longer than expected, the | If a sub-TLV defined in this extension is longer than expected, the | |||
additional data is silently ignored. This provision is made in order | additional data is silently ignored. This provision is made in order | |||
to allow a future version of this protocol to extend the packet | to allow a future version of this protocol to extend the packet | |||
format with additional data, for example, high-precision or absolute | format with additional data, for example high-precision or absolute | |||
timestamps. | timestamps. | |||
6. Packet Format | 6. Packet Format | |||
This extension defines the Timestamp sub-TLV whose Type field has a | This extension defines the Timestamp sub-TLV whose Type field has the | |||
value of 3. This sub-TLV can be contained within a Hello sub-TLV, in | value 3. This sub-TLV can be contained within a Hello sub-TLV, in | |||
which case it carries a single timestamp, or within an IHU sub-TLV, | which case it carries a single timestamp, or within an IHU sub-TLV, | |||
in which case it carries two timestamps. | in which case it carries two timestamps. | |||
Timestamps are encoded as 32-bit unsigned integers (modulo 2^32), | Timestamps are encoded as 32-bit unsigned integers (modulo 2^32), | |||
expressed in units of one microsecond, counting from an arbitrary | expressed in units of one microsecond, counting from an arbitrary | |||
origin. Timestamps wrap around every 4295 seconds, or roughly 71 | origin. Timestamps wrap around every 4295 seconds, or roughly 71 | |||
minutes (see also Section 3.3). | minutes (see also Section 3.3). | |||
6.1. Timestamp Sub-TLV in Hello TLVs | 6.1. Timestamp Sub-TLV in Hello TLVs | |||
End of changes. 18 change blocks. | ||||
39 lines changed or deleted | 39 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |