rfc9722.original | rfc9722.txt | |||
---|---|---|---|---|
BESS Working Group P. Brissette | Internet Engineering Task Force (IETF) P. Brissette | |||
Internet-Draft A. Sajassi | Request for Comments: 9722 A. Sajassi | |||
Updates: 8584 (if approved) LA. Burdet, Ed. | Updates: 8584 LA. Burdet, Ed. | |||
Intended status: Standards Track Cisco | Category: Standards Track Cisco | |||
Expires: 24 May 2025 J. Drake | ISSN: 2070-1721 J. Drake | |||
Independent | Independent | |||
J. Rabadan | J. Rabadan | |||
Nokia | Nokia | |||
20 November 2024 | April 2025 | |||
Fast Recovery for EVPN Designated Forwarder Election | Fast Recovery for EVPN Designated Forwarder Election | |||
draft-ietf-bess-evpn-fast-df-recovery-12 | ||||
Abstract | Abstract | |||
The Ethernet Virtual Private Network (EVPN) solution in RFC 7432 | The Ethernet Virtual Private Network (EVPN) solution in RFC 7432 | |||
provides Designated Forwarder (DF) election procedures for multihomed | provides Designated Forwarder (DF) election procedures for multihomed | |||
Ethernet Segments. These procedures have been enhanced further by | Ethernet Segments. These procedures have been enhanced further by | |||
applying the Highest Random Weight (HRW) algorithm for Designated | applying the Highest Random Weight (HRW) algorithm for DF election to | |||
Forwarder election to avoid unnecessary DF status changes upon a | avoid unnecessary DF status changes upon a failure. This document | |||
failure. This document improves these procedures by providing a fast | improves these procedures by providing a fast DF election upon | |||
Designated Forwarder election upon recovery of the failed link or | recovery of the failed link or node associated with the multihomed | |||
node associated with the multihomed Ethernet Segment. This document | Ethernet Segment. This document updates RFC 8584 by optionally | |||
updates RFC 8584 by optionally introducing delays between some of the | introducing delays between some of the events therein. | |||
events therein. | ||||
The solution is independent of the number of EVPN Instances (EVIs) | The solution is independent of the number of EVPN Instances (EVIs) | |||
associated with that Ethernet Segment and it is performed via a | associated with that Ethernet Segment, and it is performed via a | |||
simple signaling in BGP between the recovered node and each of the | simple signaling in BGP between the recovered node and each of the | |||
other nodes in the multihoming group. | other nodes in the multihoming group. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 24 May 2025. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9722. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2025 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction | |||
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 | 1.1. Requirements Language | |||
1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.2. Terminology | |||
1.3. Challenges with Existing Mechanism . . . . . . . . . . . 4 | 1.3. Challenges with Existing Mechanism | |||
1.4. Design Principles for a Solution . . . . . . . . . . . . 5 | 1.4. Design Principles for a Solution | |||
2. DF Election Synchronization Solution . . . . . . . . . . . . 6 | 2. DF Election Synchronization Solution | |||
2.1. BGP Encoding . . . . . . . . . . . . . . . . . . . . . . 7 | 2.1. BGP Encoding | |||
2.2. Timestamp Verification . . . . . . . . . . . . . . . . . 9 | 2.2. Timestamp Verification | |||
2.3. Updates to RFC8584 . . . . . . . . . . . . . . . . . . . 9 | 2.3. Updates to RFC 8584 | |||
3. Synchronization Scenarios . . . . . . . . . . . . . . . . . . 10 | 3. Synchronization Scenarios | |||
3.1. Concurrent Recoveries . . . . . . . . . . . . . . . . . . 12 | 3.1. Concurrent Recoveries | |||
4. Backwards Compatibility . . . . . . . . . . . . . . . . . . . 13 | 4. Backwards Compatibility | |||
5. Security Considerations . . . . . . . . . . . . . . . . . . . 14 | 5. Security Considerations | |||
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 | 6. IANA Considerations | |||
7. References . . . . . . . . . . . . . . . . . . . . . . . . . 14 | 7. References | |||
7.1. Normative References . . . . . . . . . . . . . . . . . . 14 | 7.1. Normative References | |||
7.2. Informative References . . . . . . . . . . . . . . . . . 15 | 7.2. Informative References | |||
Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 15 | Acknowledgements | |||
Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 16 | Contributors | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 16 | Authors' Addresses | |||
1. Introduction | 1. Introduction | |||
The Ethernet Virtual Private Network (EVPN) solution [RFC7432] is | The Ethernet Virtual Private Network (EVPN) solution [RFC7432] is | |||
widely used in data center (DC) applications for Network | widely used in data center (DC) applications for Network | |||
Virtualization Overlay (NVO) and DC interconnect (DCI) services, and | Virtualization Overlay (NVO) and Data Center Interconnect (DCI) | |||
in service provider (SP) applications for next generation virtual | services and in service provider (SP) applications for next- | |||
private LAN services. | generation virtual private LAN services. | |||
[RFC7432] describes Designated Forwarder (DF) election procedures for | [RFC7432] describes Designated Forwarder (DF) election procedures for | |||
multihomed Ethernet Segments. These procedures are enhanced further | multihomed Ethernet Segments. These procedures are enhanced further | |||
in [RFC8584] by applying the Highest Random Weight algorithm for DF | in [RFC8584] by applying the Highest Random Weight (HRW) algorithm | |||
election in order to avoid unnecessary DF status changes upon a link | for DF election in order to avoid unnecessary DF status changes upon | |||
or node failure associated with the multihomed Ethernet Segment. | a link or node failure associated with the multihomed Ethernet | |||
Segment. | ||||
This document makes further improvements to the DF election | This document makes further improvements to the DF election | |||
procedures in [RFC8584] by providing an option for a fast DF election | procedures in [RFC8584] by providing an option for a fast DF election | |||
upon recovery of the failed link or node associated with the | upon recovery of the failed link or node associated with the | |||
multihomed Ethernet Segment. This DF election is achieved | multihomed Ethernet Segment. This DF election is achieved | |||
independent of the number of EVPN Instances (EVIs) associated with | independent of the number of EVPN Instances (EVIs) associated with | |||
that Ethernet Segment and it is performed via straightforward | that Ethernet Segment, and it is performed via straightforward | |||
signaling in BGP between the recovered node and each of the other | signaling in BGP between the recovered node and each of the other | |||
nodes in the multihomed Ethernet Segment redundancy group. | nodes in the multihomed Ethernet Segment redundancy group. | |||
This document updates the DF Election Finite State Machine (FSM) | This document updates the DF Election Finite State Machine (FSM) | |||
described in Section 2.1 of [RFC8584], by optionally introducing | described in Section 2.1 of [RFC8584] by optionally introducing | |||
delays between some events, as further detailed in Section 2.3. The | delays between some events, as further detailed in Section 2.3. The | |||
solution is based on a simple one-way signaling mechanism. | solution is based on a simple one-way signaling mechanism. | |||
1.1. Requirements Language | 1.1. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
1.2. Terminology | 1.2. Terminology | |||
PE: Provider Edge device. | PE: Provider Edge | |||
Designated Forwarder (DF): A PE that is currently forwarding | DF: Designated Forwarder. A PE that is currently forwarding | |||
(encapsulating/decapsulating) traffic for a given VLAN in and out | (encapsulating/decapsulating) traffic for a given VLAN in and out | |||
of a site. | of a site. | |||
NDF: Non-Designated Forwarder, a PE that is currently blocking | NDF: Non-Designated Forwarder. A PE that is currently blocking | |||
traffic (see DF above). | traffic (see DF above). | |||
EVI: An EVPN instance spanning the Provider Edge (PE) devices | EVI: EVPN Instance. It spans the PE devices participating in that | |||
participating in that EVPN. | EVPN. | |||
HRW: Highest Random Weight algorithm, [HRW98] | HRW: Highest Random Weight algorithm [HRW98] | |||
Service carving: DF Election is also referred to as "service | Service carving: DF Election is also referred to as "service | |||
carving" in [RFC7432] | carving" in [RFC7432] | |||
SCT: Service Carving Time, defined in this document, the time at | SCT: Service Carving Time. Defined in this document as the time at | |||
which all nodes participating in an Ethernet Segment perform DF | which all nodes participating in an Ethernet Segment perform DF | |||
Election. | Election. | |||
1.3. Challenges with Existing Mechanism | 1.3. Challenges with Existing Mechanism | |||
In EVPN technology, multiple Provider Edge (PE) devices encapsulate | In EVPN technology, multiple PE devices encapsulate and decapsulate | |||
and decapsulate data belonging to the same VLAN. Under certain | data belonging to the same VLAN. Under certain conditions, this may | |||
conditions, this may cause duplicated Ethernet packets and potential | cause duplicated Ethernet packets and potential loops if there is a | |||
loops if there is a momentary overlap in forwarding roles between two | momentary overlap in forwarding roles between two or more PE devices, | |||
or more PE devices, potentially also leading to broadcast storms of | potentially also leading to broadcast storms of frames forwarded back | |||
frames forwarded back into the VLAN. | into the VLAN. | |||
EVPN [RFC7432] currently specifies timer-based synchronization among | EVPN [RFC7432] currently specifies timer-based synchronization among | |||
PE devices within an Ethernet Segment redundancy group. This | PE devices within an Ethernet Segment redundancy group. This | |||
approach can lead to duplications and potential loops due to multiple | approach can lead to duplications and potential loops due to multiple | |||
Designated Forwarders (DFs) if the timer interval is too short, or to | DFs if the timer interval is too short or can lead to packet drops if | |||
packet drops if the timer interval is too long. | the timer interval is too long. | |||
Split-horizon filtering, as described in Section 8.3 of [RFC7432], | Split-horizon filtering, as described in Section 8.3 of [RFC7432], | |||
can prevent loops but does not address duplicates. However, if there | can prevent loops but does not address duplicates. However, if there | |||
are overlapping Designated Forwarders of two different sites | are overlapping DFs of two different sites simultaneously for the | |||
simultaneously for the same VLAN, the site identifier will differ | same VLAN, the site identifier will differ when the packet re-enters | |||
when the packet re-enters the Ethernet Segment. Consequently, the | the Ethernet Segment. Consequently, the split-horizon check will | |||
split-horizon check will fail, resulting in layer-2 loops. | fail, resulting in Layer 2 loops. | |||
The updated DF procedures outlined in [RFC8584] use the well-known | The updated DF procedures outlined in [RFC8584] use the well-known | |||
Highest Random Weight (HRW) algorithm to prevent the reshuffling of | HRW algorithm to prevent the reshuffling of VLANs among PE devices | |||
VLANs among PE devices within the Ethernet Segment redundancy group | within the Ethernet Segment redundancy group during failure or | |||
during failure or recovery events. This approach minimizes the | recovery events. This approach minimizes the impact on VLANs not | |||
impact on VLANs not assigned to the failed or recovered ports and | assigned to the failed or recovered ports and eliminates the | |||
eliminates the occurrence of loops or duplicates during such events. | occurrence of loops or duplicates during such events. | |||
However, upon PE insertion or a port being newly added to a | However, upon PE insertion or a port being newly added to a | |||
multihomed Ethernet Segment, HRW cannot help either as a transfer of | multihomed Ethernet Segment, the HRW cannot help either, as a | |||
DF role to the new port must occur while the old DF is still active. | transfer of the DF role to the new port must occur while the old DF | |||
is still active. | ||||
+---------+ | +---------+ | |||
+-------------+ | | | +-------------+ | | | |||
| | | | | | | | | | |||
/ | PE1 |----| | +-------------+ | / | PE1 |----| | +-------------+ | |||
/ | | | MPLS/ | | |---CE3 | / | | | MPLS/ | | |---CE3 | |||
/ +-------------+ | VxLAN/ | | PE3 | | / +-------------+ | VxLAN/ | | PE3 | | |||
CE1 - | Cloud | | | | CE1 - | Cloud | | | | |||
\ +-------------+ | |---| | | \ +-------------+ | |---| | | |||
\ | | | | +-------------+ | \ | | | | +-------------+ | |||
\ | PE2 |----| | | \ | PE2 |----| | | |||
| | | | | | | | | | |||
+-------------+ | | | +-------------+ | | | |||
+---------+ | +---------+ | |||
Figure 1: CE1 multihomed to PE1 and PE2. | Figure 1: CE1 Multihomed to PE1 and PE2 | |||
In Figure 1, when PE2 is inserted in the Ethernet Segment or its | In Figure 1, when PE2 is inserted in the Ethernet Segment or its | |||
CE1-facing interface recovered, PE1 will transfer the DF role of some | CE1-facing interface is recovered, PE1 will transfer the DF role of | |||
VLANs to PE2 to achieve load balancing. However, because there is no | some VLANs to PE2 to achieve load-balancing. However, because there | |||
handshake mechanism between PE1 and PE2, overlapping of DF roles for | is no handshake mechanism between PE1 and PE2, overlapping of DF | |||
a given VLAN is possible which leads to duplication of traffic as | roles for a given VLAN is possible, which leads to duplication of | |||
well as layer-2 loops. | traffic as well as Layer 2 loops. | |||
Current EVPN specifications [RFC7432] and [RFC8584] rely on a timer- | Current EVPN specifications [RFC7432] and [RFC8584] rely on a timer- | |||
based approach for transferring the DF role to the newly inserted | based approach for transferring the DF role to the newly inserted | |||
device. This can cause the following issues: | device. This can cause the following issues: | |||
* Loops/Duplicates if the timer value is too short | * Loops and duplicates, if the timer value is too short | |||
* Prolonged Traffic Blackholing if the timer value is too long | * Prolonged traffic blackholing, if the timer value is too long | |||
1.4. Design Principles for a Solution | 1.4. Design Principles for a Solution | |||
The clock-synchronization solution for fast DF recovery presented in | The clock-synchronization solution for fast DF recovery presented in | |||
this document follows several design principles and offers multiple | this document follows several design principles and offers multiple | |||
advantages, namely: | advantages, namely: | |||
* Complex handshake signaling mechanisms and state machines are | * Complex handshake signaling mechanisms and state machines are | |||
avoided in favor of a simple uni-directional signaling approach. | avoided in favor of a simple unidirectional signaling approach. | |||
* The fast DF recovery solution maintains backwards compatibility | * The fast DF recovery solution maintains backwards compatibility | |||
(see Section 4) by ensuring that PEs reject any unrecognized new | (see Section 4) by ensuring that PEs reject any unrecognized new | |||
BGP EVPN Extended Community. | BGP EVPN Extended Community. | |||
* Existing DF Election algorithms remain supported. | * Existing DF Election algorithms remain supported. | |||
* The fast DF recovery solution is independent of any BGP delays in | * The fast DF recovery solution is independent of any BGP delays in | |||
propagation of Ethernet Segment routes (Route Type 4) | propagation of Ethernet Segment routes (Route Type 4) | |||
* The fast DF recovery solution is agnostic of the actual time | * The fast DF recovery solution is agnostic of the actual time | |||
synchronization mechanism used; however, an NTP-based | synchronization mechanism used; however, an NTP-based | |||
representation of time is used for EVPN signaling. | representation of time is used for EVPN signaling. | |||
The solution in this document relies on nodes in the topology, more | The solution in this document relies on nodes in the topology, more | |||
specifically the peering nodes of each Ethernet-Segment, to be clock- | specifically the peering nodes of each Ethernet-Segment, to be clock- | |||
synchronized and advertise Time Synchronization capability. When | synchronized and to advertise Time Synchronization capability. When | |||
this is not the case, or clocks are badly desynchronized, network | this is not the case, or when clocks are badly desynchronized, | |||
convergence and DF Election are no worse than [RFC7432] due to the | network convergence and DF Election are no worse than that described | |||
timestamp range checking (Section 2.2). | in [RFC7432] due to the timestamp range checking (Section 2.2). | |||
2. DF Election Synchronization Solution | 2. DF Election Synchronization Solution | |||
The fast DF recovery solution relies on the concept of common clock | The fast DF recovery solution relies on the concept of common clock | |||
alignment between partner PEs participating in a common Ethernet | alignment between partner PEs participating in a common Ethernet | |||
Segment, i.e., PE1 and PE2 in Figure 1. The main idea is to have all | Segment, i.e., PE1 and PE2 in Figure 1. The main idea is to have all | |||
peering PEs of that Ethernet Segment perform DF election and apply | peering PEs of that Ethernet Segment perform DF election and apply | |||
the result at the same previously-announced time. | the result at the same previously announced time. | |||
The DF Election procedure, as described in [RFC7432] and as | The DF Election procedure, as described in [RFC7432] and as | |||
optionally signaled in [RFC8584], is applied. All PEs attached to a | optionally signaled in [RFC8584], is applied. All PEs attached to a | |||
given Ethernet Segment are clock-synchronized using a networking | given Ethernet Segment are clock-synchronized using a networking | |||
protocol for clock synchronization (e.g., NTP, PTP). Whenever | protocol for clock synchronization (e.g., NTP, Precision Time | |||
possible, recovery activities for failed PEs SHOULD NOT be initiated | Protocol (PTP)). Whenever possible, recovery activities for failed | |||
until after the underlying clock synchronization protocol has | PEs SHOULD NOT be initiated until after the underlying clock | |||
converged to benefit from this document's fast DF recovery | synchronization protocol has converged to benefit from this | |||
procedures. When a new PE is inserted in an Ethernet Segment or a | document's fast DF recovery procedures. When a new PE is inserted in | |||
failed PE of the Ethernet Segment recovers, that PE communicates to | an Ethernet Segment or when a failed PE of the Ethernet Segment | |||
peering partners the current time plus the value of the timer for | recovers, that PE communicates to peering partners the current time | |||
partner discovery from step 2 in Section 8.5 of [RFC7432]. This | plus the value of the timer for partner discovery from step 2 in | |||
constitutes an "end time" or "absolute time" as seen from the local | Section 8.5 of [RFC7432]. This constitutes an "end time" or | |||
PE. That absolute time is called the "Service Carving Time" (SCT). | "absolute time" as seen from the local PE. That absolute time is | |||
called the Service Carving Time (SCT). | ||||
A new BGP EVPN Extended Community, the Service Carving Time is | A new BGP EVPN Extended Community, the Service Carving Time, is | |||
advertised along with the Ethernet Segment Route Type 4 (RT-4) and | advertised along with the Ethernet Segment Route Type 4 (RT-4) and | |||
communicates the Service Carving Time to other partners to ensure an | communicates the SCT to other partners to ensure an orderly transfer | |||
orderly transfer of forwarding duties. | of forwarding duties. | |||
Upon receipt of the new BGP EVPN Extended Community, partner PEs can | Upon receipt of the new BGP EVPN Extended Community, partner PEs can | |||
determine the service carving time of the newly inserted PE. To | determine the SCT of the newly inserted PE. To eliminate any | |||
eliminate any potential for duplicate traffic or loops, the concept | potential for duplicate traffic or loops, the concept of "skew" is | |||
of skew is introduced: a small time offset to ensure a controlled and | introduced: a small time offset to ensure a controlled and orderly | |||
orderly transition when multiple Provider Edge (PE) devices are | transition when multiple PE devices are involved. The previously | |||
involved. The previously inserted PE(s) must perform service carving | inserted PE(s) must perform service carving first for DF to NDF | |||
first for DF to NDF transitions. The receiving PEs subtract this | transitions. The receiving PEs subtract this skew (default = 10 ms) | |||
skew (default = 10ms) from the Service Carving Time and apply DF to NDF | from the Service Carving Time and apply DF to NDF transitions first. | |||
transitions first. This is followed shortly by the NDF to DF | This is followed shortly by the NDF to DF transitions on both PEs, | |||
transitions on both PEs, after the skew delay. On the recovering PE, | after the skew delay. On the recovering PE, all services are already | |||
all services are already in NDF state and no skew for DF to NDF | in NDF state, and no skew for DF to NDF transitions is required. | |||
transitions is required. | ||||
This document proposes a default skew value of 10ms to allow | This document proposes a default skew value of 10 ms to allow | |||
completion of programming the DF to NDF transitions, but | completion of programming the DF to NDF transitions, but | |||
implementations may make the skew larger (or configurable) taking | implementations may make the skew larger (or configurable) taking | |||
into consideration scale, hardware capabilities and clock accuracy. | into consideration scale, hardware capabilities, and clock accuracy. | |||
To summarize, all peering PEs perform service carving almost | To summarize, all peering PEs perform service carving almost | |||
simultaneously at the time announced by the newly added/recovered PE. | simultaneously at the time announced by the newly added/recovered PE. | |||
The newly inserted PE initiates the SCT, and triggers service carving | The newly inserted PE initiates the SCT and triggers service carving | |||
immediately on its local timer expiry. The previously inserted PE(s) | immediately on its local timer expiry. The previously inserted PE(s) | |||
receiving Ethernet Segment route (RT-4) with an SCT BGP extended | receiving Ethernet Segment route (RT-4) with an SCT BGP extended | |||
community, perform service carving shortly before Service Carving | community perform service carving shortly before the SCT for DF to | |||
Time for DF to NDF transitions, and at Service Carving Time for NDF | NDF transitions and at the SCT for NDF to DF transitions. | |||
to DF transitions. | ||||
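The timing summarized above can be sketched as follows. This is a minimal illustration, not text from either version of the document: the names announce_sct, peer_schedule, and CarvingSchedule are hypothetical, times are plain integer milliseconds on an assumed common clock, and the discovery timer value is passed in by the caller.

```python
from dataclasses import dataclass

DEFAULT_SKEW_MS = 10  # default skew proposed in the text


@dataclass(frozen=True)
class CarvingSchedule:
    """Hypothetical container for the two transition times on a peer PE."""
    df_to_ndf_at_ms: int  # when the peer gives up the DF role
    ndf_to_df_at_ms: int  # when the DF role is taken up


def announce_sct(now_ms: int, discovery_timer_ms: int) -> int:
    """SCT advertised by the recovering PE: current time plus the timer."""
    return now_ms + discovery_timer_ms


def peer_schedule(sct_ms: int, skew_ms: int = DEFAULT_SKEW_MS) -> CarvingSchedule:
    """Previously inserted PEs apply DF to NDF one skew before the SCT,
    and NDF to DF at the SCT itself."""
    return CarvingSchedule(df_to_ndf_at_ms=sct_ms - skew_ms,
                           ndf_to_df_at_ms=sct_ms)
```

For example, with the recovering PE at time 1000 ms and a 3000 ms timer, the announced SCT is 4000 ms; a peer would then release the DF role at 3990 ms and both PEs would apply NDF to DF transitions at 4000 ms.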
2.1. BGP Encoding | 2.1. BGP Encoding | |||
A BGP extended community, with Type 0x06 and Sub-Type 0x0F, is | A BGP extended community, with Type 0x06 and Sub-Type 0x0F, is | |||
defined to communicate the Service Carving Time for each Ethernet | defined to communicate the SCT for each Ethernet Segment: | |||
Segment: | ||||
1 2 3 | 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Type = 0x06 | Sub-Type(0x0F)| Timestamp Seconds ~ | | Type = 0x06 | Sub-Type(0x0F)| Timestamp Seconds ~ | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
~ Timestamp Seconds | Timestamp Fractional Seconds | | ~ Timestamp Seconds | Timestamp Fractional Seconds | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 2: Service Carving Time | Figure 2: Service Carving Time | |||
The timestamp exchanged uses the NTP prime epoch of January 1, 1900 | The timestamp exchanged uses the NTP prime epoch of January 1, 1900 | |||
[RFC5905] and an adapted form of the 64-bit NTP Timestamp Format. | [RFC5905] and an adapted form of the 64-bit NTP Timestamp Format. | |||
The 64-bit NTP Timestamp Format consists of a 32-bit part for Seconds | The 64-bit NTP Timestamp Format consists of a 32-bit part for Seconds | |||
and a 32-bit part for Fraction, which are encoded in the Service | and a 32-bit part for Fractional Seconds, which are encoded in the | |||
Carving Time as follows: | Service Carving Time as follows: | |||
* Timestamp Seconds: 32-bit NTP seconds are encoded in this field. | Timestamp Seconds: 32-bit NTP seconds are encoded in this field. | |||
* Timestamp Fractional Seconds: the high order 16 bits of the NTP | Timestamp Fractional Seconds: The high-order 16 bits of the NTP | |||
'Fraction' field are encoded in this field. | "Fraction" field are encoded in this field. | |||
When rebuilding a 64-bit NTP Timestamp Format using the values from a | When rebuilding a 64-bit NTP Timestamp Format using the values from a | |||
received SCT BGP extended community, the lower order 16 bits of the | received SCT BGP extended community, the lower-order 16 bits of the | |||
Fractional field are set to 0. The use of a 16-bit fractional | Fractional field are set to 0. The use of a 16-bit fractional | |||
seconds value yields adequate precision of 15 microseconds (2^-16 s). | seconds value yields adequate precision of 15 microseconds (2^-16 s). | |||
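As an illustration of the encoding described above, the sketch below packs and unpacks the 8-octet extended community. It is an assumption-laden example rather than a reference implementation: the function names encode_sct and decode_sct are hypothetical, the input is a Unix time in integer nanoseconds, and the 32-bit Seconds field is simply truncated (the NTP Era is not carried).

```python
import struct

# Seconds between the NTP prime epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2_208_988_800
SCT_TYPE, SCT_SUB_TYPE = 0x06, 0x0F


def encode_sct(unix_ns: int) -> bytes:
    """Build the 8-octet Service Carving Time extended community."""
    seconds = (unix_ns // 10**9 + NTP_UNIX_OFFSET) & 0xFFFFFFFF
    # Keep only the high-order 16 bits of the NTP Fraction field.
    fraction16 = ((unix_ns % 10**9) << 16) // 10**9
    return struct.pack("!BBIH", SCT_TYPE, SCT_SUB_TYPE, seconds, fraction16)


def decode_sct(community: bytes) -> tuple[int, int]:
    """Return (seconds, 32-bit fraction) of the rebuilt 64-bit NTP timestamp."""
    ec_type, sub_type, seconds, fraction16 = struct.unpack("!BBIH", community)
    if (ec_type, sub_type) != (SCT_TYPE, SCT_SUB_TYPE):
        raise ValueError("not a Service Carving Time extended community")
    # The lower-order 16 bits of the Fraction field are set to 0.
    return seconds, fraction16 << 16
```

One unit of the 16-bit field is 2^-16 s, roughly 15.26 microseconds, which matches the precision quoted above.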
This document introduces a new flag called Time Synchronization | This document introduces a new flag called Time Synchronization | |||
indicated by "T" in the DF Election Capabilities registry defined in | indicated by "T" in the "DF Election Capabilities" registry defined | |||
[RFC8584] for use in DF Election Extended Community. | in [RFC8584] for use in DF Election Extended Community. | |||
1 2 3 | 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~ | | Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~ | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
~ Bitmap | Reserved | | ~ Bitmap | Reserved | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 4: DF Election Extended Community | Figure 3: DF Election Extended Community (Figure 4 in RFC 8584) | |||
Figure 3: DF Election Extended Community | ||||
1 1 | 1 1 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |A| |T| | | | |A| |T| | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 5: DF Election Capabilities | ||||
Figure 4: DF Election Capabilities | Figure 4: DF Election Capabilities | |||
* Bit 3: Time Synchronization (corresponds to Bit 27 of the DF | Bit 3: Time Synchronization (corresponds to Bit 27 of the DF | |||
Election Extended Community). When set to 1, it indicates the | Election Extended Community). When set to 1, it indicates the | |||
desire to use Time Synchronization capability with the rest of the | desire to use Time Synchronization capability with the rest of the | |||
PEs in the Ethernet Segment. | PEs in the Ethernet Segment. | |||
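As an illustration of the bit positions described above, the following sketch extracts the "T" flag from an 8-octet DF Election Extended Community. It assumes the layout shown in the figure (Type, Sub-Type, 3 reserved bits plus a 5-bit DF Alg, a 2-octet Bitmap, 3 reserved octets) and that bits are numbered from the most significant bit. The function and constant names are hypothetical.

```python
import struct

# Bit 3 of the 16-bit Bitmap, counting from the most significant bit.
# This is the same bit as Bit 27 of the whole extended community.
T_BIT_MASK = 0x8000 >> 3
# Bit 1 of the Bitmap, shown as "A" in the capabilities figure.
A_BIT_MASK = 0x8000 >> 1


def parse_df_election_ec(ec: bytes) -> dict:
    """Decode the fields of a DF Election Extended Community (sketch)."""
    if len(ec) != 8:
        raise ValueError("extended community must be 8 octets")
    ec_type, sub_type, rsv_alg, bitmap = struct.unpack("!BBBH", ec[:5])
    if ec_type != 0x06 or sub_type != 0x06:
        raise ValueError("not a DF Election Extended Community")
    return {
        "df_alg": rsv_alg & 0x1F,          # low 5 bits of the third octet
        "a_bit": bool(bitmap & A_BIT_MASK),
        "time_sync": bool(bitmap & T_BIT_MASK),
    }


if __name__ == "__main__":
    example = bytes([0x06, 0x06, 0x01, 0x10, 0x00, 0x00, 0x00, 0x00])
    print(parse_df_election_ec(example))
```

Reading the Bitmap as a single 16-bit value keeps the mask arithmetic aligned with the bit numbering used in the capabilities figure.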
This capability is utilized in conjunction with the agreed-upon DF | This capability is utilized in conjunction with the agreed-upon DF | |||
Election Type. For instance, if all the PE devices in the Ethernet | Election Type. For instance, if all the PE devices in the Ethernet | |||
Segment indicate the desire to use the Time Synchronization | Segment indicate the desire to use the Time Synchronization | |||
capability and request the DF Election Type to be Highest Random | capability and request the DF Election Type to be the HRW, then the | |||
Weight (HRW), then the HRW algorithm is used in conjunction with this | HRW algorithm is used in conjunction with this capability. A PE that | |||
capability. A PE which does not support the procedures set out in | does not support the procedures set out in this document or that | |||
this document, or receives a route from another PE in which the | receives a route from another PE in which the capability is not set | |||
capability is not set, MUST NOT delay Designated Forwarder election | MUST NOT delay DF election as this could lead to duplicate traffic in | |||
as this could lead to duplicate traffic in some instances | some instances (overlapping DFs). | |||
(overlapping Designated Forwarders). | ||||
2.2. Timestamp Verification | 2.2. Timestamp Verification | |||
The NTP Era value is not exchanged and participating PEs may consider | The NTP Era value is not exchanged, and participating PEs may | |||
the timestamps to be in the same Era as their local value. A DF | consider the timestamps to be in the same Era as their local value. | |||
Election operation occurring exactly at the next Era transition will | A DF Election operation occurring exactly at the next Era transition | |||
be sometime on February 7, 2036. Implementors and operators may | will be some time on February 7, 2036. Implementors and operators | |||
address credible cases of rollover ambiguity (adjacent Eras n and | may address credible cases of rollover ambiguity (adjacent Eras n and | |||
n+1), as well as the security issue of unreasonably large or | n+1) as well as the security issue of unreasonably large or | |||
unreasonably small NTP timestamps, in the following manner. | unreasonably small NTP timestamps in the following manner. | |||
The procedures in this document address implicitly what occurs with | The procedures in this document address implicitly what occurs with | |||
receiving a SCT value in the past. This would be a naturally | receiving an SCT value in the past. This would be a naturally | |||
occurring event with a large BGP propagation delay: the receiving PE | occurring event with a large BGP propagation delay: the receiving PE | |||
treats the DF Election at the peer as having occurred already and | treats the DF Election at the peer as having already occurred and | |||
proceeds without starting any timer to further delay service carving, | proceeds without starting any timer to further delay service carving, | |||
effectively falling back on [RFC7432] behavior. A PE which receives | effectively falling back on behavior as specified in [RFC7432]. A PE | |||
a SCT value smaller than its current time, MUST discard the Service | that receives an SCT value smaller than its current time MUST discard | |||
Carving Time and SHALL treat the DF Election at the peer as having | the Service Carving Time and SHALL treat the DF Election at the peer | |||
occurred already. | as having occurred already. | |||
The more problematic scenario is the PE in Era n+1 that receives an | ||||
SCT advertised by the PE still in Era n, with a very large SCT value. | ||||
To address this Era rollover as well as the large values attack | ||||
vector, implementations MUST validate the received SCT against an | ||||
upper bound. | ||||
The more problematic scenario is the PE in Era n+1 which receives a | ||||
Service Carving Time advertised by the PE still in Era n, with a very | ||||
large SCT value. To address this Era rollover as well as the large | ||||
values attack vector, implementations MUST validate the received SCT | ||||
against an upper-bound. | ||||
It is left to implementations to decide what constitutes an | It is left to implementations to decide what constitutes an | |||
"unreasonably large" SCT value. A recommended approach, however, is | "unreasonably large" SCT value. A recommended approach, however, is | |||
to compare the received offset to the local peering timer value. In | to compare the received offset to the local peering timer value. In | |||
practice, peering timer values are configured uniformly across | practice, peering timer values are configured uniformly across | |||
Ethernet-Segment peers and may be treated as an upper-bound on the | Ethernet Segment peers and may be treated as an upper bound on the | |||
offset of received SCT values. A PE which receives an SCT | offset of received SCT values. A PE that receives an SCT | |||
representing an offset larger than the local peering timer MUST | representing an offset larger than the local peering timer MUST | |||
discard the Service Carving Time and SHALL treat the DF Election at | discard the SCT and SHALL treat the DF Election at the peer as having | |||
the peer as having occurred already, as above. | already occurred, as above. | |||
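The two checks in this section (an SCT in the past, and an SCT too far in the future) might be combined as in the sketch below. It is not a normative implementation: times are plain seconds assumed to be in the same NTP Era, the peering timer is used as the upper bound as recommended above, and the function name `validate_sct` is hypothetical.

```python
from typing import Optional


def validate_sct(sct: float, now: float, peering_timer: float) -> Optional[float]:
    """Return the SCT if it is usable, or None if it must be discarded.

    A return value of None means the receiver treats the DF Election at
    the peer as having occurred already and does not start a timer.
    """
    if sct < now:
        # SCT in the past, e.g. after a large BGP propagation delay.
        return None
    if sct - now > peering_timer:
        # Offset larger than the local peering timer: treated as an
        # unreasonably large value (Era rollover or malicious input).
        return None
    return sct


if __name__ == "__main__":
    print(validate_sct(sct=103.0, now=100.2, peering_timer=3.0))   # accepted
    print(validate_sct(sct=99.0, now=100.2, peering_timer=3.0))    # discarded
    print(validate_sct(sct=5000.0, now=100.2, peering_timer=3.0))  # discarded
```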
2.3. Updates to RFC8584 | 2.3. Updates to RFC 8584 | |||
This document introduces an additional delay to the events and | This document introduces an additional delay to the events and | |||
transitions defined for the default DF election algorithm FSM in | transitions defined for the default DF election algorithm FSM in | |||
Section 2.1 of [RFC8584] without changing the FSM state or event | Section 2.1 of [RFC8584] without changing the FSM state or event | |||
definitions themselves. | definitions themselves. | |||
Upon receiving a RCVD_ES message, the peering PE's Finite State | Upon receiving an RCVD_ES message, the peering PE's FSM transitions | |||
Machine (FSM) transitions from the DF_DONE (indicating the DF | from the DF_DONE state (indicating the DF election process was | |||
election process was complete) state to the DF_CALC (indicating that | complete) to the DF_CALC state (indicating that a new DF calculation | |||
a new DF calculation is needed) state. Due to the Service Carving | is needed). Due to the SCT included in the Ethernet Segment update, | |||
Time (SCT) included in the Ethernet-Segment update, the completion of | the completion of the DF_CALC state and the subsequent transition | |||
the DF_CALC state and the subsequent transition back to the DF_DONE | back to the DF_DONE state are delayed. This delay ensures proper | |||
state are delayed. This delay ensures proper synchronization and | synchronization and prevents conflicts. Consequently, the | |||
prevents conflicts. Consequently, the accompanying forwarding | accompanying forwarding updates to the DF and NDF states are also | |||
updates to the Designated Forwarder (DF) and Non-Designated Forwarder | deferred. | |||
(NDF) states are also deferred. | ||||
Item 9. in Section 2.1 of [RFC8584], the list "Corresponding actions | ||||
when transitions are performed or states are entered/exited" is | ||||
changed as follows: | ||||
9. DF_CALC on CALCULATED: Mark the election result for the VLAN or | ||||
VLAN Bundle. | ||||
9.1 If an SCT timestamp is present during the RCVD_ES event of | ||||
Action 11, wait until the time indicated by the SCT minus | ||||
skew before proceeding to step 9.3. | ||||
9.2 If an SCT timestamp is present during the RCVD_ES event of | ||||
Action 11, wait until the time indicated by the SCT before | ||||
proceeding to step 9.4. | ||||
9.3 Assume the role of NDF for the local PE concerning the VLAN | Item 9 in Section 2.1 of [RFC8584], in the list "Corresponding | |||
or VLAN Bundle, and transition to the DF_DONE state. | actions when transitions are performed or states are entered/exited", | |||
is changed as follows: | ||||
9.4 Assume the role of DF for the local PE concerning the VLAN | | 9. DF_CALC on CALCULATED: Mark the election result for the VLAN | |||
or VLAN Bundle, and transition to the DF_DONE state. | | or VLAN bundle. | |||
| | ||||
| 9.1 If an SCT timestamp is present during the RCVD_ES event | ||||
| of Action 11, wait until the time indicated by the SCT | ||||
| minus skew before proceeding to step 9.3. | ||||
| | ||||
| 9.2 If an SCT timestamp is present during the RCVD_ES event | ||||
| of Action 11, wait until the time indicated by the SCT | ||||
| before proceeding to step 9.4. | ||||
| | ||||
| 9.3 Assume the role of NDF for the local PE concerning the | ||||
| VLAN or VLAN bundle and transition to the DF_DONE state. | ||||
| | ||||
| 9.4 Assume the role of DF for the local PE concerning the | ||||
| VLAN or VLAN bundle and transition to the DF_DONE state. | ||||
This revised approach ensures proper timing and synchronization in | This revised approach ensures proper timing and synchronization in | |||
the DF election process, avoiding conflicts and ensuring accurate | the DF election process, avoiding conflicts and ensuring accurate | |||
forwarding updates. | forwarding updates. | |||
3. Synchronization Scenarios | 3. Synchronization Scenarios | |||
Consider Figure 1 as an example, where initially PE2 has failed and | Consider Figure 1 as an example, where initially PE2 has failed and | |||
PE1 has taken over. This scenario illustrates the problem with the | PE1 has taken over. This scenario illustrates the problem with the | |||
DF-Election mechanism described in Section 8.5 of [RFC7432], | DF Election mechanism described in Section 8.5 of [RFC7432], | |||
specifically in the context of the timer value configured for all PEs | specifically in the context of the timer value configured for all PEs | |||
on the Ethernet Segment. | on the Ethernet Segment. | |||
Procedure based on Section 8.5 of [RFC7432] with the default 3-second | The following procedure is based on Section 8.5 of [RFC7432] with the | |||
timer in step 2: | default 3-second timer in step 2. | |||
1. Initial state: PE1 is in a steady-state and PE2 is recovering. | 1. Initial state: PE1 is in a steady-state and PE2 is recovering. | |||
2. Recovery: PE2 recovers at an absolute time of t=99. | 2. Recovery: PE2 recovers at an absolute time of t=99. | |||
3. Advertisement: PE2 advertises RT-4, sent at t=100, to partner | 3. Advertisement: PE2 advertises RT-4, sent at t=100, to its partner | |||
PE1. | (PE1). | |||
4. Timer Start: PE2 starts a 3-second timer to allow the reception | 4. Timer Start: PE2 starts a 3-second timer to allow the reception | |||
of RT-4 from other PE nodes. | of RT-4 from other PE nodes. | |||
5. Immediate carving: PE1 performs service carving immediately upon | 5. Immediate carving: PE1 performs service carving immediately upon | |||
RT-4 reception, i.e., t=100 plus some BGP propagation delay. | RT-4 reception, i.e., t=100 plus some BGP propagation delay. | |||
6. Delayed Carving: PE2 performs service carving at time t=103. | 6. Delayed Carving: PE2 performs service carving at time t=103. | |||
[RFC7432] favors traffic drops over duplicate traffic. With the | [RFC7432] favors traffic drops over duplicate traffic. With the | |||
above procedure, traffic drops will occur as part of each PE recovery | above procedure, traffic drops will occur as part of each PE recovery | |||
sequence since PE1 transitions some VLANs to Non-Designated Forwarder | sequence since PE1 transitions some VLANs to an NDF immediately upon | |||
(NDF) immediately upon RT-4 reception. | RT-4 reception. The timer value (default = 3 seconds) directly | |||
The timer value (default = 3 seconds) directly affects the duration | affects the duration of the packet drops. A shorter (or zero) timer | |||
of the packet drops. A shorter (or zero) timer may result in | may result in duplicate traffic or traffic loops. | |||
duplicate traffic or traffic loops. | ||||
Procedure based on the Service Carving Time (SCT) approach: | The following procedure is based on the SCT approach: | |||
1. Initial state: PE1 is in a steady state, and PE2 is recovering. | 1. Initial state: PE1 is in a steady state, and PE2 is recovering. | |||
2. Recovery: PE2 recovers at an absolute time of t=99. | 2. Recovery: PE2 recovers at an absolute time of t=99. | |||
3. Timer Start: PE2 starts at t=100 a 3-second timer to allow the | 3. Timer Start: PE2 starts at t=100 a 3-second timer to allow the | |||
reception of RT-4 from other PE nodes. | reception of RT-4 from other PE nodes. | |||
4. Advertisement: PE2 advertises RT-4, sent at t=100, with a target | 4. Advertisement: PE2 advertises RT-4, sent at t=100, with a target | |||
SCT value of t=103 to partner PE1. | SCT value of t=103 to its partner (PE1). | |||
5. Service Carving Timer: PE1 starts the service carving timer, with | 5. Service Carving Timer: PE1 starts the service carving timer, with | |||
the remaining time until t=103. | the remaining time until t=103. | |||
6. Simultaneous Carving: Both PE1 and PE2 carve at an absolute time | 6. Simultaneous Carving: Both PE1 and PE2 carve at an absolute time | |||
of t=103. | of t=103. | |||
To maintain the preference for minimal loss over duplicate traffic, | To maintain the preference for minimal loss over duplicate traffic, | |||
PE1 SHOULD carve slightly before PE2 (with skew). The recovering PE2 | PE1 SHOULD carve slightly before PE2 (with skew). The recovering PE2 | |||
performs both DF to NDF and NDF to DF transitions per VLAN at the | performs both DF-to-NDF and NDF-to-DF transitions per VLAN at the | |||
timer's expiry. The original PE1, which received the SCT, applies | timer's expiry. The original PE1, which received the SCT, applies | |||
the following: | the following: | |||
* DF to NDF Transition(s): at t=SCT minus skew, where both PEs are | * DF-to-NDF Transition(s): at t=SCT minus skew, where both PEs are | |||
NDF for the skew duration. | NDF for the skew duration. | |||
* NDF to DF Transition(s): at t=SCT. | * NDF-to-DF Transition(s): at t=SCT. | |||
This split-behavior ensures a smooth DF role transition with minimal | This split behavior ensures a smooth DF role transition with minimal | |||
loss. | loss. | |||
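The split behavior applied by PE1 can be sketched as a small scheduling helper. The skew value used here (10 milliseconds) is only an assumed example, and the function names are hypothetical; times are seconds on the same absolute scale as the t=100 and t=103 values in the procedure above.

```python
from typing import Tuple


def pe1_transition_times(sct: float, skew: float) -> Tuple[float, float]:
    """Return (DF-to-NDF time, NDF-to-DF time) for the PE that received the SCT."""
    df_to_ndf = sct - skew   # give up the DF role slightly early
    ndf_to_df = sct          # take the DF role exactly at the SCT
    return df_to_ndf, ndf_to_df


def remaining(target: float, now: float) -> float:
    """Timer duration until the target time, never negative."""
    return max(0.0, target - now)


if __name__ == "__main__":
    # Assumed example: SCT of t=103 received at t=100.2 with a 10 ms skew.
    df_to_ndf, ndf_to_df = pe1_transition_times(sct=103.0, skew=0.010)
    print(remaining(df_to_ndf, now=100.2), remaining(ndf_to_df, now=100.2))
```

Between the two times, both PEs are NDF for the affected VLANs, which trades a brief loss window for the absence of duplicate traffic.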
Using the SCT approach, the negative effect of the timer to allow the | Using the SCT approach, the negative effect of the timer to allow the | |||
reception of Ethernet Segment RT-4 from other PE nodes is mitigated. | reception of Ethernet Segment (ES) RT-4 from other PE nodes is | |||
Furthermore, the BGP transmission delay (from PE2 to PE1) of the ES | mitigated. Furthermore, the BGP transmission delay (from PE2 to PE1) | |||
RT-4 becomes a non-issue. The SCT approach shortens the 3-second | of the ES RT-4 becomes a non-issue. The SCT approach shortens the | |||
timer window to the order of milliseconds. | 3-second timer window to the order of milliseconds. | |||
The peering timer is a configurable value where 3 seconds represents | The peering timer is a configurable value where 3 seconds represents | |||
the default. Configuring a timer value of 0, or so small as to | the default. Configuring a timer value of 0, or so small as to | |||
expire during propagation of the BGP routes, is outside the scope of | expire during propagation of the BGP routes, is outside the scope of | |||
this document. In reality, the use of the SCT approach presented in | this document. In reality, the use of the SCT approach presented in | |||
this document encourages the use of larger peering timer values to | this document encourages the use of larger peering timer values to | |||
overcome any sort of BGP route propagation delays. | overcome any sort of BGP route propagation delays. | |||
3.1. Concurrent Recoveries | 3.1. Concurrent Recoveries | |||
In the eventuality 2 or more PEs in a peering Ethernet Segment group | In the eventuality that two or more PEs in a peering Ethernet Segment | |||
are recovering concurrently or roughly the same time, each will | group are recovering concurrently or roughly at the same time, each | |||
advertise a Service Carving Time. This SCT value would correspond to | will advertise a SCT. This SCT value would correspond to what each | |||
what each recovering PE considers the "end time" for DF Election. A | recovering PE considers the "end time" for DF Election. A similar | |||
similar situation arises in sequentially recovering PEs, when a | situation arises in sequentially recovering PEs, when a second PE | |||
second PE recovers approximately at the time of the first PE's | recovers approximately at the time of the first PE's advertised SCT | |||
advertised SCT expiry, and with its own new SCT-2 outside of the | expiry and with its own new SCT-2 outside of the initial SCT window. | |||
initial SCT window. | ||||
In the case of multiple concurrent DF elections, each initiated by | In the case of multiple concurrent DF elections, each initiated by | |||
one of the recovering PEs, the SCTs must be ordered chronologically. | one of the recovering PEs, the SCTs must be ordered chronologically. | |||
All PEs SHALL execute only a single DF Election at the service | All PEs SHALL execute only a single DF Election at the service | |||
carving time corresponding to the largest (latest) received timestamp | carving time corresponding to the largest (latest) received timestamp | |||
value. This DF Election will lead peering PEs into a single co- | value. This DF Election will lead peering PEs into a single | |||
ordinated DF Election update. | coordinated DF Election update. | |||
Example: | Example: | |||
1. Initial State: PE1 is in a steady state, with services elected at | 1. Initial State: PE1 is in a steady state, with services elected at | |||
PE1. | PE1. | |||
2. Recovery of PE2: PE2 recovers at time t=100 and advertises RT-4 | 2. Recovery of PE2: PE2 recovers at time t=100 and advertises RT-4 | |||
with a target SCT value of t=103 to its partners (PE1). | with a target SCT value of t=103 to its partner (PE1). | |||
3. Timer Initiation by PE2: PE2 starts a 3-second timer to allow the | 3. Timer Initiation by PE2: PE2 starts a 3-second timer to allow the | |||
reception of RT-4 from other PE nodes. | reception of RT-4 from other PE nodes. | |||
4. Timer Initiation by PE1: PE1 starts the service carving timer, | 4. Timer Initiation by PE1: PE1 starts the service carving timer, | |||
with the remaining time until t=103. | with the remaining time until t=103. | |||
5. Recovery of PE3: PE3 recovers at time t=102 and advertises RT-4 | 5. Recovery of PE3: PE3 recovers at time t=102 and advertises RT-4 | |||
with a target SCT value of t=105 to its partners (PE1, PE2). | with a target SCT value of t=105 to its partners (PE1, PE2). | |||
skipping to change at page 13, line 17 ¶ | skipping to change at line 561 ¶ | |||
7. Timer Update by PE2: PE2 cancels the running timer and starts the | 7. Timer Update by PE2: PE2 cancels the running timer and starts the | |||
service carving timer with the remaining time until t=105. | service carving timer with the remaining time until t=105. | |||
8. Timer Update by PE1: PE1 updates its service carving timer, with | 8. Timer Update by PE1: PE1 updates its service carving timer, with | |||
the remaining time until t=105. | the remaining time until t=105. | |||
9. Service Carving: PE1, PE2, and PE3 perform service carving at the | 9. Service Carving: PE1, PE2, and PE3 perform service carving at the | |||
absolute time of t=105. | absolute time of t=105. | |||
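The timer updates in the example above amount to tracking the latest SCT seen. A minimal sketch, assuming each PE keeps one carving target per Ethernet Segment and that received SCTs have already been validated; the class name `ServiceCarvingTimer` is hypothetical.

```python
from typing import Optional


class ServiceCarvingTimer:
    """Tracks the single carving time used for a DF Election (sketch)."""

    def __init__(self) -> None:
        self.target: Optional[float] = None

    def on_sct(self, sct: float) -> float:
        """Record a local or received SCT and keep the largest (latest) one."""
        if self.target is None or sct > self.target:
            self.target = sct
        return self.target

    def remaining(self, now: float) -> float:
        """Time left until carving; zero if no SCT is pending."""
        if self.target is None:
            return 0.0
        return max(0.0, self.target - now)


if __name__ == "__main__":
    pe1 = ServiceCarvingTimer()
    pe1.on_sct(103.0)           # RT-4 from PE2
    pe1.on_sct(105.0)           # RT-4 from PE3 supersedes the earlier SCT
    print(pe1.target, pe1.remaining(now=102.5))
```

Because every PE applies the same rule, PE1, PE2, and PE3 all converge on t=105 regardless of the order in which the routes arrive.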
In the eventuality a PE in an Ethernet Segment group recovers during | In the eventuality that a PE in an Ethernet Segment group recovers | |||
the discovery window specified in Section 8.5 of [RFC7432], and does | during the discovery window specified in Section 8.5 of [RFC7432] and | |||
not support or advertise the T-bit, then all PEs in the current | does not support or advertise the T-bit, all PEs in the current | |||
peering sequence SHALL immediately revert to the default [RFC7432] | peering sequence SHALL immediately revert to the default behavior | |||
behavior. | described in [RFC7432]. | |||
4. Backwards Compatibility | 4. Backwards Compatibility | |||
For the DF election procedures to achieve global convergence and | For the DF election procedures to achieve global convergence and | |||
unanimity within a redundancy group, it is essential that all | unanimity within a redundancy group, it is essential that all | |||
participating PEs agree on the DF election algorithm to be employed. | participating PEs agree on the DF election algorithm to be employed. | |||
However, it is possible that some PEs may continue to use the | However, it is possible that some PEs may continue to use the | |||
existing modulo-based DF election algorithm from [RFC7432] and not | existing modulo-based DF election algorithm from [RFC7432] and not | |||
utilize the new Service Carving Time (SCT) BGP extended community. | utilize the new SCT BGP extended community. PEs that operate using | |||
PEs that operate using the baseline DF election mechanism will simply | the baseline DF election mechanism will simply discard the new SCT | |||
discard the new SCT BGP extended community as unrecognized. | BGP extended community as unrecognized. | |||
A PE can indicate its willingness to support clock-synchronized | A PE can indicate its willingness to support clock-synchronized | |||
carving by signaling the new 'T' DF Election Capability and including | carving by signaling the new "T" DF Election Capability and including | |||
the new SCT BGP extended community along with the Ethernet Segment | the new SCT BGP extended community along with the Ethernet Segment | |||
Route Type 4. If one or more PEs attached to the Ethernet Segment do | Route Type 4. If one or more PEs attached to the Ethernet Segment do | |||
not signal T=1, then all PEs in the Ethernet Segment SHALL revert to | not signal T=1, then all PEs in the Ethernet Segment SHALL revert to | |||
the timer-based approach as specified in [RFC7432]. This reversion | the timer-based approach as specified in [RFC7432]. This reversion | |||
is particularly crucial in preventing VLAN shuffling when more than | is particularly crucial in preventing VLAN shuffling when more than | |||
two PEs are involved. | two PEs are involved. | |||
In the event a new or extra RT-4 is received without the new 'T' DF | In the event a new or extra RT-4 is received without the new "T" DF | |||
Election Capability in the midst of an ongoing DF Election sequence, | Election Capability in the midst of an ongoing DF Election sequence, | |||
all SCT-based delays are cancelled and the DF Election immediately | all SCT-based delays are canceled, and the DF Election is immediately | |||
applied as specified in [RFC7432], as if no SCT had been previously | applied as specified in [RFC7432], as if no SCT had been previously | |||
exchanged. | exchanged. | |||
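The fallback rule can be illustrated from the point of view of a PE that has received an SCT. In the sketch below, `t_bits` is a hypothetical map from each PE on the Ethernet Segment to whether its RT-4 signaled T=1; the function names are likewise assumptions made for illustration.

```python
from typing import Dict, Optional


def sct_delay_allowed(t_bits: Dict[str, bool]) -> bool:
    """SCT-based delays apply only if every PE on the segment signals T=1."""
    return all(t_bits.values())


def carving_time(now: float, latest_sct: Optional[float],
                 t_bits: Dict[str, bool]) -> float:
    """Time at which this PE applies the DF Election result (sketch)."""
    if latest_sct is None or not sct_delay_allowed(t_bits):
        # No SCT, or at least one peer lacks the capability: cancel any
        # SCT-based delay and apply the election immediately.
        return now
    return max(now, latest_sct)


if __name__ == "__main__":
    peers = {"PE2": True, "PE3": True}
    print(carving_time(now=100.2, latest_sct=103.0, t_bits=peers))
    peers["PE4"] = False  # an RT-4 arrives without the "T" capability
    print(carving_time(now=101.0, latest_sct=103.0, t_bits=peers))
```

Re-evaluating `carving_time` whenever a new RT-4 arrives captures the mid-sequence case: a single route without the capability is enough to drop the delay.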
5. Security Considerations | 5. Security Considerations | |||
The mechanisms in this document use the EVPN control plane as defined | The mechanisms in this document use the EVPN control plane as defined | |||
in [RFC7432]. Security considerations described in [RFC7432] are | in [RFC7432]. Security considerations described in [RFC7432] are | |||
equally applicable. | equally applicable. | |||
For the new SCT Extended Community, attack vectors may be setting the | For the new SCT Extended Community, attack vectors may be setting the | |||
value to zero, to a value in the past or to large times in the | value to zero, to a value in the past, or to large times in the | |||
future. Handling of this attack vector is addressed in Section 2.2 | future. Handling of this attack vector is addressed in Section 2.2 | |||
alongside NTP Era rollover ambiguity. | alongside NTP Era rollover ambiguity. | |||
This document uses MPLS and IP-based tunnel technologies to support | This document uses MPLS- and IP-based tunnel technologies to support | |||
data plane transport. Security considerations described in [RFC7432] | data plane transport. Security considerations described in [RFC7432] | |||
and in [RFC8365] are equally applicable. | and [RFC8365] are equally applicable. | |||
6. IANA Considerations | 6. IANA Considerations | |||
IANA maintains the "EVPN Extended Community Sub-Types" registry set | IANA has made the following assignment in the "EVPN Extended | |||
up by [RFC7153], where the following assignment has been made: | Community Sub-Types" registry set up by [RFC7153]. | |||
Sub-Type Value Name Reference | +================+======================+===========+ | |||
-------------- ------------------------- ------------- | | Sub-Type Value | Name | Reference | | |||
0x0F Service Carving Time This document | +================+======================+===========+ | |||
| 0x0F | Service Carving Time | RFC 9722 | | ||||
+----------------+----------------------+-----------+ | ||||
IANA maintains the "DF Election Capabilities" registry set up by | Table 1 | |||
[RFC8584]. IANA is requested to make the following assignment from | ||||
this registry: | ||||
Bit Name Reference | IANA has made the following assignment in the "DF Election | |||
---- ---------------- ------------- | Capabilities" registry set up by [RFC8584]. | |||
3 Time Synchronization This document | ||||
+=====+======================+===========+ | ||||
| Bit | Name | Reference | | ||||
+=====+======================+===========+ | ||||
| 3 | Time Synchronization | RFC 9722 | | ||||
+-----+----------------------+-----------+ | ||||
Table 2 | ||||
7. References | 7. References | |||
7.1. Normative References | 7.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
skipping to change at page 15, line 33 ¶ | skipping to change at line 674 ¶ | |||
[RFC8584] Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake, | [RFC8584] Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake, | |||
J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet | J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet | |||
VPN Designated Forwarder Election Extensibility", | VPN Designated Forwarder Election Extensibility", | |||
RFC 8584, DOI 10.17487/RFC8584, April 2019, | RFC 8584, DOI 10.17487/RFC8584, April 2019, | |||
<https://www.rfc-editor.org/info/rfc8584>. | <https://www.rfc-editor.org/info/rfc8584>. | |||
7.2. Informative References | 7.2. Informative References | |||
[HRW98] Thaler, D. and C. Ravishankar, "Using Name-Based Mappings | [HRW98] Thaler, D. and C. Ravishankar, "Using Name-Based Mappings | |||
to Increase Hit Rates", 1998, | to Increase Hit Rates", IEEE/ACM Transactions on | |||
<https://www.microsoft.com/en-us/research/wp-content/ | Networking, vol. 6, no. 1, February 1998, | |||
uploads/2017/02/HRW98.pdf>. | <https://www.microsoft.com/en-us/research/wp- | |||
content/uploads/2017/02/HRW98.pdf>. | ||||
Appendix A. Contributors | Acknowledgements | |||
Authors would like to acknowledge helpful comments and contributions | ||||
of Satya Mohanty and Bharath Vasudevan. Also thank you to Anoop | ||||
Ghanwani and Gunter van de Velde for their thorough review with | ||||
valuable comments and corrections. | ||||
Contributors | ||||
In addition to the authors listed on the front page, the following | In addition to the authors listed on the front page, the following | |||
co-authors have also contributed substantially to this document: | coauthors have also contributed substantially to this document: | |||
Gaurav Badoni | Gaurav Badoni | |||
Cisco | Cisco | |||
Email: gbadoni@cisco.com | Email: gbadoni@cisco.com | |||
Dhananjaya Rao | Dhananjaya Rao | |||
Cisco | Cisco | |||
Email: dhrao@cisco.com | Email: dhrao@cisco.com | |||
Appendix B. Acknowledgements | ||||
Authors would like to acknowledge helpful comments and contributions | ||||
of Satya Mohanty and Bharath Vasudevan. Also thank you to Anoop | ||||
Ghanwani and Gunter van de Velde for their thorough review with | ||||
valuable comments and corrections. | ||||
Authors' Addresses | Authors' Addresses | |||
Patrice Brissette | Patrice Brissette | |||
Cisco | Cisco | |||
Email: pbrisset@cisco.com | Email: pbrisset@cisco.com | |||
Ali Sajassi | Ali Sajassi | |||
Cisco | Cisco | |||
Email: sajassi@cisco.com | Email: sajassi@cisco.com | |||
End of changes. 96 change blocks. | ||||
280 lines changed or deleted | 278 lines changed or added | |||