BGP Enabled ServiceS                                       I. Malyushkin
Internet-Draft                                   Independent Contributor
Intended status: Standards Track                          17 August 2022
Expires: 18 February 2023

                 Abstract next-hop addresses in IP VPNs
           draft-malyushkin-bess-ip-vpn-abstract-next-hops-00

Abstract

This document discusses the IP VPN convergence aspects and specifies
procedures for IP VPN to signal the attachment circuit failure.  The
specified procedures help significantly improve the IP VPN
convergence.

About This Document

This note is to be removed before publishing as an RFC.

Status information for this document may be found at
https://datatracker.ietf.org/doc/draft-malyushkin-bess-ip-vpn-abstract-next-hops/.

Discussion of this document takes place on the BGP Enabled ServiceS
Working Group mailing list (mailto:bess@ietf.org), which is archived
at https://mailarchive.ietf.org/arch/browse/bess/.  Subscribe at
https://www.ietf.org/mailman/listinfo/bess/.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current
Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on 18 February 2023.

Copyright Notice

Copyright (c) 2022 IETF Trust and the persons identified as the
document authors.  All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Revised BSD License text as described in Section 4.e of the
Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.

Table of Contents

1.  Introduction
2.  Conventions and Definitions
3.  Terminology
4.  Solution Description
5.  Abstract Next-Hop Address
  5.1.  Status of the Abstract Next-Hop
  5.2.  Distribution of the Abstract Next-Hop
  5.3.  Tunnels to the Abstract Next-Hop
6.  Distribution of VPN Routes
7.  Forwarding
8.  Failure Detection
  8.1.  Egress PE
  8.2.  Ingress PE
9.  Deployment Considerations
  9.1.  Scalability
  9.2.  Using the Abstract Next-Hops
  9.3.  Failure Detection
  9.4.  Routes Aggregation
10. Multicast Considerations
11. Security Considerations
12. IANA Considerations
13. References
  13.1.  Normative References
  13.2.  Informative References
Acknowledgments
Author's Address

1.  Introduction

Neither IP VPN [RFC4364] nor IPv6 VPN [RFC4659] has a mass route
withdrawal mechanism.  The failure of a connection to a CE forces a
PE to withdraw all affected VPN routes instead of notifying other PE
routers about the attachment circuit failure.  These routes may be
packed into one or more BGP UPDATE messages and then disseminated
through the network.  Depending on the BGP topology, these messages
may be further processed and replicated by intermediate nodes (e.g.,
route reflectors).  In general, every affected route must be
withdrawn from all interested parties.

The number of failed routes impacts the convergence time: more routes
require more time.  A sophisticated intermediate BGP topology may
also negatively affect this time.

The convergence speed of the network is important.  There is a
potential traffic loss that lasts until the failure notification (BGP
UPDATE messages) reaches the other members participating in the
affected VPN service (i.e., routers using the affected VPN routes for
traffic forwarding).  Moreover, this loss happens at the egress
point, where the failed CE router is connected to the network, after
the traffic has traversed the whole path.

A mechanism named BGP PIC edge [I-D.ietf-rtgwg-bgp-pic] avoids this
traffic loss while the network is converging.  This mechanism depends
on the availability of an extra exit point for every affected route.
When the CE router is connected to a pair of PE routers (i.e., it is
multihomed) and a link between the CE and one of these PEs fails, all
affected traffic can be redirected by this PE toward the other one.

On the other hand, BGP PIC edge, when it is active at the egress, is
associated with sub-optimal routing.  Traffic from an ingress PE must
follow the path toward the egress PE where the failed link with the
CE is attached.  This egress PE then redirects the traffic toward
another PE thanks to the pre-installed backup records.  Such
tromboning can negatively influence traffic characteristics (delay,
loss rate, etc.).

Another problem with BGP PIC edge at the egress is a possible routing
loop.  Suppose a CE router is connected to a pair of PE routers and
advertises a set of routes to them.  These PEs install the routes and
propagate them via internal BGP VPN sessions.  Both PEs therefore
receive these routes via a PE-CE protocol and via the internal BGP
VPN sessions.  The routes received via the internal BGP VPN sessions
are used as backups for the routes received via the PE-CE protocol.
When the CE fails, the PE routers activate their backups, sending
traffic to each other until the TTL reaches zero.

The BGP PIC edge mechanism is a transient solution.  As soon as all
VPN members are notified about the unreachability of all affected VPN
routes, traffic will be sent to the extra exit point in an optimal
way or it will be dropped at the ingress.

The goal of the solution described in this document is to decrease
the time required for all VPN members to become aware of the failure
and thus reduce the time during which BGP PIC edge is active.  This
solution does not replace BGP PIC edge and can be applied to networks
in parallel with it.  It is recommended to combine them.

Even if destinations that were advertised by a failed CE lack
alternatives, the reaction time of the network may be important.
Imagine that the CE advertises a huge number of routes and attracts a
considerable amount of traffic, but for some reason these routes do
not have an alternative exit point.  Until all other members of the
VPN service are aware of the failure, traffic will flow through the
network in vain.  The solution described in this document will
significantly reduce this time.

This document refers to [RFC4364] in all cases where the logic of the
latter is applicable to both address families.  [RFC4659] is referred
to explicitly where it introduces new logic.

2.  Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.

3.  Terminology

The document uses the terminology defined in [RFC4271], [RFC4364],
[RFC6513], and [RFC3031].

AC  Attachment Circuit.

GRT  Global Routing Table.

EC  BGP Extended Community [RFC4360].

BGP VPN route  Either a VPN-IPv4 or VPN-IPv6 route.

Abstract Next-Hop (ANH) address  An artificial IPv4 or IPv6 address
   in the GRT that represents an address of a CE in a VRF.

Linked Address (LA)  An actual IPv4 or IPv6 address of a CE that is
   bound (linked) to an ANH.

ANH proxying  A unidirectional dependency between the statuses of an
   ANH and an LA.

4.  Solution Description

Consider the topology in Figure 1.  CE1 and CE2 maintain external BGP
sessions with PE1 for the IPv4 and IPv6 unicast address families.
Both CEs send routes via these sessions, and these routes must be
reachable by CE3 through the VPN service.  PE1 exports the routes
installed into VRF1 and sends them as VPN routes to PE2.
    +-----+
    | CE3 |
    +-----+
       |
       |
+-------------+
|             |
|     PE2     |
|  192.0.2.2  |
+-------------+
       |
       |
+-------------+
|             |     +-----------+
|             |     |           |.0    198.51.100.0/31     .1 +-----+
|             |     |   VRF1 AC1------------------------------| CE1 |
|             |     |           |::1      FE80::/64       ::2 +-----+
|   IP/MPLS   |-----|    PE1    |
|   network   |     | 192.0.2.1 |
|             |     |           |.2    198.51.100.2/31     .3 +-----+
|             |     |   VRF1 AC2------------------------------| CE2 |
|             |     |           |::0    2001:DB8::/127    ::1 +-----+
|             |     +-----------+
+-------------+

             Figure 1: IP/MPLS network with IP VPN

Figure 2 shows the routes received from the CEs and installed into
VRF1 by PE1 on the left and VPN routes advertised by PE1 on the
right.  The most interesting column here is "VPN Next-Hops".  The
address of 192.0.2.1 is a primary address of PE1 which is used as a
default VPN next-hop address and as a source address of internal BGP
sessions.  All routes of CE2 use the default address of PE1 as the
next-hop when exported as VPN routes.  There is a special export
policy on PE1 for the internal BGP sessions that modifies next-hops
for the VPN-IPv4 routes of CE1 to the address of 192.0.2.100 and for
the VPN-IPv6 routes of CE1 to the address of ::ffff:192.0.2.200.
+-------------------+----------------+----------+
|    VRF1 Routes    | VRF1 Next-Hops | VRF1 ACs |
+-------------------+----------------+----------+
+-------------------+----------------+----------+
| 203.0.113.0/25    | 198.51.100.1   | AC1      |
+-------------------+----------------+----------+
| 203.0.113.128/25  | 198.51.100.3   | AC2      |
+-------------------+----------------+----------+
+-------------------+----------------+----------+
| 2001:DB8:100::/64 | FE80::2        | AC1      |
+-------------------+----------------+----------+
| 2001:DB8:200::/64 | 2001:DB8::1    | AC2      |
+-------------------+----------------+----------+
                        |
                        |
                        v
+---------------------+----------------------+-----------+
|     VPN Routes      |    VPN Next-Hops     | VPN Label |
+---------------------+----------------------+-----------+
+---------------------+----------------------+-----------+
| RD:203.0.113.0/25   | 0:192.0.2.100        | 100       |
+---------------------+----------------------+-----------+
| RD:203.0.113.128/25 | 0:192.0.2.1          | 100       |
+---------------------+----------------------+-----------+
+---------------------+----------------------+-----------+
| RD:2001:DB8:100::/64| 0:::ffff:192.0.2.200 | 100       |
+---------------------+----------------------+-----------+
| RD:2001:DB8:200::/64| 0:::ffff:192.0.2.1   | 100       |
+---------------------+----------------------+-----------+

            Figure 2: Export routes into VPN by PE1

PE1 advertises unicast host-specific routes for the addresses
192.0.2.1, 192.0.2.100, and 192.0.2.200 via a routing protocol.  PE2
receives these routes and installs them into the GRT.

PE1 also allocates an MPLS label for the addresses mentioned above
and distributes the bindings of this label via a label distribution
protocol.  PE2 receives these bindings and installs them into its
tunnel table.  Thus, PE2 can resolve all VPN routes that PE1 has
sent.

Suppose that AC2 between PE1 and CE2 has failed for some reason.
When PE1 notices the failure, it invalidates all routes inside VRF1
that were used to reach the CE2 addresses, 198.51.100.2/31 and
2001:DB8::/127.  Other routes that recursively use the routes to the
addresses of CE2 to look up their next-hops become inactive too.
Because of that, PE1 starts withdrawing the corresponding VPN routes,
203.0.113.128/25 and 2001:DB8:200::/64 (the RD is omitted).  PE2 must
wait for these withdrawals before it stops sending traffic toward PE1
(traffic from CE3 to CE2).

In another scenario, AC1 fails instead of AC2.  Imagine that PE1 is
configured to monitor the status of AC1 and, in the case of the
failure of AC1, immediately updates the routing protocol and the
label distribution protocol.  These updates include the withdrawal of
the unicast routes for the addresses 192.0.2.100 and 192.0.2.200 (but
not for 192.0.2.1) and of the label bindings for them.  In parallel,
PE1 proceeds with steps similar to those described previously for the
AC2 failure.  PE2 eventually receives the updates by the routing
protocol, by the label distribution protocol, or by both.  Thanks to
a hierarchical FIB, it invalidates at once all VPN routes that use
the failed routes (and tunnels) to their VPN next-hops.  PE2 stops
sending traffic to PE1 (traffic from CE3 to CE1) even if it has not
yet received any withdrawals for the corresponding VPN routes.

For the sake of brevity, both scenarios are discussed without
alternative exit points for the routes inside VRF1 and with just a
couple of such routes.  In real deployments, a CE can distribute many
more routes to more than one PE.  In that case, the mechanism
described above can significantly improve the network convergence
time.

This document introduces a mechanism that helps notify VPN members
about an AC failure that has happened at one of these members.  The
described solution expects the following:

* Hierarchical FIB MUST be supported and used among all VPN members.
* Any VPN member acting as an ingress PE MUST consider the status of
  a unicast route in the GRT toward the BGP next-hop (BGP next-hop
  tracking).  This status MUST be considered during the BGP route
  resolution and after the route is placed into the appropriate
  routing table.

* It is RECOMMENDED for any VPN member acting as an ingress PE to
  also consider the status of a tunnel toward the BGP next-hop during
  the BGP VPN route resolution and after the route is placed into the
  appropriate routing table.

The solution described in this document modifies the behavior of
egress PE routers only and can be deployed incrementally.

5.  Abstract Next-Hop Address

Section 4.3.2 of [RFC4364] states:

   When a PE router distributes a VPN-IPv4 route via BGP, it uses its
   own address as the "BGP next hop".

In most cases, the "own address" is the address of a virtual
interface (e.g., a loopback).  This address usually acts as a tunnel
endpoint for labeled traffic.  The tunnel using it may be
instantiated by different mechanisms and must be capable of
forwarding MPLS traffic.  The PE also uses this address as a source
address of internal BGP sessions.  Because the interface owning this
address is virtual, it is nearly impossible for this interface to
fail (except when it is disabled artificially).  Only the failure of
the whole PE leads to it.

The solution described in this document proposes using additional
next-hops for VPN routes advertised by a single PE.  This alters the
behavior described in [RFC4364] and [RFC4659], which presupposes the
advertising of a single next-hop address for all VPN routes of a PE.

In the described solution, the additional next-hops advertised by a
PE are named ANH addresses.  An ANH address is an artificial IPv4 or
IPv6 address that belongs to the GRT.  An ANH acts as a proxy address
for an actual address of a CE residing in a VRF.
The status of the latter address influences the status of its ANH.

A CE's address selected for an ANH is named an LA.  An LA may belong
to a common subnet of a PE-CE pair in a VRF or can be any other
address of the CE; it cannot belong to the GRT.  An ANH and its LA do
not necessarily belong to the same address family.  For example, an
IPv4 ANH may have an IPv6 LA, and vice versa.  An LA can be a
link-local IPv6 address.  In this case, its ANH MUST be a proxy to a
triplet (LA, AC, VRF) instead of a pair (LA, VRF), where the LA
belongs to the AC from the triplet.

Addresses installed in different VRFs may overlap.  Thus, values of
ANHs may be arbitrary and do not have to be the same as their LAs.
An operator is free to choose these values according to network
address plans.  To achieve the goals stated in Section 1, values of
ANHs MUST be unique throughout the GRTs of the network.  The case
when several PE routers advertise the same value for an ANH (e.g.,
anycast) is out of the scope of this document.

The ANH proxying is not a route leaking mechanism; it cannot be used
for traffic forwarding between the GRT and a VRF in any direction.
The ANH proxying creates a dependency between the statuses of an ANH
and an LA (Section 5.1).

An ANH MUST be bound to only one LA.  An LA in turn MUST be bound to
only one ANH.  There is a strict one-to-one mapping between them.

An operator may create the ANH proxying for any address in a VRF, but
the solution expects that this address is used as a next-hop for
routes in this VRF.  These routes do not necessarily belong to the
same CE that owns the LA (i.e., third-party next-hop).

The ANH proxying can be described as a static host-specific route
that is installed in the GRT.  The destination address of this route
is the value that an operator has selected for the ANH.
The next-hop address of the static route is equal to the LA selected
for the ANH.  Additionally, the operator configures a VRF (its name
or index) directly for this static route.  The VRF indicates where
the next-hop of the route must be resolved.  For unlabeled traffic
coming to a PE via the GRT, the static route acts as a route to the
bit bucket.  This document does not restrict implementations to this
mechanism.

5.1.  Status of the Abstract Next-Hop

The status of an ANH depends on the existence of a route to its LA.
This route MUST be present in a VRF associated with the ANH.  The ANH
is considered active if and only if the route to its LA is active and
is available for traffic forwarding.  The proposed solution does not
restrict the type of this route, but it MUST support at least direct
routes and static routes.  An implementation MAY filter the protocols
used for resolution of LAs by a configuration policy.

The status dependency is unidirectional: only the status of an LA
defines the status of an ANH.

An implementation MAY support an option for an operator to deactivate
an ANH manually.

Besides the dependency on a route toward an LA, an ANH MAY be a
client of any mechanism of active monitoring of the LA.  It can be
any next-hop tracking (ARP or ICMP probes, if they are applicable to
the LA) or a BFD [RFC5880] session between the CE that owns the LA
and the PE that owns the ANH.

5.2.  Distribution of the Abstract Next-Hop

In general, the distribution of ANHs by means of a routing protocol
does not differ from the distribution of any other addresses that are
considered to be BGP next-hops.

An ANH SHOULD be advertised by a routing protocol.  In this case, the
following conditions MUST be met:

* The status of the ANH is active (Section 5.1).

* The ANH is advertised as a host-specific route.

* This route is reachable by at least a subset of PEs via the routing
  protocol.
* These PEs import VPN routes with a BGP next-hop address equal to
  the ANH (covered by the route) into the appropriate VRFs and use
  these VPN routes for traffic forwarding.

This solution does not restrict the type of the routing protocol used
for the distribution of ANH routes.

When the status of an ANH changes from active to inactive, a PE MUST
notify the other PEs receiving a route to this ANH.  The speed of
origination of such a notification and of its propagation is crucial.

PEs that have received a route to an ANH act according to the
standard procedures that are applicable to the routing protocol.
This solution does not modify this behavior.

5.3.  Tunnels to the Abstract Next-Hop

A PE may have one or several ANHs and distribute them as per
Section 5.2.  In that case, according to Section 5 of [RFC4364],
there MUST be a tunnel for every such address of the PE.

This solution does not restrict the type of tunnels that point to
ANHs, but these tunnels MUST forward MPLS traffic.  However, an
implementation of an egress tunnel endpoint may require some changes
to support a point-to-point tunnel (e.g., RSVP-TE LSP [RFC3209] or IP
GRE [RFC4032]) to an ANH.  These changes are out of the scope of this
document.

The solution does not consider in detail the use of tunneling
technologies other than MPLS LSPs for the transport of labeled VPN
traffic.  The rest of this section is applicable to MPLS LSPs only.

For all ANHs with LAs that belong to the same VRF, a PE MUST allocate
the same label.  A PE MAY allocate a single label for all ANHs (e.g.,
implicit label).

When a PE allocates a label for an ANH, it MUST associate a release
timer with this label.  If the status of the ANH changes to inactive,
the PE starts the release timer for the label.  While the timer is
running, if the PE receives traffic with this label, it MUST continue
to handle this traffic as if the failure had not happened.
When the timer reaches zero, the PE starts freeing the resources
associated with the label.  This timer does not influence the
generation and advertising of the failure notification via the label
distribution protocol.  An implementation SHOULD support a manual
setting of the release timer (including zero).  If a label is
allocated for a group of ANHs, a PE starts the timer if and only if
the last active address of the group becomes inactive.

When a PE advertises a label binding to an ANH, it either MUST be
accomplished by a label distribution protocol in parallel with the
advertising of the ANH via a routing protocol (Section 5.2), or the
ANH MUST be sent as a labeled route (e.g., BGP-LU [RFC8277]).

When the status of an ANH (Section 5.1) changes to inactive and a
label binding to this ANH was advertised by a PE via the label
distribution protocol, the PE MUST notify the other routers receiving
the label for this ANH.  The speed of origination of such a
notification and of its propagation is also important: this
notification may be received before the notification via the routing
protocol, or it may be the only notification channel (Section 9.4).

Routers that have received a label binding for an ANH act according
to the standard procedures that are applicable to the label
distribution protocol.  The proposed solution does not change the
behavior of ingress LERs or LSRs.

6.  Distribution of VPN Routes

The solution proposed in this document is only applicable to VPN-IPv4
and VPN-IPv6 routes (i.e., SAFI 128).  Using any other routes with
ANHs is out of the scope of this document.

For a group of routes installed in a VRF and united by a common
next-hop address, an operator MAY set up an ANH as a next-hop of the
corresponding VPN routes.  The remaining routes from the same VRF (if
any) MUST be advertised using the procedures of [RFC4364] or
[RFC4659] if they are supposed to be advertised.
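
As an illustration only, the per-route next-hop selection described above, together with the next-hop encoding specified later in this section (an RD of 0 followed by the address, with an IPv4 ANH carried as an IPv4-mapped IPv6 address for VPN-IPv6 routes), might be sketched in Python as follows. The names VrfRoute, select_next_hop, encode_vpn_next_hop, and ANH_BY_LA are hypothetical and do not come from this document or from any implementation; the data is taken from Figure 2. Keying the binding on an (LA, AC) pair and rejecting an IPv6 next-hop for VPN-IPv4 routes are simplifying assumptions of the sketch.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Tuple

RD_ZERO = bytes(8)  # an RD of 0 occupies 8 octets


@dataclass(frozen=True)
class VrfRoute:
    """Hypothetical view of a route installed in a VRF."""
    prefix: str    # destination prefix in the VRF
    next_hop: str  # VRF next-hop, i.e. a candidate LA
    ac: str        # attachment circuit of that next-hop


def select_next_hop(route: VrfRoute,
                    anh_by_la: Dict[Tuple[str, str], str],
                    default_next_hop: str) -> str:
    """Return the ANH bound to the route's (LA, AC), else the PE's own address."""
    return anh_by_la.get((route.next_hop, route.ac), default_next_hop)


def encode_vpn_next_hop(address: str, vpn_ipv6: bool) -> bytes:
    """Encode a next-hop as RD 0 followed by the address octets."""
    ip = ipaddress.ip_address(address)
    if not vpn_ipv6:
        # Assumption of this sketch: VPN-IPv4 routes carry an IPv4 next-hop.
        if ip.version != 4:
            raise ValueError("VPN-IPv4 next-hop must be an IPv4 address")
        return RD_ZERO + ip.packed  # 8 + 4 octets
    if ip.version == 4:
        # IPv4-mapped IPv6 address: 80 zero bits, 16 one bits, IPv4 address.
        ip = ipaddress.IPv6Address(bytes(10) + b"\xff\xff" + ip.packed)
    return RD_ZERO + ip.packed  # 8 + 16 octets


# Example bindings and routes of PE1 taken from Figure 2.
ANH_BY_LA = {
    ("198.51.100.1", "AC1"): "192.0.2.100",  # IPv4 routes of CE1
    ("FE80::2", "AC1"): "192.0.2.200",       # IPv6 routes of CE1
}
DEFAULT_NEXT_HOP = "192.0.2.1"

route_ce1 = VrfRoute("203.0.113.0/25", "198.51.100.1", "AC1")
route_ce2 = VrfRoute("203.0.113.128/25", "198.51.100.3", "AC2")
```

With this data, select_next_hop(route_ce1, ANH_BY_LA, DEFAULT_NEXT_HOP) yields the ANH 192.0.2.100, while route_ce2 has no binding and keeps the default next-hop 192.0.2.1, which matches the "VPN Next-Hops" column of Figure 2.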
An ANH for VPN-IPv4 routes is encoded according to Section 4.3.2 of
[RFC4364] as a VPN-IPv4 address with an RD of 0.

An ANH for VPN-IPv6 routes is encoded according to Section 3.2.1 of
[RFC4659] as a VPN-IPv6 address.  This VPN-IPv6 address contains an
RD of 0 and an IPv6 address that is equal to the ANH.  When the ANH
is an IPv4 address, the VPN-IPv6 address carries it as an IPv4-mapped
IPv6 address.  The procedures for including a link-local address are
not altered by this solution.

According to Section 4.3 of [RFC4364], routes that are installed in a
VRF are converted to VPN routes (this statement is applicable to both
address families) and "exported" to BGP.  This solution assumes that
all VPN routes are installed into the VPN Loc-RIB with a next-hop
address that is equal to the own address of the PE where this VRF is
configured.  In other words, the solution does not modify the
procedures for converting routes from VRFs to VPN routes.

All routes in a Loc-RIB are processed into the appropriate
Adj-RIBs-Out according to configured policies ([RFC4271],
Section 9.1.3).  The solution expects that there MUST be a special
export policy that is applicable to routes passing from the VPN
Loc-RIB to VPN Adj-RIBs-Out and that is evaluated in the policy chain
before all policies configured by an operator (if there are such
policies).  This special export policy modifies next-hop addresses
only for those routes that are configured to be sent with ANHs (or
with a single ANH).

A PE does not check the presence of a route to an ANH in the GRT
before copying VPN routes from a Loc-RIB into a corresponding
Adj-RIB-Out or during the Update-send process (Section 9.2 of
[RFC4271]).

When the status of an ANH (Section 5.1) changes to inactive, a PE
does not start withdrawing the VPN routes that use this ANH as their
next-hop.
This prevents churn when an operator performs network maintenance and
manually disables the ANH.  On the other hand, deleting a binding
between an ANH and its LA MUST start changing the corresponding
next-hop addresses in Adj-RIBs-Out to the default value (the value
from the Loc-RIB).

An implementation MAY support an option of selecting distinct
Adj-RIBs-Out where VPN routes will be placed with ANHs.

7.  Forwarding

An ingress PE cannot determine whether the next-hop address of
received VPN routes is a regular address or an abstract one.  The
ingress PE considers every VPN next-hop address as the address of a
standalone egress router even if a group of VPN next-hop addresses
belongs to the same device.

Having an active route and a tunnel to a BGP next-hop address, the
ingress PE encapsulates traffic and sends it via the tunnel according
to Section 5 of [RFC4364].

If an egress PE receives MPLS traffic with a label that was allocated
for one of its ANHs, the solution expects the following (other cases
are out of the scope of this document):

* This label MUST NOT have the Bottom of Stack bit [RFC3032] set.

* At the bottom of the received stack there MUST be a label that was
  allocated by the egress PE.

* There MAY be other labels between these two labels (e.g., entropy
  labels [RFC6790]).

8.  Failure Detection

8.1.  Egress PE

A PE detects the failure of a connected CE by different mechanisms.
These mechanisms are not considered in this document.  The net effect
of the failure is the unreachability of routes to addresses (or of a
route to a single address) of the failed CE in the VRF where an AC to
the CE resides.  The PE usually uses these routes to recursively
resolve next-hops for other routes in the VRF (which are also usually
distributed by the CE).  All failed routes try to find new options to
resolve their next-hops; if there are no such options, the PE starts
deleting the failed routes from the VRF.

After the PE has detected that the CE has failed, and if one of the
addresses of the CE is an LA, the PE immediately deactivates the ANH
of this LA.  If a route to the ANH was distributed as per
Section 5.2, the PE notifies all neighbors of the routing protocol.
If the ANH was also bound to a label and this label was distributed
via a label distribution protocol, the PE notifies all neighbors of
the label distribution protocol.

The PE may start distributing updates via BGP VPN sessions, notifying
its peers that the routes in the VRF are no longer reachable.  This
process is independent of the process described above, and the
solution does not modify it.  However, if the route to the ANH was
distributed by BGP via the same set of sessions that are used for the
distribution of VPN routes, an implementation SHOULD schedule the
sending of the UPDATE message with the ANH's withdrawal prior to the
UPDATE messages with the failed VPN routes.

8.2.  Ingress PE

As stated in Section 4, the proposed solution expects an ingress PE
to consider the status of a tunnel toward a BGP VPN next-hop.  Thus,
when the status of the tunnel changes to inactive, the ingress PE
simultaneously deactivates all VPN routes with a next-hop equal to
the address of the tunnel's endpoint.  If the ingress PE does not
follow this logic, the solution expects that the status of a route
toward a BGP VPN next-hop in the GRT is used in the same way.  The
ingress PE can apply both procedures.

In any case, the ingress PE can react to the failure of a remote CE
(the CE connected to a remote PE in the same VPN) or of an AC to the
CE independently of the reception of BGP UPDATE messages that
withdraw the VPN routes pointing to this CE.  The ingress PE may
activate backups for these routes and redirect traffic to them.

9.  Deployment Considerations

9.1.  Scalability

A requirement to have a tunnel to every next-hop address that a PE
uses to advertise VPN routes may pose scalability concerns.  The
following considerations address deploying the described solution
from the scalability point of view:

* Use multipoint-to-point tunnels to reach VPN next-hop addresses.
  This helps to reduce state on a PE that advertises VPN routes with
  these next-hops by avoiding dependence on the number of upstream
  routers.  It also reduces state on intermediate routers when the
  tunnels are LSPs.

* Selective installation of routes and tunnels helps spend resources
  only on the VPN next-hop addresses that an ingress PE will use.  If
  the ingress PE does not import a VPN route, it is not always
  necessary to have a tunnel toward the next-hop address of this
  route.  In the case of ANHs, having such tunnels is even more
  questionable.

* Do not use the ANH proxying for every possible CE address in every
  possible VRF.  There are a lot of cases when the traditional VPN
  convergence is good enough.

* A PE may advertise a common label for all its ANHs if the tunnels
  toward them are supposed to be LSPs.  In this case, it is expected
  that the ANHs also share a common value for the release timer
  (Section 5.3).  Using different values for the release timer may
  require label deaggregation.

9.2.  Using the Abstract Next-Hops

Deployment of the ANH solution should be considered on a per-service
basis.  The following points may help to decide whether an ANH is
appropriate:

* Most VPN services exchange a small number of routes (hundreds or
  several thousand).  Usually, any CE contributes a small portion of
  them, and its failure can be noticed and repaired by the network in
  a reasonable time.  Imagine a situation when a CE advertises a
  significant number of routes, tens of thousands or more.
In this case, the restoration after the failure of the CE may exceed the expected convergence time.

* Independently of the number of routes advertised by a CE, the existence of an extra exit point should be considered. Improving the switchover time requires a point to switch over to.

* Sometimes it may be desirable to stop traffic flowing through a network after a destination CE fails, even if there is no extra exit point. For example, this applies when there is a considerable amount of traffic toward the failed CE.

* Some services require special treatment due to stricter SLAs; using an ANH may help to achieve these SLAs.

9.3. Failure Detection

The time needed to detect a CE failure, or a failure of the link (or AC) to the CE, deserves attention because it contributes significantly to overall convergence. For example, sometimes it is not possible to notice the link failure by a loss of signal, and extra mechanisms are required for this task.

Some of these mechanisms interact with a session of a PE-CE protocol. When these mechanisms detect the failure, routes distributed by the associated PE-CE protocol session become inactive. At the same time, routes toward the addresses of the CE (or a single route) are usually not distributed by the PE-CE protocol session; they may be direct or static routes. Thus, the routes toward the addresses of the CE are not affected by the detection mechanism described above and stay active. If one of the CE's addresses is an LA, the status of the ANH that is proxying to the LA will also remain active.

To prevent such behavior, it is recommended to detect the failure of an address of the CE or of a route to this address (both on the PE and the CE, especially if the CE is multihomed). In other words, in the described case it is better to monitor the next-hop of the routes distributed by the PE-CE protocol rather than the PE-CE protocol session itself.

9.4.
Routes Aggregation

Routes advertised by a routing protocol can be aggregated at some points of a network (e.g., ABRs). Such aggregation may hide an ANH deactivation, because the aggregate can remain advertised after the specific route to the ANH is withdrawn. The aggregation of routes therefore removes the notification channel that is supplied by the routing protocol.

A label distribution protocol can provide this notification channel when it is used for the distribution of labels for ANHs. In this case, however, all VPN members are required to consider the status of tunnels toward ANHs as described in Section 4.

If LDP [RFC5036] is used as the label distribution protocol for ANHs, the following steps should be considered:

* The LDP extension for Inter-Area LSPs [RFC5283] must be used throughout the network.

* If LDP is configured in Downstream Unsolicited mode, it must also be configured in Ordered Control mode. An LDP LSR in Independent Control mode on a path to an ANH will break the notification channel.

10. Multicast Considerations

Section 7 of [RFC6514] introduces the VRF Route Import EC. Section 5.1.3 of [RFC6513] describes a scenario in which unicast VPN routes do not contain this community during the selection of the Upstream PE:

   If a route does not have a VRF Route Import Extended Community, the route's Upstream PE is determined from the route's BGP Next Hop.

The solution described in this document expects that unicast VPN routes of a VPN service may be sent by a PE with different BGP next-hop addresses. This may create an issue with the importing of C-Multicast routes if this VPN service also acts as an MVPN and does not mark its VPN routes with the VRF Route Import EC. It is not recommended to configure a new import Route Target EC for every ANH. Instead, there are two possible ways to mitigate the described problem:

* Use the procedures described in Section 10 of [RFC6514].
* Do not use ANHs for unicast VPN routes that cover multicast sources or RP addresses.

11. Security Considerations

This document specifies extensions for the advertisement of VPN routes with different next-hops by a single PE. From this point of view, the security considerations described in [RFC4364] and [RFC4659] are equally applicable to the extensions described in this document.

12. IANA Considerations

This document has no IANA actions.

13. References

13.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol Label Switching Architecture", RFC 3031, DOI 10.17487/RFC3031, January 2001.

[RFC3032] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack Encoding", RFC 3032, DOI 10.17487/RFC3032, January 2001.

[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271, January 2006.

[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 2006.

[RFC4659] De Clercq, J., Ooms, D., Carugi, M., and F. Le Faucheur, "BGP-MPLS IP Virtual Private Network (VPN) Extension for IPv6 VPN", RFC 4659, DOI 10.17487/RFC4659, September 2006.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

13.2. Informative References

[I-D.ietf-rtgwg-bgp-pic] Bashandy, A., Filsfils, C., and P. Mohapatra, "BGP Prefix Independent Convergence", Work in Progress, Internet-Draft, draft-ietf-rtgwg-bgp-pic-18, 9 April 2022.

[RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., and G.
Swallow, "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209, DOI 10.17487/RFC3209, December 2001.

[RFC4032] Camarillo, G. and P. Kyzivat, "Update to the Session Initiation Protocol (SIP) Preconditions Framework", RFC 4032, DOI 10.17487/RFC4032, March 2005.

[RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, February 2006.

[RFC5036] Andersson, L., Ed., Minei, I., Ed., and B. Thomas, Ed., "LDP Specification", RFC 5036, DOI 10.17487/RFC5036, October 2007.

[RFC5283] Decraene, B., Le Roux, JL., and I. Minei, "LDP Extension for Inter-Area Label Switched Paths (LSPs)", RFC 5283, DOI 10.17487/RFC5283, July 2008.

[RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

[RFC6513] Rosen, E., Ed. and R. Aggarwal, Ed., "Multicast in MPLS/BGP IP VPNs", RFC 6513, DOI 10.17487/RFC6513, February 2012.

[RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP Encodings and Procedures for Multicast in MPLS/BGP IP VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

[RFC6790] Kompella, K., Drake, J., Amante, S., Henderickx, W., and L. Yong, "The Use of Entropy Labels in MPLS Forwarding", RFC 6790, DOI 10.17487/RFC6790, November 2012.

[RFC8277] Rosen, E., "Using BGP to Bind MPLS Labels to Address Prefixes", RFC 8277, DOI 10.17487/RFC8277, October 2017.

Acknowledgments

The author would like to thank Roman Peshekhonov for his review and valuable input.

Author's Address

I. Malyushkin
Independent Contributor
Email: gmalyushkin@gmail.com