Fault Management Mechanism in MPTCP Session

Introduction

During data transmission in an MPTCP session, a subflow may encounter problems such as a port failure on one endpoint, a network failure, or a malfunctioning middlebox. The current MPTCP protocol does not define any exchange between client and server when a fault occurs on a subflow, which can cause transmission failure or delay. RFC8684 introduces the TCP RST Reason (MP_TCPRST) option to signal the reason for sending a RST on a subflow, which can help an implementation decide whether to attempt reconnection later. However, the MP_TCPRST option only reports the reason for a specific subflow that the sender has already decided to close. It does not cover the abnormal termination of an ongoing subflow.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Fault Announce Exchanges

This document proposes a fault announce mechanism with a new option that delivers failure information about an abnormal subflow between client and server via another subflow in the same MPTCP session that is working properly. The exchange is illustrated in Figure 1.

   Client                                  Server
     |                                         |
     | Determine that one ongoing subflow      |
     | is faulty                               |
     |                                         |
     |-------Send Fault Announce option------->|
     |   indicating subflow failure via        |
     |   another subflow                       |
     |                                         |

Figure 1: Client sends Fault Announce to server during an MPTCP Session

The Fault Announce option is carried on SYN, ACK, or data packets.

The client may detect a local fault, for example a local port or network card failure, or an error in local protocol processing. In this case the client can determine the fault cause directly.

The client may also actively detect a subflow failure with a detection task. For example, the client may run Bidirectional Forwarding Detection (BFD) to determine whether the subflow is faulty. Alternatively, the client may send an ICMP request to the server and judge the subflow by the response time: if no response is received within a preset time, the subflow is not working properly.

Another way for the client to determine the fault cause is an ICMP error report. The client may receive an ICMP error report from a third device (e.g., a middlebox on the faulty subflow), which indicates the fault cause.
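The timeout-based check described above can be sketched as follows. This is only an illustration under stated assumptions: `send_probe` is a hypothetical callable (not defined by this document) that sends one probe, such as an ICMP echo request, along the subflow's path and returns the round-trip time in seconds, or `None` if nothing came back. The retry count and default timeout are likewise illustrative.

```python
from typing import Callable, Optional


def subflow_is_faulty(
    send_probe: Callable[[float], Optional[float]],
    timeout_s: float = 1.0,
    attempts: int = 3,
) -> bool:
    """Return True if no probe is answered within the preset time.

    send_probe is a hypothetical hook: it takes the timeout in seconds
    and returns the measured round-trip time, or None on no response.
    """
    for _ in range(attempts):
        rtt = send_probe(timeout_s)
        # A single timely response is enough to consider the subflow healthy.
        if rtt is not None and rtt <= timeout_s:
            return False
    # Every attempt timed out or answered too late.
    return True
```

Retrying a few times before declaring a fault is a design choice in this sketch, intended to avoid announcing a fault after a single lost probe.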

Fault Announce option

A new Fault Announce option is defined to describe in detail the fault occurring on one subflow. When the option is present, the faulty subflow is identified by its source address ID (SrcAddressID) and destination address ID (DestAddressID). The mapping between IP addresses and address IDs should be created on both client and server through the ADD_ADDR process defined in RFC8684 and RFC6824.

Option format

The format of the Fault Announce option (FAULT_ANNOUNCE) is depicted in Figure 2:

Figure 2: Fault Announce (FAULT_ANNOUNCE) Option

A new subtype should be allocated to indicate the Fault Announce option. "Cause" is an 8-bit field carrying the reason code that describes why the subflow malfunctions. The client detects the fault and determines the cause. The following values (partially mapped to the exception codes in ICMP error reports) are defined in this document:

0x00~0x09 are reserved for compatibility with the "Reason" values defined in RFC8684.
Network is unreachable (code 0x0A).
Host is unreachable (code 0x0B).
Routing failed (code 0x0C).
Server Suppression (code 0x0D).
TTL equals zero (IP loops may occur) (code 0x0E).
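The cause codes listed above can be written down as an enumeration. The sketch below also shows one possible way to derive a cause from an ICMP error report. The document states only that the values are partially mapped to ICMP exception codes and does not give the mapping, so the `ICMP_TO_CAUSE` table is an assumption based on the standard ICMPv4 type and code numbers. "Server Suppression" is left unmapped because its ICMP counterpart is not stated.

```python
from enum import IntEnum
from typing import Optional


class FaultCause(IntEnum):
    """Cause values defined in this document (0x00-0x09 are reserved)."""
    NETWORK_UNREACHABLE = 0x0A
    HOST_UNREACHABLE = 0x0B
    ROUTING_FAILED = 0x0C
    SERVER_SUPPRESSION = 0x0D
    TTL_ZERO = 0x0E


# ASSUMPTION: illustrative mapping from (ICMPv4 type, code) to a cause.
ICMP_TO_CAUSE = {
    (3, 0): FaultCause.NETWORK_UNREACHABLE,  # Destination net unreachable
    (3, 1): FaultCause.HOST_UNREACHABLE,     # Destination host unreachable
    (3, 5): FaultCause.ROUTING_FAILED,       # Source route failed
    (11, 0): FaultCause.TTL_ZERO,            # TTL exceeded in transit
}


def cause_from_icmp(icmp_type: int, icmp_code: int) -> Optional[FaultCause]:
    """Return the assumed cause for an ICMP error, or None if unmapped."""
    return ICMP_TO_CAUSE.get((icmp_type, icmp_code))


def is_reserved(value: int) -> bool:
    """True for values in the range reserved for RFC8684 compatibility."""
    return 0x00 <= value <= 0x09
```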

"SrcAddressID" is used to identify source address ID for the faulty subflow. "DestAddressID" is used to identify destination address ID for the faulty subflow.

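To make the three fields concrete, the following sketch packs and parses a Fault Announce option. The exact bit layout is not given in the text here, so everything about the wire format below is an assumption: a 6-byte option with Kind, Length, a 4-bit subtype followed by 4 reserved bits, then 8-bit Cause, SrcAddressID, and DestAddressID fields. Kind 30 is the MPTCP option kind from RFC8684. The subtype has not been assigned by IANA, so `PLACEHOLDER_SUBTYPE` uses the private-use value 0xF purely as a stand-in.

```python
import struct
from typing import Tuple

MPTCP_KIND = 30            # MPTCP TCP option kind (RFC8684)
PLACEHOLDER_SUBTYPE = 0xF  # ASSUMPTION: stand-in until IANA assigns a subtype
OPTION_LEN = 6             # ASSUMPTION: length of the layout sketched here


def encode_fault_announce(cause: int, src_id: int, dest_id: int) -> bytes:
    """Pack the assumed 6-byte FAULT_ANNOUNCE layout."""
    for name, value in (("cause", cause), ("src_id", src_id), ("dest_id", dest_id)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    # Subtype sits in the upper 4 bits; the lower 4 bits are reserved (zero).
    return struct.pack("!BBBBBB", MPTCP_KIND, OPTION_LEN,
                       PLACEHOLDER_SUBTYPE << 4, cause, src_id, dest_id)


def decode_fault_announce(data: bytes) -> Tuple[int, int, int]:
    """Return (cause, src_id, dest_id) from the assumed layout."""
    if len(data) < OPTION_LEN:
        raise ValueError("option too short")
    kind, length, sub, cause, src_id, dest_id = struct.unpack(
        "!BBBBBB", data[:OPTION_LEN])
    if kind != MPTCP_KIND or length != OPTION_LEN or sub >> 4 != PLACEHOLDER_SUBTYPE:
        raise ValueError("not a FAULT_ANNOUNCE option in the assumed layout")
    return cause, src_id, dest_id
```

For example, `encode_fault_announce(0x0B, 1, 2)` announces "Host is unreachable" for the subflow between address ID 1 and address ID 2.
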
Additional requirements to be considered

Scenario of middlebox failure

In some deployments, a middlebox failure is what blocks a subflow. In that case the client should report information about the faulty middlebox to the server in the Fault Announce option, so that the server can quickly locate it. The information about a faulty middlebox may include:

Middlebox IP: The IP address of the faulty middlebox.
IP protocol version: The IP protocol version used by the faulty middlebox, i.e., IPv4 or IPv6. The server uses it to parse the "Middlebox IP address" field.
Flag ‘A’: If the "Middlebox IP address" field is optional, this flag should be defined to indicate whether the field is carried in the Fault Announce option.
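The interplay of the 'A' flag, the IP protocol version, and the middlebox address can be illustrated with a short sketch. The document does not define an encoding for these fields, so the layout here is purely hypothetical: one flags byte in which bit 0 is the 'A' flag and bit 1 selects the IP version (0 for IPv4, 1 for IPv6), followed by the 4-byte or 16-byte address when 'A' is set.

```python
import ipaddress
from typing import Optional

FLAG_A = 0x01     # ASSUMPTION: bit 0 means the middlebox address is present
FLAG_IPV6 = 0x02  # ASSUMPTION: bit 1 set means IPv6, clear means IPv4


def encode_middlebox_info(ip: Optional[str]) -> bytes:
    """Encode the optional middlebox address in the hypothetical layout."""
    if ip is None:
        return bytes([0])  # 'A' flag clear: no address carried
    addr = ipaddress.ip_address(ip)
    flags = FLAG_A | (FLAG_IPV6 if addr.version == 6 else 0)
    return bytes([flags]) + addr.packed


def decode_middlebox_info(data: bytes) -> Optional[str]:
    """Return the middlebox address as text, or None if 'A' is not set."""
    flags = data[0]
    if not flags & FLAG_A:
        return None
    # The version bit tells the receiver how many address bytes to parse.
    size = 16 if flags & FLAG_IPV6 else 4
    if len(data) < 1 + size:
        raise ValueError("truncated middlebox address")
    return str(ipaddress.ip_address(data[1:1 + size]))
```

The version bit is what lets the server parse the address field without guessing its length, which is the role the text assigns to the IP protocol version.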

Scenario of distinguishing fault types

In some implementations, faults are classified as transient or non-transient. A "fault type" field may therefore be added to identify the type (transient or non-transient) so that the receiver can handle the fault accordingly.

IANA Considerations

IANA is requested to assign an MPTCP option subtype for the Fault Announce option.

Security Considerations

The Fault Announce option is neither encrypted nor authenticated, so on-path attackers and middleboxes could remove, add, or modify this option on observed Multipath TCP connections.