Network Working Group                                        H. Robinson
Internet-Draft                                Stratus Technologies, Inc.
Intended status: Standards Track                             26 May 2022
Expires: 27 November 2022


                 Multiple Core Performance Hint Option
                   draft-robinson-intarea-mcphint-00

Abstract

   This standard defines a method for differentiating between unrelated
   data streams when the source and destination ports are encrypted.
   This method MAY be used by hardware or software to evenly distribute
   incoming workload between multiple CPU cores and/or other processing
   elements.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 27 November 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Revised BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  IPv4 Option Format
   3.  IPv6 Option Format
   4.  Differentiation Data
   5.  Forwarding
   6.  Tunneling
   7.  Parsing Input Datagrams
     7.1.  IPv4
     7.2.  IPv6
   8.  Future Considerations
   9.  Security Considerations
   10. IANA Considerations
   11. Appendix A - Design Considerations
     11.1.  IP Notification
     11.2.  Issues To Resolve
   12. Normative References
   Author's Address

1.  Introduction

   The Internet Protocol allows datagrams to be re-ordered.  Protocols
   which require datagrams to be ordered must retain out-of-order
   datagrams until preceding datagrams have been received.  While this
   works, the effect of out-of-order datagrams on network performance
   is highly detrimental: Out-of-order packets at first appear to be
   packet loss from the receiver's point of view.  The perceived packet
   loss can trigger unneeded retransmission and delays from TCP and any
   other protocol which uses packet loss to implement congestion
   control.

   With the advent of 10Gbit transmission speeds, it is not possible
   for a single CPU core to keep up with the incoming data running at
   full line speed.  Hardware vendors have implemented mechanisms to
   distribute incoming datagrams to multiple CPU cores.
   If they did this on a random or round-robin basis, the different
   latencies between the multiple cores would result in datagram
   re-ordering, which can severely impact performance.  Hardware solves
   this problem by distributing the data deterministically between CPU
   cores: This is done using a hash of the source and destination IP
   addresses and the source and destination port numbers.  Using just
   the source and destination IP addresses is not sufficient, because
   the resulting traffic will often go to a single CPU core.

   A performance problem arises when handling IPsec traffic: The port
   numbers are encrypted and can no longer be read by the hardware.

   The performance problem also occurs with fragmented datagrams: The
   port numbers are only in the first fragment.

   This standard defines IPv4 and IPv6 options to provide
   differentiation that can be used to distribute incoming datagrams to
   multiple CPU cores.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

2.  IPv4 Option Format

   A host transmitting an IPv4 datagram MAY add an MCPHINT option to
   the IPv4 header under any of the following circumstances:

   *  The datagram contains an AH or ESP header.

   *  The datagram is fragmented.

   *  The datagram is to be transmitted beyond the current subnet and
      the Don't Fragment (DF) bit is not set.

   The MCPHINT option provides 2 bytes of differentiation data.

   If present, the MCPHINT option MUST occur first, at offset 20 from
   the beginning of the IPv4 header.

   The MCPHINT option MUST NOT be used with upper layer protocols which
   do not have unique identifiers beyond the IPv4 source and
   destination address.  The datagram MUST NOT be for the ICMP
   protocol.
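
   As a non-normative illustration, the sender-side conditions of
   Section 2 could be expressed as a single predicate.  The function
   name and all parameter names below are hypothetical; a real stack
   would derive these facts from the datagram and its routing state.
   The rule about upper layer protocols that lack unique identifiers is
   not modeled here.

```python
IPPROTO_ICMP = 1  # IANA-assigned IPv4 protocol number for ICMP


def may_add_mcphint_v4(protocol, has_ah_or_esp, is_fragmented,
                       leaves_subnet, dont_fragment):
    """Return True if the Section 2 conditions permit an MCPHINT option.

    Every parameter is an illustrative assumption:
      protocol       -- value of the IPv4 Protocol field
      has_ah_or_esp  -- datagram carries an AH or ESP header
      is_fragmented  -- datagram is a fragment
      leaves_subnet  -- destination is beyond the current subnet
      dont_fragment  -- the DF bit is set
    """
    # The option MUST NOT be used on ICMP datagrams.
    if protocol == IPPROTO_ICMP:
        return False
    # Any one of the three listed circumstances is sufficient.
    return bool(has_ah_or_esp
                or is_fragmented
                or (leaves_subnet and not dont_fragment))
```

   Because the option is only permitted (MAY), a sender remains free to
   omit it even when this predicate returns True.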

   The format of the IPv4 MCPHINT option is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |  Length = 4   |     Differentiation Data      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type = TBD_IP4OPT_MCPHINT

   Refer to RFC0791 [RFC0791] for more information about IP options.

   If there is a mechanism by which an application can provide IPv4
   options for transmission and that mechanism is used to provide an
   MCPHINT option, the value provided by the application MUST be used.

   The macro OPT_MCPHINT MAY be added to netinet/in.h, defined as
   TBD_IP4OPT_MCPHINT.

3.  IPv6 Option Format

   A host transmitting an IPv6 datagram MAY add an MCPHINT option under
   any of the following circumstances:

   *  The datagram contains an AH or ESP header.

   *  The datagram will be fragmented.

   The MCPHINT option MUST be added to a destination options header.
   The MCPHINT option provides 2 bytes of differentiation data.  The
   destination options header is defined in section 4.6 of RFC8200
   [RFC8200].

   If present, the MCPHINT option MUST occur first in the first
   destination options header, normally at offset 42 from the beginning
   of the IPv6 header.

   Note that RFC8200 [RFC8200] requires a per-fragment destination
   options header to be followed by a routing header.  If one applies
   this hint to a packet containing an IPv6 fragment header, a routing
   header must be included.  RFC8200 [RFC8200] explicitly states that a
   routing header with zero "Segments Left" is always ignored, so this
   is possible.

   The MCPHINT option MUST NOT be used with upper layer protocols which
   do not have unique identifiers beyond the IPv6 source and
   destination address.  The datagram MUST NOT be for the ICMPv6
   protocol.
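
   The offset of 42 follows from the 40-byte fixed IPv6 header plus the
   Next Header and Hdr Ext Len bytes that begin a destination options
   header.  As a non-normative sketch, an 8-byte destination options
   header carrying only the hint (Type, Data Len = 2, two bytes of
   differentiation data) could be assembled as follows.  The function
   name, the placeholder 0x1E standing in for the still unassigned
   TBD_IP6OPT_MCPHINT, and the use of a trailing PadN option to reach
   the 8-byte multiple required by RFC8200 are assumptions made for
   illustration only.

```python
import struct

IPV6_FIXED_HEADER_LEN = 40   # size of the fixed IPv6 header (RFC8200)
TBD_IP6OPT_MCPHINT = 0x1E    # placeholder only; real value is TBD
IP6OPT_PADN = 1              # PadN option type (RFC8200, section 4.2)


def build_mcphint_dest_opts(next_header, diff_data):
    """Build an 8-byte destination options header holding one MCPHINT.

    next_header -- protocol number of the header that follows
    diff_data   -- 16-bit differentiation data, which must be non-zero
    """
    if not 1 <= diff_data <= 0xFFFF:
        raise ValueError("differentiation data must be 1..0xFFFF")
    hdr_ext_len = 0  # length in 8-byte units beyond the first 8 bytes
    return struct.pack(
        "!BBBBHBB",
        next_header, hdr_ext_len,   # bytes 40-41 of the datagram
        TBD_IP6OPT_MCPHINT, 2,      # option type at 42, Data Len at 43
        diff_data,                  # differentiation data at 44-45
        IP6OPT_PADN, 0,             # two bytes of padding at 46-47
    )
```

   For example, build_mcphint_dest_opts(50, 0xBEEF) yields a header
   for an ESP datagram.  The sketch does not add the routing header
   that a fragmented datagram would additionally need.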

   The format of the IPv6 MCPHINT option is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      | Data Len = 2  |     Differentiation Data      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Type = TBD_IP6OPT_MCPHINT

   If there is a mechanism by which an application can provide
   destination options for transmission and that mechanism is used to
   provide an MCPHINT option, the value provided by the application
   MUST be used.

   The macro IP6OPT_MCPHINT MAY be added to netinet/ip6.h, defined as
   TBD_IP6OPT_MCPHINT.

4.  Differentiation Data

   For both IPv4 and IPv6, there are two bytes of differentiation data.

   The differentiation data MUST NOT be zero.

   The differentiation data MUST be the same for all datagrams in a
   logical stream.

   The actual value chosen for differentiation data is left to the
   implementation.  A preferable mechanism would be to generate two
   bytes of random data when a socket is created and to use that data
   for the life of the socket.  The random data could be updated every
   time a connection is specified.

   Alternatively, XORing the source and destination ports is an
   acceptable method for generating the differentiation data.

5.  Forwarding

   Forwarding is already defined to pass through unknown options.

6.  Tunneling

   Tunneling implementations MAY copy the MCPHINT option from the
   datagrams being tunneled to the outer headers.

7.  Parsing Input Datagrams

7.1.  IPv4

   Refer to section 3.1 in RFC0791 [RFC0791].

   The input parsing algorithm for detecting the presence of
   differentiation data is

   o  IHL MUST be greater than or equal to 6

   o  The byte at offset 20 MUST be TBD_IP4OPT_MCPHINT

   If those checks pass, then the differentiation data can be found at
   offset 22.

7.2.  IPv6

   Refer to sections 3, 4.2 and 4.6 in RFC8200 [RFC8200].

   The input parsing algorithm for detecting the presence of
   differentiation data is

   o  Next Header (offset 6) MUST be 60 (for destination options).

   o  The byte at offset 42 MUST be TBD_IP6OPT_MCPHINT

   If those checks pass, then the differentiation data can be found at
   offset 44.

8.  Future Considerations

   A future revision of this standard could allow the differentiation
   data to be longer as long as the first two bytes are generated the
   same way.

   A future revision of this standard could add fields to this option.

9.  Security Considerations

   The MCPHINT option provides some minimal insight into internal
   network configurations that wouldn't otherwise be discernible for
   IPsec tunnels.  XORing the port numbers to obtain differentiation
   data provides slightly more information than using random data.

   The implementation MUST provide an administrative mechanism to
   disable the use of MCPHINT options.  If the implementation
   implements both random generation of differentiation data AND the
   XORing ports method, there MUST be separate administrative
   mechanisms for each method.

10.  IANA Considerations

   IANA is asked to assign a value for TBD_IP4OPT_MCPHINT under
   "Internet Protocol Version 4 (IPv4) Parameters", "IP Option Numbers"
   (https://www.iana.org/assignments/ip-parameters/
   ip-parameters.xhtml#ip-parameters-1).  Refer to RFC2780 [RFC2780]
   and RFC0791 [RFC0791].  The Copy bit MUST be 1 and the class bits
   MUST be 00.

   IANA is asked to assign a value for TBD_IP6OPT_MCPHINT under
   "Internet Protocol Version 6 (IPv6) Parameters", "Destination
   Options and Hop-by-Hop Options"
   (https://www.iana.org/assignments/ipv6-parameters/
   ipv6-parameters.xhtml#ipv6-parameters-2).  Refer to RFC2780
   [RFC2780] and RFC8200 [RFC8200].  The act bits MUST be 00 and the
   chg bit MUST be 0.

11.  Appendix A - Design Considerations

   This is done as an option so it may be added without affecting
   implementations that don't implement it.
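
   As a non-normative illustration of the design, the generation rules
   of Section 4 and the parsing checks of Section 7 could be sketched
   as follows.  All function names are hypothetical.  The values 0x9E
   and 0x1E are placeholders for the unassigned TBD_IP4OPT_MCPHINT and
   TBD_IP6OPT_MCPHINT, chosen only because their bits satisfy the
   constraints in Section 10.  Returning None when the XOR of the ports
   is zero (meaning that the option is simply omitted) is an
   assumption; the text only requires that the data not be zero.

```python
import secrets
import struct

TBD_IP4OPT_MCPHINT = 0x9E  # placeholder: copy bit 1, class bits 00
TBD_IP6OPT_MCPHINT = 0x1E  # placeholder: act bits 00, chg bit 0
IPPROTO_DSTOPTS = 60       # Next Header value for destination options


def random_hint():
    """Random non-zero 16-bit value, e.g. drawn once per socket."""
    return secrets.randbelow(0xFFFF) + 1  # uniform over 1..0xFFFF


def xor_hint(src_port, dst_port):
    """XOR of the two ports, or None if the result would be zero."""
    value = (src_port ^ dst_port) & 0xFFFF
    return value or None


def parse_hint_v4(datagram):
    """Apply the Section 7.1 checks; return the hint or None."""
    if len(datagram) < 24:  # 20-byte header plus the 4-byte option
        return None
    ihl = datagram[0] & 0x0F
    if ihl < 6 or datagram[20] != TBD_IP4OPT_MCPHINT:
        return None
    return struct.unpack_from("!H", datagram, 22)[0]


def parse_hint_v6(datagram):
    """Apply the Section 7.2 checks; return the hint or None."""
    if len(datagram) < 46:  # bytes 44-45 hold the data
        return None
    if datagram[6] != IPPROTO_DSTOPTS:
        return None
    if datagram[42] != TBD_IP6OPT_MCPHINT:
        return None
    return struct.unpack_from("!H", datagram, 44)[0]
```

   A receiver could, for instance, reduce the returned value modulo
   the number of receive queues.  That mapping is not specified by this
   document and is mentioned only as one possibility.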

   Use with ICMP and ICMPv6 is prohibited because there is no reason to
   optimize them and, given that correct IP layer behavior depends on
   their transmission, it is best to avoid anything that might
   interfere with correct operation.

   One should note that when using this option with IPsec, the same
   security association is likely to be processed on multiple CPU
   cores.  This requires a good locking design to achieve the desired
   performance improvement.  It also requires much larger replay
   windows.

11.1.  IP Notification

   Stratus has applied for a patent on this.  Stratus intends to allow
   use of the patent free of charge.  I will be filing the appropriate
   formal notification as soon as I figure out what it is and get it
   signed by the appropriate management.

11.2.  Issues To Resolve

   My original writeup of this put the new IPv6 option in the
   Hop-by-Hop header, because that is always ensured to be a
   per-fragment header.  The option was moved to the destination
   options header given the advice in section 4.8 of RFC8200 [RFC8200].

   Section 4.5 of RFC8200 [RFC8200] explicitly states that there are
   only the following combinations of per-fragment headers:

      IPv6 Header
      IPv6 Header, Hop-by-Hop Header
      IPv6 Header, Destination Options Header, Routing Header
      IPv6 Header, Hop-by-Hop Header, Dest Options Header, Routing Header

   This implies that getting MCPHINTs into a fragmented header will
   require the insertion of a null routing header if one isn't present
   (which is the normal case).

   So, I am wondering if I was misled by section 4.8 in RFC8200
   [RFC8200] and this option really belongs in the hop-by-hop header?

   I see that some other drafts have picked new values for option
   numbers and instructed the IANA to allocate specific numbers.  I
   like this idea.  Can anyone recommend deprecated values which could
   be assigned without getting into trouble?

12.  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
              DOI 10.17487/RFC8200, July 2017.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2780]  Bradner, S. and V. Paxson, "IANA Allocation Guidelines For
              Values In the Internet Protocol and Related Headers",
              BCP 37, RFC 2780, DOI 10.17487/RFC2780, March 2000.

Author's Address

   Herb Robinson
   Stratus Technologies, Inc.
   5 Mill & Main Place, Suite 500
   Maynard, Massachusetts 1004
   United States of America

   Email: Herbie.Robinson@stratus.com
   URI:   https://www.stratus.com/