RFC 9333 | Minimal IP ESP | December 2022 |
Migault & Guggemos | Informational | [Page] |
This document describes the minimal properties that an IP Encapsulating Security Payload (ESP) implementation needs to meet to remain interoperable with the standard ESP as defined in RFC 4303. Such a minimal version of ESP is not intended to become a replacement of ESP in RFC 4303. Instead, a minimal implementation is expected to be optimized for constrained environments while remaining interoperable with implementations of ESP. In addition, this document provides some considerations for implementing minimal ESP in a constrained environment, such as limiting the number of flash writes, handling frequent wakeup and sleep states, limiting wakeup time, and reducing the use of random generation.¶
This document does not update or modify RFC 4303. It provides a compact description of how to implement the minimal version of that protocol. RFC 4303 remains the authoritative description.¶
This document is not an Internet Standards Track specification; it is published for informational purposes.¶
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are candidates for any level of Internet Standard; see Section 2 of RFC 7841.¶
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9333.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
ESP [RFC4303] is part of the IPsec protocol suite [RFC4301]. IPsec is used to provide confidentiality, data origin authentication, connectionless integrity, an anti-replay service, and limited Traffic Flow Confidentiality (TFC) padding.¶
Figure 1 describes an ESP packet. Currently, ESP is implemented in the kernel of most major multipurpose Operating Systems (OSes). ESP is usually implemented with all of its features to fit the multipurpose usage of these OSes, at the expense of resources and with no considerations for code size. Constrained devices are likely to have their own implementation of ESP optimized and adapted to their specific use, such as limiting the number of flash writes (for each packet or across wake time), handling frequent wakeup and sleep states, limiting wakeup time, and reducing the use of random generation. With the adoption of IPsec by Internet of Things (IoT) devices with minimal IKEv2 [RFC7815] and ESP Header Compression (EHC) [EHC-DIET-ESP] [EHC-IKEv2], these ESP implementations MUST remain interoperable with standard ESP implementations. This document describes the minimal properties an ESP implementation needs to meet to remain interoperable with ESP [RFC4303]. In addition, this document provides advice to implementers for implementing ESP within constrained environments. This document does not update or modify [RFC4303].¶
For each field of the ESP packet represented in Figure 1, this document provides recommendations and guidance for minimal implementations. The primary purpose of minimal ESP is to remain interoperable with other nodes implementing ESP [RFC4303], while limiting the standard complexity of the implementation.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
[RFC4303] defines the SPI as a mandatory 32-bit field.¶
The SPI has local significance to index the Security Association (SA). As described in Section 4.1 of [RFC4301], nodes supporting only unicast communications can index their SA using only the SPI. Nodes supporting multicast communications also require the use of IP addresses; thus, SA lookup needs to be performed using the longest match.¶
For nodes supporting only unicast communications, indexing the SA using only the SPI is RECOMMENDED. The index may be based on the full 32 bits of the SPI or a subset of these bits. The node may require a combination of the SPI as well as other parameters (like the IP address) to index the SA.¶
Values 0-255 MUST NOT be used. As per Section 2.1 of [RFC4303], values 1-255 are reserved, and 0 is only allowed to be used internally and MUST NOT be sent over the wire.¶
[RFC4303] does not require the 32-bit SPI to be randomly generated, although that is the RECOMMENDED way to generate SPIs as it provides some privacy and security benefits and avoids correlation between ESP communications. To obtain a usable random 32-bit SPI, the node generates a random 32-bit value and checks it does not fall within the 0-255 range. If the SPI has an acceptable value, it is used to index the inbound session. Otherwise, the generated value is discarded, and the process repeats until a valid value is found.¶
Some constrained devices are less concerned with the privacy properties associated with randomly generated SPIs. Examples of such devices might include sensors looking to reduce their code complexity. The use of a predictive function to generate the SPI might be preferred over the generation and handling of random values. An implementation of such predictable function could use the combination of a fixed value and the memory address of the Security Association Database (SAD) structure. For every incoming packet, the node will be able to point to the SAD structure directly from the SPI value. This avoids having a separate and additional binding and lookup function for the SPI to its SAD entry for every incoming packet.¶
SPIs that are not randomly generated over 32 bits may have privacy and security concerns. As a result, the use of alternative designs requires careful security and privacy reviews. This section provides some considerations for the adoption of alternative designs.¶
The SPI value is only looked up for inbound traffic. The SPI negotiated with IKEv2 [RFC7296] or minimal IKEv2 [RFC7815] by a peer is the value used by the remote peer when it sends traffic. The main advantage of using a rekeying mechanism is to enable a rekey, which is performed by replacing an old SA with a new SA, both indexed with distinct SPIs. The SPI is only used for inbound traffic by the peer, which allows each peer to manage the set of SPIs used for its inbound traffic. The necessary number of SPIs reflects the number of inbound SAs as well as the ability to rekey those SAs. Typically, rekeying an SA is performed by creating a new SA (with a dedicated SPI) before the old SA is deleted. This results in an additional SA and the need to support an additional SPI. Similarly, the privacy concerns associated with the generation of non-random SPIs is also limited to the incoming traffic.¶
Alternatively, some constrained devices will not implement IKEv2 or minimal IKEv2 and, as such, will not be able to manage a rollover between two distinct SAs. In addition, some of these constrained devices are likely to have a limited number of SAs; for example, they are likely to be indexed over 3 bytes only. One possible way to enable a rekeying mechanism with these devices is to use the SPI where, for example, the first 3 bytes designates the SA while the remaining byte indicates a rekey index. SPI numbers can be used to implement tracking the inbound SAs when rekeying is taking place. When rekeying an SPI, the new SPI could use the SPI bytes to indicate the rekeying index.¶
The use of a small, limited set of SPI numbers across communications comes with privacy and security concerns. Some specific values or subsets of SPI values could reveal the model or manufacturer of the node implementing ESP or reveal a state such as "not yet rekeyed" or "rekeyed 10 times". If a constrained host uses a very limited number of applications, eventually a single one, the SPI itself could indicate what kind of traffic is transmitted (e.g., the kind of application typically running). This could also be correlated with encrypted data size to further leak information to an observer on the network. In addition, use of specific hardcoded SPI numbers could reveal a manufacturer or device version. If updated devices use different SPI numbers, an attacker could locate vulnerable devices by their use of specific SPI numbers.¶
A privacy analysis should consider at least the type of information as well as the traffic pattern before deciding whether non-random SPIs are safe to use. Typically, temperature and wind sensors that are used outdoors do not leak privacy-sensitive information, and most of their traffic is expected to be outbound traffic. When used indoors, a sensor that reports an encrypted status of a door (closed or opened) every minute might not leak sensitive information outside the local network. In these examples, the privacy aspect of the information itself might be limited. Being able to determine the version of the sensor to potentially take control of it may also have some limited security consequences. Of course, this depends on the context in which these sensors are being used. If the risks associated to privacy and security are acceptable, a non-randomized SPI can be used.¶
The Sequence Number (SN) in [RFC4303] is a mandatory 32-bit field in the packet.¶
The SN is set by the sender so the receiver can implement anti-replay protection. The SN is derived from any strictly increasing function that guarantees the following: if packet B is sent after packet A, then the SN of packet B is higher than the SN of packet A.¶
Some constrained devices may establish communication with specific devices where it is known whether or not the peer implements anti-replay protection. As per [RFC4303], the sender MUST still implement a strictly increasing function to generate the SN.¶
It is RECOMMENDED that multipurpose ESP implementations increment a counter for each packet sent. However, a constrained device may avoid maintaining this context and use another source that is known to always increase. Typically, constrained devices use 802.15.4 Time Slotted Channel Hopping (TSCH). This communication is heavily dependent on time. A constrained device can take advantage of this clock mechanism to generate the SN. A lot of IoT devices are in a sleep state most of the time and wake up only to perform a specific operation before going back to sleep. These devices have separate hardware that allows them to wake up after a certain timeout and typically also have timers that start running when the device is booted up, so they might have a concept of time with certain granularity. This requires devices to store any information in stable storage that can be restored across sleeps (e.g., flash memory). Storing information associated with the SA (such as the SN) requires some read and write operations on stable storage after each packet is sent as opposed to an SPI number or cryptographic keys that are only written to stable storage at the creation of the SA. Write operations wear out the flash storage. Write operations also slow down the system significantly, as writing to flash is much slower than reading from flash. While these devices have internal clocks or timers that might not be very accurate, they are good enough to guarantee that each time the device wakes up from sleep, the time is greater than what it was before the device went to sleep. Using time for the SN would guarantee a strictly increasing function and avoid storing any additional values or context related to the SN on flash. In addition to the time value, a RAM-based counter can be used to ensure that the serial numbers are still increasing and unique if the device sends multiple packets over an SA within one wakeup period.¶
For inbound traffic, it is RECOMMENDED that receivers implement anti-replay protection. The size of the window should depend on the network characteristic to deliver packets out of order. In an environment where out-of-order packets are not possible, the window size can be set to one. An ESP implementation may choose to not implement anti-replay protection. An implementation of anti-replay protection may require the device to write the received SN for every packet to stable storage. This will have the same issues as discussed earlier with the SN. Some constrained device implementations may choose to not implement the optional anti-replay protection. A typical example is an IoT device such as a temperature sensor that sends a temperature measurement every 60 seconds and receives an acknowledgment from the receiver. In a case like this, the ability to spoof and replay an acknowledgement is of limited interest and might not justify the implementation of an anti-replay mechanism. Receiving peers may also use an ESP anti-replay mechanism adapted to a specific application. Typically, when the sending peer is using an SN based on time, anti-replay may be implemented by discarding any packets that present an SN whose value is too much in the past. Such mechanisms may consider clock drifting in various ways in addition to acceptable delay induced by the network to avoid the anti-replay windows rejecting legitimate packets. Receiving peers could accept any SN as long as it is higher than the previously received SN. Another mechanism could be used where only the received time on the device is used to consider a packet to be valid, without looking at the SN at all.¶
The SN can be represented as a 32-bit number or as a 64-bit number, known as an "Extended Sequence Number (ESN)". As per [RFC4303], support of ESN is not mandatory, and its use is negotiated via IKEv2 [RFC7296]. An ESN is used for high-speed links to ensure there can be more than 232 packets before the SA needs to be rekeyed to prevent the SN from rolling over. This assumes the SN is incremented by 1 for each packet. When the SN is incremented differently -- such as when time is used -- rekeying needs to happen based on how the SN is incremented to prevent the SN from rolling over. The security of all data protected under a given key decreases slightly with each message, and a node must ensure the limit is not reached, even though the SN would permit it. Estimation of the maximum number of packets to be sent by a node is not always predictable, and large margins should be used, especially as nodes could be online for much more time than expected. Even for constrained devices, it is RECOMMENDED to implement some rekeying mechanisms (see Section 10).¶
Padding is required to keep the 32-bit alignment of ESP. It is also required for some encryption transforms that need a specific block size of input, such as ENCR_AES_CBC. ESP specifies padding in the Pad Length byte, followed by up to 255 bytes of padding.¶
Checking the padding structure is not mandatory, so constrained devices may omit these checks on received ESP packets. For outgoing ESP packets, padding must be applied as required by ESP.¶
In some situations, the padding bytes may take a fixed value. This would typically be the case when the Payload Data is of fixed size.¶
ESP [RFC4303] additionally provides Traffic Flow Confidentiality (TFC) as a way to perform padding to hide traffic characteristics. TFC is not mandatory and is negotiated with the SA management protocol, such as IKEv2. TFC has been widely implemented, but it is not widely deployed for ESP traffic. It is NOT RECOMMENDED to implement TFC for minimal ESP.¶
As a consequence, communication protection that relies on TFC would be more sensitive to traffic patterns without TFC. This can leak application information as well as the manufacturer or model of the device used to a passive monitoring attacker. Such information can be used, for example, by an attacker if a vulnerability is known for the specific device or application. In addition, some applications (such as health applications) could leak important privacy-oriented information.¶
Constrained devices that have a limited battery lifetime may prefer to avoid sending extra padding bytes. In most cases, the payload carried by these devices is quite small, and the standard padding mechanism can be used as an alternative to TFC. Alternatively, any information leak based on the size -- or presence -- of the packet can also be addressed at the application level before the packet is encrypted with ESP. If application packets vary between 1 to 30 bytes, the application could always send 32-byte responses to ensure all traffic sent is of identical length. To prevent leaking information that a sensor changed state, such as "temperature changed" or "door opened", an application could send this information at regular time intervals, rather than when a specific event is happening, even if the sensor state did not change.¶
ESP [RFC4303] defines the Next Header as a mandatory 8-bit field in the packet. The Next Header, only visible after decryption, specifies the data contained in the payload. In addition, the Next Header may carry an indication on how to process the packet [BEET-ESP]. The Next Header can point to a "dummy" packet, i.e., a packet with the Next Header value set to 59, meaning "no next header". The data following "no next header" is unstructured "dummy" data. (Note that this document uses the term "dummy" for consistency with [RFC4303].)¶
The ability to generate, receive, and ignore "dummy" packets is required by [RFC4303]. An implementation can omit ever generating and sending "dummy" packets. For interoperability, a minimal ESP implementation MUST be able to process and discard "dummy" packets without indicating an error.¶
In constrained environments, sending "dummy" packets may have too much impact on the device lifetime, in which case, "dummy" packets should not be generated and sent. On the other hand, constrained devices running specific applications that would leak too much information by not generating and sending "dummy" packets may implement this functionality or even implement something similar at the application layer. Note also that similarly to padding and TFC that can be used to hide some traffic characteristics (see Section 5), "dummy" packets may also reveal some patterns that can be used to identify the application. For example, an application may send "dummy" data to hide a traffic pattern. Suppose such an application sends a 1-byte data when a change occurs. This results in sending a packet notifying a change has occurred. "Dummy" packets may be used to prevent such information from being leaked by sending a 1-byte packet every second when the information is not changed. After an upgrade, the data becomes 2 bytes. At that point, the "dummy" packets do not hide anything, and having 1 byte regularly versus 2 bytes makes even the identification of the application version easier to identify. This generally makes the use of "dummy" packets more appropriate on high-speed links.¶
In some cases, devices are dedicated to a single application or a single transport protocol. In this case, the Next Header has a fixed value.¶
Specific processing indications have not been standardized yet [BEET-ESP] and are expected to result from an agreement between the peers. As a result, they SHOULD NOT be part of a minimal implementation of ESP.¶
The ICV depends on the cryptographic suite used. As detailed in [RFC8221], authentication or authenticated encryption is RECOMMENDED, and as such, the ICV field must be present with a size different from zero. Its length is defined by the security recommendations only.¶
The recommended algorithms to use are expected to evolve over time, and implementers SHOULD follow the recommendations provided by [RFC8221] and updates.¶
This section lists some of the criteria that may be considered to select an appropriate cryptographic suite. The list is not expected to be exhaustive and may also evolve over time.¶
Power Consumption and Bandwidth Consumption: Reducing the payload sent may significantly reduce the energy consumption of the device. Encryption algorithm transforms with low overhead are strongly preferred. To reduce the overall payload size, one may, for example:¶
This document has no IANA actions.¶
The security considerations in [RFC4303] apply to this document as well. In addition, this document provides security recommendations and guidance for the implementation choices for each ESP field.¶
The security of a communication provided by ESP is closely related to the security associated with the management of that key. This usually includes mechanisms to prevent a nonce from repeating, for example. When a node is provisioned with a session key that is used across reboot, the implementer MUST ensure that the mechanisms put in place remain valid across reboot as well.¶
It is RECOMMENDED to use ESP in conjunction with key management protocols such as, for example, IKEv2 [RFC7296] or minimal IKEv2 [RFC7815]. Such mechanisms are responsible for negotiating fresh session keys as well as preventing a session key being used beyond its lifetime. When such mechanisms cannot be implemented, such as when the session key is provisioned, the device MUST ensure that keys are not used beyond their lifetime and that the key remains used in compliance with all security requirements across reboots (e.g., conditions on counters and nonces remain valid).¶
When a device generates its own key or when random values such as nonces are generated, the random generation MUST follow [RFC4086]. In addition, [SP-800-90A-Rev-1] provides guidance on how to build random generators based on deterministic random functions.¶
Preventing the leakage of privacy-sensitive information is a hard problem to solve and usually results in balancing the information potentially being leaked to the cost associated with the counter measures. This problem is not inherent to the minimal ESP described in this document and also concerns the use of ESP in general.¶
This document targets minimal implementations of ESP and, as such, describes a minimalistic way to implement ESP. In some cases, this may result in potentially revealing privacy-sensitive pieces of information. This document describes these privacy implications so the implementer can make the appropriate decisions given the specificities of a given environment and deployment.¶
The main risk associated with privacy is the ability to identify an application or a device by analyzing the traffic, which is designated as "traffic shaping". As discussed in Section 3, the use in a very specific context of non-randomly generated SPIs might ease the determination of the device or the application in some cases. Similarly, padding provides limited capabilities to obfuscate the traffic compared to those provided by TFC. Such consequences on privacy are detailed in Section 5.¶
The authors would like to thank Daniel Palomares, Scott Fluhrer, Tero Kivinen, Valery Smyslov, Yoav Nir, Michael Richardson, Thomas Peyrin, Eric Thormarker, Nancy Cam-Winget, and Bob Briscoe for their valuable comments. In particular, Scott Fluhrer suggested including the rekey index in the SPI. Tero Kivinen also provided multiple clarifications and examples of ESP deployment within constrained devices with their associated optimizations. Thomas Peyrin, Eric Thormarker, and Scott Fluhrer suggested and clarified the use of transform resilient to nonce misuse. The authors would also like to thank Mohit Sethi for his support as the LWIG Working Group Chair.¶