rfc9937.original   rfc9937.txt 
TCP Maintenance Working Group M. Mathis Internet Engineering Task Force (IETF) M. Mathis
Internet-Draft Request for Comments: 9937
Obsoletes: 6937 (if approved) N. Cardwell Obsoletes: 6937 N. Cardwell
Intended status: Standards Track Y. Cheng Category: Standards Track Y. Cheng
Expires: 24 December 2025 N. Dukkipati ISSN: 2070-1721 N. Dukkipati
Google, Inc. Google, Inc.
22 June 2025 November 2025
Proportional Rate Reduction Proportional Rate Reduction
draft-ietf-tcpm-prr-rfc6937bis-21
Abstract Abstract
This document specifies a standards-track version of the Proportional This document specifies a Standards Track version of the Proportional
Rate Reduction (PRR) algorithm that obsoletes the experimental Rate Reduction (PRR) algorithm that obsoletes the Experimental
version described in RFC6937. PRR regulates the amount of data sent version described in RFC 6937. PRR regulates the amount of data sent
by TCP or other transport protocols during fast recovery. PRR by TCP or other transport protocols during fast recovery. PRR
accurately regulates the actual flight size through recovery such accurately regulates the actual flight size through recovery such
that at the end of recovery it will be as close as possible to the that at the end of recovery it will be as close as possible to the
slow start threshold (ssthresh), as determined by the congestion slow start threshold (ssthresh), as determined by the congestion
control algorithm. control algorithm.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This is an Internet Standards Track document.
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.
This Internet-Draft will expire on 24 December 2025. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9937.
Copyright Notice Copyright Notice
Copyright (c) 2025 IETF Trust and the persons identified as the Copyright (c) 2025 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents
license-info) in effect on the date of publication of this document. (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
Please review these documents carefully, as they describe your rights carefully, as they describe your rights and restrictions with respect
and restrictions with respect to this document. Code Components to this document. Code Components extracted from this document must
extracted from this document must include Revised BSD License text as include Revised BSD License text as described in Section 4.e of the
described in Section 4.e of the Trust Legal Provisions and are Trust Legal Provisions and are provided without warranty as described
provided without warranty as described in the Revised BSD License. in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction 1. Introduction
2. Conventions 2. Conventions
3. Document and WG Information 3. Definitions
4. Definitions 4. Changes Relative to RFC 6937
5. Changes Relative to RFC 6937 5. Relationships to Other Standards
6. Relationships to other standards 6. Algorithm
7. Algorithm 6.1. Initialization Steps
7.1. Initialization Steps 6.2. Per-ACK Steps
7.2. Per-ACK Steps 6.3. Per-Transmit Steps
7.3. Per-Transmit Steps 6.4. Completion Steps
7.4. Completion Steps 7. Properties
8. Properties 8. Examples
9. Examples 9. Adapting PRR to Other Transport Protocols
10. Adapting PRR to other transport protocols 10. Measurement Studies
11. Measurement Studies 11. Operational Considerations
12. Operational Considerations 11.1. Incremental Deployment
12.1. Incremental Deployment 11.2. Fairness
12.2. Fairness 11.3. Protecting the Network Against Excessive Queuing and
12.3. Protecting the Network Against Excessive Queuing and Packet Loss
Packet Loss 12. IANA Considerations
13. Acknowledgements 13. Security Considerations
14. IANA Considerations 14. References
15. Security Considerations 14.1. Normative References
16. Normative References 14.2. Informative References
17. Informative References Appendix A. Strong Packet Conservation Bound
Appendix A. Strong Packet Conservation Bound Acknowledgments
Authors' Addresses Authors' Addresses
1. Introduction 1. Introduction
Van Jacobson's packet conservation principle [Jacobson88] defines a Van Jacobson's packet conservation principle [Jacobson88] defines a
self clock process wherein N data segments delivered to the receiver self clock process wherein N data segments delivered to the receiver
generate acknowledgments that the data sender uses as the clock to generate acknowledgments that the data sender uses as the clock to
trigger sending another N data segments into the network. trigger sending another N data segments into the network.
Congestion control algorithms like Reno [RFC5681] and CUBIC [RFC9438] Congestion control algorithms like Reno [RFC5681] and CUBIC [RFC9438]
are built on the conceptual foundation of this self clock process. are built on the conceptual foundation of this self clock process.
They control the sending process of a transport protocol connection They control the sending process of a transport protocol connection
by using a congestion window ("cwnd") to limit "inflight", the volume by using a congestion window ("cwnd") to limit "inflight", the volume
of data that a connection estimates is in-flight in the network at a of data that a connection estimates is in flight in the network at a
given time. Furthermore, these algorithms require that transport given time. Furthermore, these algorithms require that transport
protocol connections reduce their cwnd in response to packet losses. protocol connections reduce their cwnd in response to packet losses.
Fast recovery (see [RFC5681] and [RFC6675]) is the algorithm for Fast recovery (see [RFC5681] and [RFC6675]) is the algorithm for
making this cwnd reduction using feedback from acknowledgements. Its making this cwnd reduction using feedback from acknowledgments. Its
stated goal is to maintain a sender's self clock by relying on stated goal is to maintain a sender's self clock by relying on
returning ACKs during recovery to clock more data into the network. returning ACKs during recovery to clock more data into the network.
Without Proportional Rate Reduction (PRR), fast recovery typically Without Proportional Rate Reduction (PRR), fast recovery typically
adjusts the window by waiting for a large fraction of a round-trip adjusts the window by waiting for a large fraction of a round-trip
time (one half round-trip time of ACKs for Reno [RFC5681], or 30% of time (RTT) (one half round-trip time of ACKs for Reno [RFC5681] or
a round-trip time for CUBIC [RFC9438]) to pass before sending any 30% of a round-trip time for CUBIC [RFC9438]) to pass before sending
data. any data.
[RFC6675] makes fast recovery with Selective Acknowledgement (SACK) [RFC6675] makes fast recovery with Selective Acknowledgment (SACK)
[RFC2018] more accurate by computing "pipe", a sender-side estimate [RFC2018] more accurate by computing "pipe", a sender-side estimate
of the number of bytes still outstanding in the network. With of the number of bytes still outstanding in the network. With
[RFC6675], fast recovery is implemented by sending data as necessary [RFC6675], fast recovery is implemented by sending data as necessary
on each ACK to allow pipe to rise to match ssthresh, the target on each ACK to allow pipe to rise to match ssthresh, the target
window size for fast recovery, as determined by the congestion window size for fast recovery, as determined by the congestion
control algorithm. This protects fast recovery from timeouts in many control algorithm. This protects fast recovery from timeouts in many
cases where there are heavy losses. However, [RFC6675] has two cases where there are heavy losses. However, [RFC6675] has two
significant drawbacks. First, because it makes a large significant drawbacks. First, because it makes a large
multiplicative decrease in cwnd at the start of fast recovery, it can multiplicative decrease in cwnd at the start of fast recovery, it can
cause a timeout if the entire second half of the window of data or cause a timeout if the entire second half of the window of data or
ACKs are lost. Second, a single ACK carrying a SACK option that ACKs are lost. Second, a single ACK carrying a SACK option that
implies a large quantity of missing data can cause a step implies a large quantity of missing data can cause a step
discontinuity in the pipe estimator, which can cause Fast Retransmit discontinuity in the pipe estimator, which can cause Fast Retransmit
to send a large burst of data. to send a large burst of data.
PRR regulates the transmission process during fast recovery in a PRR regulates the transmission process during fast recovery in a
manner that avoids these excess window adjustments, such that manner that avoids these excess window adjustments, such that
transmissions progress smoothly, and at the end of recovery the transmissions progress smoothly, and at the end of recovery, the
actual window size will be as close as possible to ssthresh. actual window size will be as close as possible to ssthresh.
PRR's approach is inspired by Van Jacobson's packet conservation PRR's approach is inspired by Van Jacobson's packet conservation
principle. As much as possible, PRR relies on the self clock principle. As much as possible, PRR relies on the self clock process
process, and is only slightly affected by the accuracy of estimators and is only slightly affected by the accuracy of estimators, such as
such as the estimate of the volume of in-flight data. This is what the estimate of the volume of in-flight data. This is what gives the
gives the algorithm its precision in the presence of events that algorithm its precision in the presence of events that cause
cause uncertainty in other estimators. uncertainty in other estimators.
When inflight is above ssthresh, PRR reduces inflight smoothly toward When inflight is above ssthresh, PRR reduces inflight smoothly toward
ssthresh by clocking out transmissions at a rate that is in ssthresh by clocking out transmissions at a rate that is in
proportion to both the delivered data and ssthresh. proportion to both the delivered data and ssthresh.
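The proportional behavior described in the preceding paragraph can be illustrated with a minimal sketch. It is not the normative pseudocode, which appears in the Algorithm section. The names prr_delivered (bytes delivered so far in the episode) and prr_out (bytes sent so far in the episode) are assumptions of this sketch, as is the clamp to zero. RecoverFS and ssthresh are used as defined in the Definitions section.

```python
def prr_proportional_sndcnt(prr_delivered: int, prr_out: int,
                            ssthresh: int, recover_fs: int) -> int:
    """Bytes that may be sent on this ACK while inflight > ssthresh.

    Illustrative sketch only; all quantities are in bytes.
    """
    if recover_fs <= 0:
        raise ValueError("recover_fs must be positive")
    # Total bytes that should have been sent so far: delivered data
    # scaled by ssthresh / RecoverFS, rounded up with integer math.
    target_out = (prr_delivered * ssthresh + recover_fs - 1) // recover_fs
    # Subtract what was already sent; the clamp to zero is a defensive
    # choice of this sketch.
    return max(target_out - prr_out, 0)
```

With ssthresh at half of RecoverFS, for example, the sketch permits roughly one byte to be sent for every two bytes reported delivered, so inflight declines gradually rather than in a single step.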
When inflight is less than ssthresh, PRR adaptively chooses between When inflight is less than ssthresh, PRR adaptively chooses between
one of two Reduction Bounds to limit the total window reduction due one of two Reduction Bounds to limit the total window reduction due
to all mechanisms, including transient application stalls and the to all mechanisms, including transient application stalls and the
losses themselves. As a baseline, to be cautious when there may be losses themselves. As a baseline, to be cautious when there may be
considerable congestion, PRR uses its Conservative Reduction Bound considerable congestion, PRR uses its Conservative Reduction Bound
(PRR-CRB), which is strictly packet conserving. When recovery seems (PRR-CRB), which is strictly packet conserving. When recovery seems
to be progressing well, PRR uses its Slow Start Reduction Bound (PRR- to be progressing well, PRR uses its Slow Start Reduction Bound (PRR-
SSRB), which is more aggressive than PRR-CRB by at most one segment SSRB), which is more aggressive than PRR-CRB by at most one segment
per ACK. PRR-CRB meets the Strong Packet Conservation Bound per ACK. PRR-CRB meets the Strong Packet Conservation Bound
described in Appendix A; however, when used in real networks as the described in Appendix A; however, when used in real networks as the
sole approach, it does not perform as well as the algorithm described sole approach, it does not perform as well as the algorithm described
in [RFC6675], which prove to be more aggressive in a significant in [RFC6675], which proves to be more aggressive in a significant
number of cases. PRR-SSRB offers a compromise by allowing a number of cases. PRR-SSRB offers a compromise by allowing a
connection to send one additional segment per ACK, relative to PRR- connection to send one additional segment per ACK, relative to PRR-
CRB, in some situations. Although PRR-SSRB is less aggressive than CRB, in some situations. Although PRR-SSRB is less aggressive than
[RFC6675] (transmitting fewer segments or taking more time to [RFC6675] (transmitting fewer segments or taking more time to
transmit them), it outperforms due to the lower probability of transmit them), it outperforms due to the lower probability of
additional losses during recovery. additional losses during recovery.
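The relationship between the two Reduction Bounds can be sketched as follows. This is an illustration under stated assumptions, not the normative pseudocode from the Algorithm section. The names prr_delivered and prr_out (bytes delivered and sent so far in the episode) are assumptions of this sketch, and the final clamp to zero is a defensive choice. DeliveredData, inflight, ssthresh, SMSS, and SafeACK are used as defined in the Definitions section.

```python
def prr_reduction_bound_sndcnt(prr_delivered: int, prr_out: int,
                               delivered_data: int, inflight: int,
                               ssthresh: int, smss: int,
                               safe_ack: bool) -> int:
    """Bytes that may be sent on this ACK while inflight <= ssthresh.

    Illustrative sketch only; all quantities are in bytes.
    """
    # PRR-CRB: strictly packet conserving. Send no more than has been
    # delivered, either over the episode or on this ACK.
    sndcnt = max(prr_delivered - prr_out, delivered_data)
    if safe_ack:
        # PRR-SSRB: at most one additional segment per ACK when
        # recovery appears to be progressing well.
        sndcnt += smss
    # Never let inflight rise above ssthresh.
    sndcnt = min(ssthresh - inflight, sndcnt)
    return max(sndcnt, 0)
```

The only difference between the two bounds in this sketch is the single SMSS added when safe_ack is true, which matches the statement that PRR-SSRB is more aggressive than PRR-CRB by at most one segment per ACK.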
The original definition of the packet conservation principle The original definition of the packet conservation principle
[Jacobson88] treated packets that are presumed to be lost (e.g., [Jacobson88] treated packets that are presumed to be lost (e.g.,
marked as candidates for retransmission) as having left the network. marked as candidates for retransmission) as having left the network.
This idea is reflected in the inflight estimator used by PRR, but it This idea is reflected in the inflight estimator used by PRR, but it
is distinct from the Strong Packet Conservation Bound as described in is distinct from the Strong Packet Conservation Bound as described in
Appendix A, which is defined solely on the basis of data arriving at Appendix A, which is defined solely on the basis of data arriving at
the receiver. the receiver.
This document specifies several main changes from the earlier version This document specifies several main changes from the earlier version
of PRR in [RFC6937]. First, it introduces a new adaptive heuristic of PRR in [RFC6937]. First, it introduces a new adaptive heuristic
that replaces a manual configuration parameter that determined how that replaces a manual configuration parameter that determined how
conservative PRR was when inflight was less than ssthresh (whether to conservative PRR was when inflight was less than ssthresh (whether to
use PRR-CRB or PRR-SSRB). Second, the algorithm specifies behavior use PRR-CRB or PRR-SSRB). Second, the algorithm specifies behavior
for non-SACK connections (connections that have not negotiated for non-SACK connections (connections that have not negotiated SACK
[RFC2018] SACK support via the "SACK-permitted" option). Third, the [RFC2018] support via the "SACK-permitted" option). Third, the
algorithm ensures a smooth sending process even when the sender has algorithm ensures a smooth sending process even when the sender has
experienced high reordering and starts loss recovery after a large experienced high reordering and starts loss recovery after a large
amount of sequence space has been SACKed. Finally, this document amount of sequence space has been SACKed. Finally, this document
also includes additional discussion about the integration of PRR with also includes additional discussion about the integration of PRR with
congestion control and loss detection algorithms. congestion control and loss detection algorithms.
PRR has extensive deployment experience in multiple TCP PRR has extensive deployment experience in multiple TCP
implementations since the first widely deployed TCP PRR implementations since the first widely deployed TCP PRR
implementation in 2011 [First_TCP_PRR]. implementation in 2011 [First_TCP_PRR].
2. Conventions 2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in
14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
3. Document and WG Information 3. Definitions
_RFC Editor: please advise on how we can specify the "Janey C. Hoe"
name in the "Acknowledgements" section as XML that would be correctly
translated by xml2rfc into plain ASCII txt output with a single space
after the "C." (interpreting the "C." as an initial) rather than a
double space after the "C." (interpreting the "C." as the end of the
sentence)._
_RFC Editor: please remove this section before publication_
Formatted: 2025-06-22 19:27:52-07:00
Please send all comments, questions and feedback to tcpm@ietf.org
About revision 00:
The introduction above was drawn from draft-mathis-tcpm-rfc6937bis-
00. All of the text below was copied verbatim from RFC 6937, to
facilitate comparison between RFC 6937 and this document as it
evolves.
About revision 01:
* Recast the RFC 6937 introduction as background
* Made "Changes From RFC 6937" an explicit section
* Made Relationships to other standards more explicit
* Added a generalized SafeACK heuristic
* Provided hints for non TCP implementations
* Added language about detecting ACK splitting, but have no advice
on actions (yet)
About revision 02:
* Companion RACK loss detection RECOMMENDED
* Non-SACK accounting in the pseudo code
* cwnd computation in the pseudo code
* Force fast retransmit at the beginning of fast recovery
* Remove deprecated Rate-Halving text
* Fixed bugs in the example traces
About revision 03 and 04:
* Clarify when and how SndCnt becomes 0
* Improve algorithm to smooth the sending rate under higher
reordering cases
About revision 05:
* Revert the RecoverFS text and pseudocode to match the behavior in
draft revision 03 and more closely match Linux TCP PRR
About revision 06:
* Update RecoverFS to be initialized as: RecoverFS = pipe.
About revision 07:
* Restored the revision 04 prose description for the rationale for
initializing RecoverFS as: RecoverFS = pipe.
* Added reference to [Hoe96Startup] in acknowledgements
About revision 08:
* Inserted missing reference to [RFC9293]
* Recategorized "voluntary window reductions" as a phrase introduced
by PRR
About revision 09:
* Document the setting of cwnd = ssthresh when the sender completes
a PRR episode, based on Linux TCP PRR experience and the mailing
list discussion in the TCPM mailing list thread: "draft-ietf-tcpm-
prr-rfc6937bis-03: set cwnd to ssthresh exiting fast recovery?".
Mention the potential for bursts as a result of setting cwnd =
ssthresh. Say that pacing is RECOMMENDED to deal with this.
* Revised RecoverFS initialization to handle fast recoveries with
mixes of real and spurious loss detection events (due to
reordering), and incorporate consideration for a potentially large
volume of data that is SACKed before fast recovery starts.
* Fixed bugs in the definition of DeliveredData (reverted to
definition from RFC 6937).
* Clarified PRR triggers initialization based on start of congestion
control reduction, not loss recovery, since congestion control may
reduce ssthresh for each round trip with new losses in recovery.
* Fixed bugs in PRR examples.
About revision 10:
* Minor typo fixes and wordsmithing.
About revision 11:
* Based on comments at the TCPM session at IETF 120, clarified the
scope of congestion control algorithms for which PRR can be used,
and clarified that it can be used for Reno or CUBIC.
About revision 12:
* Added "About revision 11" and "About revision 12" sections.
* Added a clarification about the applicability to CUBIC in the
algorithm section.
About revision 13:
* Switch from using the RFC 6675 "pipe" concept to an "inflight"
concept that is independent of loss detection algorithm, and thus
is usable with RACK-TLP loss detection [RFC8985]
About revision 14:
* Numerous editorial changes based on 2025-04-15 review from WIT
area director Gorry Fairhurst.
* Added a note to the RFC Editor to remove this "Document and WG
Information" section before publication.
* Rephrased all sentences with "we" or "our" to remove those words.
* Updated the RFC2119 MUST/SHOULD/MAY/... text to use the latest
boilerplate text from RFC8174, and moved this text into a separate
section.
* Ensured that each term in the "Definitions" section is listed with
(a) the term, (b) an actual in-line definition, and (c) the
citation of the original source reference, where appropriate.
* Added missing definitions for terms used in the document: cwnd,
rwnd, ssthresh, SND.NXT, RMSS
* In the "Relationships to other standards", after the paragraph
about the congestion control algorithms with which PRR can be
used, added a paragraph about PRR's independence from loss
detection algorithm details and an explicit list of loss detection
algorithms with which PRR can be used.
* Where appropriate, changed "TCP" to a more generic phrase, like:
"transport protocol", "connection", or "sender", depending on the
context. Left "TCP" in place where that was the precise term that
was appropriate in the context, given the protocol or packet
header details. There are now no references to "TCP" in between
the definition of SMSS and the "Adapting PRR to other transport
protocols" section. The "Algorithm", "Examples", and "Properties"
sections no longer mention "TCP".
* Corrected the two occurrences of "MSS" in the pseudocode to use
"SMSS", since "SMSS" has a definition and is consistent with the
Reno (RFC5681) and CUBIC (RFC9438) documents.
* Clarified the recommendation to use pacing to avoid bursts, and
moved this into its own paragraph to make it easier for the reader
to see.
About revision 15:
* Fixed the description of the initialization of RecoverFS to match
the latest RecoverFS pseudocode
* Add a note that in the first example both algorithms (RFC6675 and
PRR) complete the fast recovery episode with a cwnd matching the
ssthresh of 20.
* Revised order of 2nd and 4th co-author
* Numerous editorial changes based on 2025-05-27 last call Genart
review from Russ Housley, including the following changes.
* Fixed abstract and intro sections that said that this document
"updates" the experimental PRR algorithm to clarify that this
document obsoletes the experimental PRR RFC
* To address the feedback 'The 7th paragraph of Section 5 begins
with "A final change"; yet the 8th paragraph talks about another
adaptation to PRR', reworded the "A final change" phrase.
* Moved the paragraph about measurement studies to a new
"Measurement Studies" section, to address the feedback: 'The last
paragraph of Section 5 is not really about changes since the
publication of RFC 6937'
* Fixed various minor editorial issues identified in the review
About revision 16:
* Revised the description and caption for the figures to try to
improve clarity.
About revision 17:
* Moved the explanation of "Van Jacobson's packet conservation
principle" to be before the first use of the concept in the phrase
"strictly packet conserving".
* Numerous editorial changes based on the 29 suggestions in the
2025-06-03 perfmetrdir review from Paul Aitken ("perfmetrdir
review of draft-ietf-tcpm-prr-rfc6937bis-16"), including the
following larger-scale changes.
* Ensured that all references to RFCs (mainly RFC6675 and RFC6937)
used proper xref tags.
* Moved the "Definitions" section to be immediately before the
"Background" section, so that more terms are defined before being
used.
About revision 18:
* Several editorial changes based on the 2025-06-04 Opsdir review
from Daniele Ceccarelli ("draft-ietf-tcpm-prr-rfc6937bis-16 ietf
last call Opsdir review"), including the following larger-scale
changes.
* Moved the content in the "Background" section into the
"Introduction" section and revised the content to ensure that each
passage only uses terms and concepts already described by the
earlier text.
* Made things simpler and more consistent by replacing a few
"Reduction Bound algorithms" with "Reduction Bounds". In revision
16 we already had the simpler "Reduction Bounds" phrasing in four
spots, so this makes the text more self-consistent.
About revision 19:
* Fix a nit in the abstract caught by "idnits" online tool: 'The
abstract seems to contain references ([RFC6937]), which it
shouldn't. Please replace those with straight textual mentions of
the documents in question.'
* Several editorial changes based on the suggestions in the
2025-06-12 perfmetrdir review from Paul Aitken (tcpm thread:
"perfmetrdir review of draft-ietf-tcpm-prr-rfc6937bis-16").
About revision 20:
* Several editorial changes based on the suggestions in the
2025-06-13 review from Mohamed Boucadair (tcpm thread: "Mohamed
Boucadair's Yes on draft-ietf-tcpm-prr-rfc6937bis-19: (with
COMMENT)"), including the following larger changes.
* Changed the "Proportional Rate Reduction for TCP" title to
"Proportional Rate Reduction"
* Added an "Operational Considerations" section.
* Moved the prose description of the computation of DeliveredData,
inflight, RecoverFS, etc, from the "Definitions" section to the
"Algorithm" section.
* Moved the example section so that it is immediately after the
discussion about properties, rather than immediately before.
About revision 21:
* Fix a typo from revision 20 where an extra/old sentence about
multiple implementations was accidentally left in the document.
4. Definitions
The following terms, parameters, and state variables are used as they The following terms, parameters, and state variables are used as they
are defined in earlier documents: are defined in earlier documents:
SND.UNA: The oldest unacknowledged sequence number. This is defined SND.UNA: The oldest unacknowledged sequence number. This is defined
in Section 3.4 of [RFC9293]. in Section 3.4 of [RFC9293].
SND.NXT: The next sequence number to be sent. This is defined in SND.NXT: The next sequence number to be sent. This is defined in
Section 3.4 of [RFC9293]. Section 3.4 of [RFC9293].
duplicate ACK: An acknowledgment is considered a "duplicate ACK" or duplicate ACK: An acknowledgment is considered a "duplicate ACK" or
"duplicate acknowledgment" when (a) the receiver of the ACK has "duplicate acknowledgment" when (a) the receiver of the ACK has
outstanding data, (b) the incoming acknowledgment carries no data, outstanding data, (b) the incoming acknowledgment carries no data,
(c) the SYN and FIN bits are both off, (d) the acknowledgment number (c) the SYN and FIN bits are both off, (d) the acknowledgment
is equal to SND.UNA, and (e) the advertised window in the incoming number is equal to SND.UNA, and (e) the advertised window in the
acknowledgment equals the advertised window in the last incoming incoming acknowledgment equals the advertised window in the last
acknowledgment. This is defined in Section 2 of [RFC5681]. incoming acknowledgment. This is defined in Section 2 of
[RFC5681].
FlightSize: The amount of data that has been sent but not yet FlightSize: The amount of data that has been sent but not yet
cumulatively acknowledged. This is defined in Section 2 [RFC5681]. cumulatively acknowledged. This is defined in Section 2 of
[RFC5681].
Receiver Maximum Segment Size (RMSS): The RMSS is the size of the Receiver Maximum Segment Size (RMSS): The RMSS is the size of the
largest segment the receiver is willing to accept. This is the value largest segment the receiver is willing to accept. This is the
specified in the MSS option sent by the receiver during connection value specified in the MSS option sent by the receiver during
startup (see Section 3.7.1 of [RFC9293]). Or, if the MSS option is connection startup (see Section 3.7.1 of [RFC9293]). Or if the
not used, it is the default of 536 bytes for IPv4 or 1220 bytes for MSS option is not used, it is the default of 536 bytes for IPv4 or
IPv6 (see Section 3.7.1 of [RFC9293]). The size does not include the 1220 bytes for IPv6 (see Section 3.7.1 of [RFC9293]). The size
TCP/IP headers and options. The RMSS is defined in Section 2 of does not include the TCP/IP headers and options. The RMSS is
[RFC5681] and section 3.8.6.3 of [RFC9293]. defined in Section 2 of [RFC5681] and Section 3.8.6.3 of
[RFC9293].
Sender Maximum Segment Size (SMSS): The SMSS is the size of the Sender Maximum Segment Size (SMSS): The SMSS is the size of the
largest segment that the sender can transmit. This value can be largest segment that the sender can transmit. This value can be
based on the maximum transmission unit of the network, the path MTU based on the Maximum Transmission Unit (MTU) of the network, the
discovery [RFC1191] [RFC8201] [RFC4821] algorithm, RMSS, or other path MTU discovery [RFC1191] [RFC8201] [RFC4821] algorithm, RMSS,
factors. The size does not include the TCP/IP headers and options. or other factors. The size does not include the TCP/IP headers
This is defined in Section 2 of [RFC5681]. and options. This is defined in Section 2 of [RFC5681].
Receiver Window (rwnd): The most recently received advertised Receiver Window (rwnd): The most recently received advertised
receiver window, in bytes. At any given time, a connection MUST NOT receiver window, in bytes. At any given time, a connection MUST
send data with a sequence number higher than the sum of SND.UNA and NOT send data with a sequence number higher than the sum of
rwnd. This is defined in section 2 [RFC5681]. SND.UNA and rwnd. This is defined in Section 2 of [RFC5681].
Congestion Window (cwnd): A state variable that limits the amount of Congestion Window (cwnd): A state variable that limits the amount of
data a connection can send. At any given time, a connection MUST NOT data a connection can send. At any given time, a connection MUST
send data if inflight (see below) matches or exceeds cwnd. This is NOT send data if inflight (see below) matches or exceeds cwnd.
defined in Section 2 of [RFC5681]. This is defined in Section 2 of [RFC5681].
Slow Start Threshold (ssthresh): The slow start threshold (ssthresh) Slow Start Threshold (ssthresh): The slow start threshold (ssthresh)
state variable is used to determine whether the slow start or state variable is used to determine whether the slow start or
congestion avoidance algorithm is used to control data transmission. congestion avoidance algorithm is used to control data
During fast recovery, ssthresh is the target window size for a fast transmission. During fast recovery, ssthresh is the target window
recovery episode, as determined by the congestion control algorithm. size for a fast recovery episode, as determined by the congestion
This is defined in Section 3.1 of [RFC5681]. control algorithm. This is defined in Section 3.1 of [RFC5681].
PRR defines additional variables and terms: PRR defines additional variables and terms:
Delivered Data (DeliveredData): The data sender's best estimate of Delivered Data (DeliveredData): The data sender's best estimate of
the total number of bytes that the current ACK indicates have been the total number of bytes that the current ACK indicates have been
delivered to the receiver since the previously received ACK. delivered to the receiver since the previously received ACK.
In-Flight Data (inflight): The data sender's best estimate of the In-Flight Data (inflight): The data sender's best estimate of the
number of unacknowledged bytes in flight in the network; i.e., bytes number of unacknowledged bytes in flight in the network, i.e.,
that were sent and neither lost nor received by the data receiver. bytes that were sent and neither lost nor received by the data
receiver.
Recovery Flight Size (RecoverFS): The number of bytes the sender Recovery Flight Size (RecoverFS): The number of bytes the sender
estimates might possibly be delivered over the course of the current estimates might possibly be delivered over the course of the
PRR episode. current PRR episode.
SafeACK: A local boolean variable indicating that the current ACK SafeACK: A local boolean variable indicating that the current ACK
indicates the recovery is making good progress and the sender can indicates the recovery is making good progress and the sender can
send more aggressively, increasing inflight, if appropriate. send more aggressively, increasing inflight, if appropriate.
SndCnt: A local variable indicating exactly how many bytes should be SndCnt: A local variable indicating exactly how many bytes should be
sent in response to each ACK. sent in response to each ACK.
Voluntary window reductions: choosing not to send data in response to Voluntary window reductions: Choosing not to send data in response
some ACKs, for the purpose of reducing the sending window size and to some ACKs, for the purpose of reducing the sending window size
data rate. and data rate.
5. Changes Relative to RFC 6937 4. Changes Relative to RFC 6937
The largest change since [RFC6937] is the introduction of a new The largest change since [RFC6937] is the introduction of a new
heuristic that uses good recovery progress (for TCP, when the latest heuristic that uses good recovery progress (for TCP, when the latest
ACK advances SND.UNA and does not indicate that a prior fast ACK advances SND.UNA and does not indicate that a prior fast
retransmit has been lost) to select the Reduction Bound (PRR-CRB or retransmit has been lost) to select the Reduction Bound (PRR-CRB or
PRR-SSRB). [RFC6937] left the choice of Reduction Bound to the PRR-SSRB). [RFC6937] left the choice of Reduction Bound to the
discretion of the implementer but recommended to use PRR-SSRB by discretion of the implementer but recommended to use PRR-SSRB by
default. For all of the environments explored in earlier PRR default. For all of the environments explored in earlier PRR
research, the new heuristic is consistent with the old research, the new heuristic is consistent with the old
recommendation. recommendation.
The paper "An Internet-Wide Analysis of Traffic Policing" The paper "An Internet-Wide Analysis of Traffic Policing"
[Flach2016policing] uncovered a crucial situation not previously [Flach2016policing] uncovered a crucial situation not previously
explored, where both Reduction Bounds perform very poorly, but for explored, where both Reduction Bounds perform very poorly but for
different reasons. Under many configurations, token bucket traffic different reasons. Under many configurations, token bucket traffic
policers can suddenly start discarding a large fraction of the policers can suddenly start discarding a large fraction of the
traffic when tokens are depleted, without any warning to the end traffic when tokens are depleted, without any warning to the end
systems. The transport congestion control has no opportunity to systems. The transport congestion control has no opportunity to
measure the token rate, and sets ssthresh based on the previously measure the token rate and sets ssthresh based on the previously
observed path performance. This value for ssthresh may cause a data observed path performance. This value for ssthresh may cause a data
rate that is substantially larger than the token replenishment rate, rate that is substantially larger than the token replenishment rate,
causing high loss. Under these conditions, both Reduction Bounds causing high loss. Under these conditions, both Reduction Bounds
perform very poorly. PRR-CRB is too timid, sometimes causing very perform very poorly. PRR-CRB is too timid, sometimes causing very
long recovery times at smaller than necessary windows, and PRR-SSRB long recovery times at smaller than necessary windows, and PRR-SSRB
is too aggressive, often causing many retransmissions to be lost for is too aggressive, often causing many retransmissions to be lost for
multiple rounds. Both cases lead to prolonged recovery, decimating multiple rounds. Both cases lead to prolonged recovery, decimating
application latency and/or goodput. application latency and/or goodput.
Investigating these environments led to the development of a Investigating these environments led to the development of a
"SafeACK" heuristic to dynamically switch between Reduction Bounds: "SafeACK" heuristic to dynamically switch between Reduction Bounds:
by default conservatively use PRR-CRB and only switch to PRR-SSRB by default, conservatively use PRR-CRB and only switch to PRR-SSRB
when ACKs indicate the recovery is making good progress (SND.UNA is when ACKs indicate the recovery is making good progress (SND.UNA is
advancing without detecting any new losses). The SafeACK heuristic advancing without detecting any new losses). The SafeACK heuristic
was experimented with in Google's CDN [Flach2016policing] and was experimented with in Google's Content Delivery Network (CDN)
implemented in Linux TCP since 2015. [Flach2016policing] and implemented in Linux TCP since 2015.
This SafeACK heuristic is only invoked where losses, application- This SafeACK heuristic is only invoked where losses, application-
limited behavior, or other events cause the current estimate of in- limited behavior, or other events cause the current estimate of in-
flight data to fall below ssthresh. The high loss rates that make flight data to fall below ssthresh. The high loss rates that make
the heuristic essential are only common in the presence of heavy the heuristic essential are only common in the presence of heavy
losses such as traffic policers [Flach2016policing]. In these losses, such as traffic policers [Flach2016policing]. In these
environments the heuristic performs better than either bound by environments, the heuristic performs better than either bound by
itself. itself.
Another PRR algorithm change improves the sending process when the Another PRR algorithm change improves the sending process when the
sender enters recovery after a large portion of sequence space has sender enters recovery after a large portion of sequence space has
been SACKed. This scenario could happen when the sender has been SACKed. This scenario could happen when the sender has
previously detected reordering, for example, by using [RFC8985]. In previously detected reordering, for example, by using [RFC8985]. In
the previous version of PRR, RecoverFS did not properly account for the previous version of PRR, RecoverFS did not properly account for
sequence ranges SACKed before entering fast recovery, which caused sequence ranges SACKed before entering fast recovery, which caused
PRR to initially send too slowly. With the change, PRR properly PRR to initially send too slowly. With the change, PRR properly
accounts for sequence ranges SACKed before entering fast recovery. accounts for sequence ranges SACKed before entering fast recovery.
skipping to change at page 13, line 41 skipping to change at line 339
that triggers the recovery. Previously, PRR may not allow a fast that triggers the recovery. Previously, PRR may not allow a fast
retransmit (i.e., SndCnt is 0) on the first ACK in fast recovery, retransmit (i.e., SndCnt is 0) on the first ACK in fast recovery,
depending on the loss situation. Forcing a fast retransmit is depending on the loss situation. Forcing a fast retransmit is
important to maintain the ACK clock and avoid potential important to maintain the ACK clock and avoid potential
retransmission timeout (RTO) events. The forced fast retransmit only retransmission timeout (RTO) events. The forced fast retransmit only
happens once during the entire recovery and still follows the packet happens once during the entire recovery and still follows the packet
conservation principles in PRR. This heuristic has been implemented conservation principles in PRR. This heuristic has been implemented
since the first widely deployed TCP PRR implementation in 2011 since the first widely deployed TCP PRR implementation in 2011
[First_TCP_PRR]. [First_TCP_PRR].
In another change, upon exiting recovery a data sender sets cwnd to In another change, upon exiting recovery, a data sender sets cwnd to
ssthresh. This is important for robust performance. Without setting ssthresh. This is important for robust performance. Without setting
cwnd to ssthresh at the end of recovery, with application-limited cwnd to ssthresh at the end of recovery and with application-limited
sender behavior and some loss patterns cwnd could end fast recovery sender behavior and some loss patterns, cwnd could end fast recovery
well below ssthresh, leading to bad performance. The performance well below ssthresh, leading to bad performance. The performance
could, in some cases, be worse than [RFC6675] recovery, which simply could, in some cases, be worse than [RFC6675] recovery, which simply
sets cwnd to ssthresh at the start of recovery. This behavior of sets cwnd to ssthresh at the start of recovery. This behavior of
setting cwnd to ssthresh at the end of recovery has been implemented setting cwnd to ssthresh at the end of recovery has been implemented
since the first widely deployed TCP PRR implementation in 2011 since the first widely deployed TCP PRR implementation in 2011
[First_TCP_PRR], and is similar to [RFC6675], which specifies setting [First_TCP_PRR] and is similar to [RFC6675], which specifies setting
cwnd to ssthresh at the start of recovery. cwnd to ssthresh at the start of recovery.
Since [RFC6937] was written, PRR has also been adapted to perform Since [RFC6937] was written, PRR has also been adapted to perform
multiplicative window reduction for non-loss based congestion control multiplicative window reduction for non-loss-based congestion control
algorithms, such as for [RFC3168] style Explicit Congestion algorithms, such as for [RFC3168] style Explicit Congestion
Notification (ECN). This can be done by using some parts of the loss Notification (ECN). This can be done by using some parts of the loss
recovery state machine (in particular the RecoveryPoint from recovery state machine (in particular, the RecoveryPoint from
[RFC6675]) to invoke the PRR ACK processing for exactly one round [RFC6675]) to invoke the PRR ACK processing for exactly one round
trip worth of ACKs. However, note that using PRR for cwnd reductions trip worth of ACKs. However, note that using PRR for cwnd reductions
for [RFC3168] ECN has been observed, with some approaches to Active for ECN [RFC3168] has been observed, with some approaches to Active
Queue Management (AQM), to cause an excess cwnd reduction during ECN- Queue Management (AQM), to cause an excess cwnd reduction during ECN-
triggered congestion episodes, as noted in [VCC]. triggered congestion episodes, as noted in [VCC].
6. Relationships to other standards 5. Relationships to Other Standards
PRR MAY be used in conjunction with any congestion control algorithm PRR MAY be used in conjunction with any congestion control algorithm
that intends to make a multiplicative decrease in its sending rate that intends to make a multiplicative decrease in its sending rate
over approximately the time scale of one round trip time, as long as over approximately the time scale of one round-trip time, as long as
the current volume of in-flight data is limited by a congestion the current volume of in-flight data is limited by a congestion
window (cwnd) and the target volume of in-flight data during that window (cwnd) and the target volume of in-flight data during that
reduction is a fixed value given by ssthresh. In particular, PRR is reduction is a fixed value given by ssthresh. In particular, PRR is
applicable to both Reno [RFC5681] and CUBIC [RFC9438] congestion applicable to both Reno [RFC5681] and CUBIC [RFC9438] congestion
control. PRR is described as a modification to "A Conservative Loss control. PRR is described as a modification to "A Conservative Loss
Recovery Algorithm Based on Selective Acknowledgment (SACK) for TCP" Recovery Algorithm Based on Selective Acknowledgment (SACK) for TCP"
[RFC6675]. It is most accurate with SACK [RFC2018] but does not [RFC6675]. It is most accurate with SACK [RFC2018] but does not
require SACK. require SACK.
PRR can be used in conjunction with a wide array of loss detection PRR can be used in conjunction with a wide array of loss detection
algorithms. This is because PRR does not have any dependencies on algorithms. This is because PRR does not have any dependencies on
the details of how a loss detection algorithm estimates which packets the details of how a loss detection algorithm estimates which packets
have been delivered and which packets have been lost. Upon the have been delivered and which packets have been lost. Upon the
reception of each ACK, PRR simply needs the loss detection algorithm reception of each ACK, PRR simply needs the loss detection algorithm
to communicate how many packets have been marked as lost and how many to communicate how many packets have been marked as lost and how many
packets have been marked as delivered. Thus PRR MAY be used in packets have been marked as delivered. Thus, PRR MAY be used in
conjunction with the loss detection algorithms specified or described conjunction with the loss detection algorithms specified or described
in the following documents: Reno [RFC5681], NewReno [RFC6582], SACK in the following documents: Reno [RFC5681], NewReno [RFC6582], SACK
[RFC6675], FACK [FACK], and RACK-TLP [RFC8985]. Because of the [RFC6675], Forward Acknowledgment (FACK) [FACK], and Recent
Acknowledgment Tail Loss Probe (RACK-TLP) [RFC8985]. Because of the
performance properties of RACK-TLP, including resilience to tail performance properties of RACK-TLP, including resilience to tail
loss, reordering, and lost retransmissions, it is RECOMMENDED that loss, reordering, and lost retransmissions, it is RECOMMENDED that
PRR is implemented together with RACK-TLP loss recovery [RFC8985]. PRR is implemented together with RACK-TLP loss recovery [RFC8985].
The SafeACK heuristic came about as a result of robust Lost The SafeACK heuristic came about as a result of robust Lost
Retransmission Detection under development in an early precursor to Retransmission Detection under development in an early precursor to
[RFC8985]. Without Lost Retransmission Detection, policers that [RFC8985]. Without Lost Retransmission Detection, policers that
cause very high loss rates are at very high risk of causing cause very high loss rates are at very high risk of causing
retransmission timeouts because Reno [RFC5681], CUBIC [RFC9438], and retransmission timeouts because Reno [RFC5681], CUBIC [RFC9438], and
[RFC6675] can send retransmissions significantly above the policed [RFC6675] can send retransmissions significantly above the policed
rate. rate.
7. Algorithm 6. Algorithm
7.1. Initialization Steps 6.1. Initialization Steps
At the beginning of a congestion control response episode initiated At the beginning of a congestion control response episode initiated
by the congestion control algorithm, a data sender using PRR MUST by the congestion control algorithm, a data sender using PRR MUST
initialize the PRR state. initialize the PRR state.
The timing of the start of a congestion control response episode is The timing of the start of a congestion control response episode is
entirely up to the congestion control algorithm, and (for example) entirely up to the congestion control algorithm, and (for example)
could correspond to the start of a fast recovery episode, or a once- could correspond to the start of a fast recovery episode, or a once-
per-round-trip reduction when lost retransmits or lost original per-round-trip reduction when lost retransmits or lost original
transmissions are detected after fast recovery is already in transmissions are detected after fast recovery is already in
skipping to change at page 16, line 5 skipping to change at line 447
prr_out = 0 // Total bytes sent in recovery prr_out = 0 // Total bytes sent in recovery
RecoverFS = SND.NXT - SND.UNA RecoverFS = SND.NXT - SND.UNA
// Bytes SACKed before entering recovery will not be // Bytes SACKed before entering recovery will not be
// marked as delivered during recovery: // marked as delivered during recovery:
RecoverFS -= (bytes SACKed in scoreboard) RecoverFS -= (bytes SACKed in scoreboard)
// Include the (common) case of selectively ACKed bytes: // Include the (common) case of selectively ACKed bytes:
RecoverFS += (bytes newly SACKed) RecoverFS += (bytes newly SACKed)
// Include the (rare) case of cumulatively ACKed bytes: // Include the (rare) case of cumulatively ACKed bytes:
RecoverFS += (bytes newly cumulatively acknowledged) RecoverFS += (bytes newly cumulatively acknowledged)
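The RecoverFS arithmetic in the initialization listing above can be sketched in Python as follows. This is only an illustrative sketch: the names PrrState and prr_init are hypothetical, the byte counts are assumed to be supplied by the caller's scoreboard, and resetting prr_delivered to 0 is an assumption, because the first lines of the listing are elided in this comparison.

```python
from dataclasses import dataclass


@dataclass
class PrrState:
    """Hypothetical container for the per-episode PRR variables."""
    prr_delivered: int = 0  # assumed to restart at 0 for each episode
    prr_out: int = 0        # total bytes sent in recovery
    recover_fs: int = 0     # RecoverFS, in bytes


def prr_init(snd_nxt, snd_una, sacked_in_scoreboard,
             newly_sacked, newly_cum_acked):
    """Compute RecoverFS from caller-supplied byte counts."""
    recover_fs = snd_nxt - snd_una
    # Bytes SACKed before entering recovery will not be marked as
    # delivered during recovery, so they are removed.
    recover_fs -= sacked_in_scoreboard
    # Bytes newly SACKed by the triggering ACK are added back.
    recover_fs += newly_sacked
    # Bytes newly cumulatively acknowledged are added as well.
    recover_fs += newly_cum_acked
    return PrrState(prr_delivered=0, prr_out=0, recover_fs=recover_fs)


state = prr_init(snd_nxt=30000, snd_una=10000,
                 sacked_in_scoreboard=3000,
                 newly_sacked=1000, newly_cum_acked=0)
print(state.recover_fs)
```

With 20000 bytes outstanding, 3000 bytes SACKed in the scoreboard, and 1000 of those newly SACKed by the triggering ACK, the sketch yields a RecoverFS of 18000 bytes.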
7.2. Per-ACK Steps 6.2. Per-ACK Steps
On every ACK starting or during fast recovery, excluding the ACK that On every ACK starting or during fast recovery, excluding the ACK that
concludes a PRR episode, PRR executes the following steps. concludes a PRR episode, PRR executes the following steps.
First, the sender computes DeliveredData, the data sender's best First, the sender computes DeliveredData, the data sender's best
estimate of the total number of bytes that the current ACK indicates estimate of the total number of bytes that the current ACK indicates
have been delivered to the receiver since the previously received have been delivered to the receiver since the previously received
ACK. With SACK, DeliveredData can be computed precisely as the ACK. With SACK, DeliveredData can be computed precisely as the
change in SND.UNA, plus the (signed) change in SACKed. Thus, in the change in SND.UNA, plus the (signed) change in SACK. Thus, in the
special case when there are no SACKed sequence ranges in the special case when there are no SACKed sequence ranges in the
scoreboard before or after the ACK, DeliveredData is the change in scoreboard before or after the ACK, DeliveredData is the change in
SND.UNA. In recovery without SACK, DeliveredData is estimated to be SND.UNA. In recovery without SACK, DeliveredData is estimated to be
1 SMSS on receiving a duplicate ACK, and on a subsequent partial or 1 SMSS on receiving a duplicate ACK, and on a subsequent partial or
full ACK DeliveredData is the change in SND.UNA, minus 1 SMSS for full ACK DeliveredData is the change in SND.UNA, minus 1 SMSS for
each preceding duplicate ACK. Note that without SACK, a poorly- each preceding duplicate ACK. Note that without SACK, a poorly
behaved receiver that returns extraneous duplicate ACKs (as described behaved receiver that returns extraneous duplicate ACKs (as described
in [Savage99]) could attempt to artificially inflate DeliveredData. in [Savage99]) could attempt to artificially inflate DeliveredData.
As a mitigation, if not using SACK then PRR disallows incrementing As a mitigation, if not using SACK, then PRR disallows incrementing
DeliveredData when the total bytes delivered in a PRR episode would DeliveredData when the total bytes delivered in a PRR episode would
exceed the estimated data outstanding upon entering recovery exceed the estimated data outstanding upon entering recovery
(RecoverFS). (RecoverFS).
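The DeliveredData rules in the paragraph above can be sketched in Python as follows. The function names and parameters are hypothetical, sequence number wraparound is ignored, and the non-SACK mitigation is interpreted here as clamping the estimate to the room left under RecoverFS, which is one possible reading of the text rather than the specified behavior.

```python
def delivered_data_sack(una_before, una_after, sacked_before, sacked_after):
    """With SACK: change in SND.UNA plus the signed change in SACKed bytes."""
    return (una_after - una_before) + (sacked_after - sacked_before)


def delivered_data_nosack(una_before, una_after, preceding_dupacks, smss,
                          prr_delivered, recover_fs):
    """Without SACK: estimate DeliveredData from duplicate ACK counts."""
    if una_after == una_before:
        # Duplicate ACK: assume one SMSS was delivered.
        estimate = smss
    else:
        # Partial or full ACK: change in SND.UNA, minus 1 SMSS for each
        # preceding duplicate ACK already counted.
        estimate = (una_after - una_before) - preceding_dupacks * smss
    # Assumption: never report a negative amount.
    estimate = max(estimate, 0)
    # Assumption: clamp so that the episode total cannot exceed RecoverFS,
    # limiting the effect of extraneous duplicate ACKs.
    room = max(recover_fs - prr_delivered, 0)
    return min(estimate, room)


# A SACK-only ACK that newly SACKs one 1460-byte segment.
print(delivered_data_sack(1000, 1000, 0, 1460))
# A duplicate ACK arriving when only 500 bytes of RecoverFS remain.
print(delivered_data_nosack(1000, 1000, 0, 1000, 19500, 20000))
```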
Next, the sender computes inflight, the data sender's best estimate Next, the sender computes inflight, the data sender's best estimate
of the number of bytes that are in flight in the network. To of the number of bytes that are in flight in the network. To
calculate inflight, connections with SACK enabled and using [RFC6675] calculate inflight, connections with SACK enabled and using loss
loss detection MAY use the "pipe" algorithm as specified in detection [RFC6675] MAY use the "pipe" algorithm as specified in
[RFC6675]. SACK-enabled connections using RACK-TLP loss detection [RFC6675]. SACK-enabled connections using RACK-TLP loss detection
[RFC8985] or other loss detection algorithms MUST calculate inflight [RFC8985] or other loss detection algorithms MUST calculate inflight
by starting with SND.NXT - SND.UNA, subtracting out bytes SACKed in by starting with SND.NXT - SND.UNA, subtracting out bytes SACKed in
the scoreboard, subtracting out bytes marked lost in the scoreboard, the scoreboard, subtracting out bytes marked lost in the scoreboard,
and adding bytes in the scoreboard that have been retransmitted since and adding bytes in the scoreboard that have been retransmitted since
they were last marked lost. For non-SACK-enabled connections, they were last marked lost. For non-SACK-enabled connections,
instead of subtracting out bytes SACKed in the SACK scoreboard, instead of subtracting out bytes SACKed in the SACK scoreboard,
senders MUST subtract out: min(RecoverFS, 1 SMSS for each preceding senders MUST subtract out: min(RecoverFS, 1 SMSS for each preceding
duplicate ACK in the fast recovery episode); the min() with RecoverFS duplicate ACK in the fast recovery episode); the min() with RecoverFS
is to protect against misbehaving receivers [Savage99]. is to protect against misbehaving receivers [Savage99].
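The inflight computation described above, for connections that do not use the RFC 6675 "pipe" algorithm, can be sketched in Python as follows. The function and parameter names are hypothetical, and the per-category byte counts are assumed to come from the sender's scoreboard.

```python
def inflight_sack(snd_nxt, snd_una, sacked, lost, retrans_after_lost):
    """SACK-enabled estimate of bytes in flight."""
    return snd_nxt - snd_una - sacked - lost + retrans_after_lost


def inflight_nosack(snd_nxt, snd_una, preceding_dupacks, smss, recover_fs,
                    lost, retrans_after_lost):
    """Non-SACK estimate: duplicate ACKs stand in for SACKed bytes."""
    # The min() with RecoverFS bounds the credit a receiver can earn by
    # sending extraneous duplicate ACKs.
    inferred_delivered = min(recover_fs, preceding_dupacks * smss)
    return snd_nxt - snd_una - inferred_delivered - lost + retrans_after_lost


# 20000 bytes outstanding, 3000 SACKed, 1000 marked lost and retransmitted.
print(inflight_sack(20000, 0, 3000, 1000, 1000))
# Three duplicate ACKs without SACK, 1000 bytes marked lost.
print(inflight_nosack(20000, 0, 3, 1000, 20000, 1000, 0))
```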
Next, the sender computes SafeACK, a local boolean variable Next, the sender computes SafeACK, a local boolean variable
indicating that the current ACK reported good progress. SafeACK is indicating that the current ACK reported good progress. SafeACK is
true only when the ACK has cumulatively acknowledged new data and the true only when the ACK has cumulatively acknowledged new data and the
ACK does not indicate further losses. For example, an ACK triggering ACK does not indicate further losses. For example, an ACK triggering
[RFC6675] "rescue" retransmission (Section 4, NextSeg() condition 4) "rescue" retransmission (Section 4 of [RFC6675], NextSeg() condition
may indicate further losses. Both conditions indicate the recovery 4) may indicate further losses. Both conditions indicate the
is making good progress and the sender can send more aggressively, recovery is making good progress and the sender can send more
increasing inflight, if appropriate. aggressively, increasing inflight, if appropriate.
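As a minimal sketch, the SafeACK test above reduces to a two-part boolean. The function name is hypothetical, the new_loss_indicated flag is assumed to be supplied by the loss detection algorithm in use, and sequence number wraparound is ignored.

```python
def compute_safe_ack(una_before, una_after, new_loss_indicated):
    """True only if the ACK advances SND.UNA and reports no further loss."""
    advanced = una_after > una_before
    return advanced and not new_loss_indicated


print(compute_safe_ack(1000, 2000, False))  # cumulative progress, no new loss
print(compute_safe_ack(1000, 2000, True))   # progress, but more loss detected
print(compute_safe_ack(1000, 1000, False))  # duplicate ACK
```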
Finally, the sender uses DeliveredData, inflight, SafeACK, and other Finally, the sender uses DeliveredData, inflight, SafeACK, and other
PRR state to compute SndCnt, a local variable indicating exactly how PRR states to compute SndCnt, a local variable indicating exactly how
many bytes should be sent in response to each ACK, and then uses many bytes should be sent in response to each ACK and then uses
SndCnt to update cwnd. SndCnt to update cwnd.
The full sequence of per-ACK PRR algorithm steps is as follows: The full sequence of per-ACK PRR algorithm steps is as follows:
if (DeliveredData is 0) if (DeliveredData is 0)
Return Return
prr_delivered += DeliveredData prr_delivered += DeliveredData
inflight = (estimated volume of in-flight data) inflight = (estimated volume of in-flight data)
SafeACK = (SND.UNA advances and no further loss indicated) SafeACK = (SND.UNA advances and no further loss indicated)
skipping to change at page 17, line 46 skipping to change at line 535
// Force a fast retransmit upon entering recovery // Force a fast retransmit upon entering recovery
SndCnt = SMSS SndCnt = SMSS
} }
cwnd = inflight + SndCnt cwnd = inflight + SndCnt
After the sender computes SndCnt and uses it to update cwnd, the After the sender computes SndCnt and uses it to update cwnd, the
sender transmits more data. Note that the decision of which data to sender transmits more data. Note that the decision of which data to
send (e.g., retransmit missing data or send more new data) is out of send (e.g., retransmit missing data or send more new data) is out of
scope for this document. scope for this document.
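The per-ACK steps can be sketched end to end in Python as follows. The middle of the listing above is elided in this comparison, so the two branches that compute SndCnt are a reconstruction from the surrounding description (proportional reduction while inflight exceeds ssthresh, otherwise PRR-CRB with one extra SMSS when SafeACK is true) and should be read as an assumption, not as the normative pseudocode. The function name and return convention are hypothetical, and RecoverFS is assumed to be greater than zero.

```python
def prr_on_ack(delivered_data, inflight, safe_ack, ssthresh, smss,
               prr_delivered, prr_out, recover_fs):
    """Return (snd_cnt, cwnd, prr_delivered); cwnd is None if unchanged."""
    if delivered_data == 0:
        return 0, None, prr_delivered
    prr_delivered += delivered_data
    if inflight > ssthresh:
        # Assumed proportional branch: integer division, rounding up.
        target_out = (prr_delivered * ssthresh + recover_fs - 1) // recover_fs
        snd_cnt = target_out - prr_out
    else:
        # Assumed PRR-CRB branch: packet conservation by default.
        snd_cnt = max(prr_delivered - prr_out, delivered_data)
        if safe_ack:
            # Assumed PRR-SSRB step: one extra segment on good progress.
            snd_cnt += smss
        # Do not let inflight grow beyond ssthresh.
        snd_cnt = min(ssthresh - inflight, snd_cnt)
    if prr_out == 0 and snd_cnt == 0:
        # Force a fast retransmit upon entering recovery.
        snd_cnt = smss
    cwnd = inflight + snd_cnt
    return snd_cnt, cwnd, prr_delivered


# Below ssthresh, with and without SafeACK.
print(prr_on_ack(1000, 5000, True, 10000, 1000, 10000, 9000, 20000))
print(prr_on_ack(1000, 5000, False, 10000, 1000, 10000, 9000, 20000))
```

In this sketch the caller is expected to add the bytes actually transmitted to prr_out afterwards, as in the per-transmit step.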
7.3. Per-Transmit Steps 6.3. Per-Transmit Steps
On any data transmission or retransmission, PRR executes the On any data transmission or retransmission, PRR executes the
following: following:
prr_out += (data sent) prr_out += (data sent)
7.4. Completion Steps 6.4. Completion Steps
A PRR episode ends upon either completing fast recovery, or before A PRR episode ends upon either completing fast recovery or before
initiating a new PRR episode due to a new congestion control response initiating a new PRR episode due to a new congestion control response
episode. episode.
On the completion of a PRR episode, PRR executes the following: On the completion of a PRR episode, PRR executes the following:
cwnd = ssthresh cwnd = ssthresh
Note that this step that sets cwnd to ssthresh can potentially, in Note that this step that sets cwnd to ssthresh can potentially, in
some scenarios, allow a burst of back-to-back segments into the some scenarios, allow a burst of back-to-back segments into the
network. network.
It is RECOMMENDED that implementations use pacing to reduce the It is RECOMMENDED that implementations use pacing to reduce the
burstiness of data traffic. This recommendation is consistent with burstiness of data traffic. This recommendation is consistent with
current practice to mitigate bursts (e.g., [I-D.welzl-iccrg-pacing]), current practice to mitigate bursts (e.g., [PACING]), including
including pacing transmission bursts after restarting from idle. pacing transmission bursts after restarting from idle.
8. Properties 7. Properties
The following properties are common to both PRR-CRB and PRR-SSRB, The following properties are common to both PRR-CRB and PRR-SSRB,
except as noted: except as noted:
PRR attempts to maintain the sender's ACK clocking across recovery PRR attempts to maintain the sender's ACK clocking across recovery
events, including burst losses. By contrast, [RFC6675] can send events, including burst losses. By contrast, [RFC6675] can send
large, unclocked bursts following burst losses. large, unclocked bursts following burst losses.
Normally, PRR will spread voluntary window reductions out evenly Normally, PRR will spread voluntary window reductions out evenly
across a full RTT. This has the potential to generally reduce the across a full RTT. This has the potential to generally reduce the
burstiness of Internet traffic, and could be considered to be a type burstiness of Internet traffic and could be considered to be a type
of soft pacing. Hypothetically, any pacing increases the probability of soft pacing. Hypothetically, any pacing increases the probability
that different flows are interleaved, reducing the opportunity for that different flows are interleaved, reducing the opportunity for
ACK compression and other phenomena that increase traffic burstiness. ACK compression and other phenomena that increase traffic burstiness.
However, these effects have not been quantified. However, these effects have not been quantified.
If there are minimal losses, PRR will converge to exactly the target If there are minimal losses, PRR will converge to exactly the target
window chosen by the congestion control algorithm. Note that as the window chosen by the congestion control algorithm. Note that as the
sender approaches the end of recovery, prr_delivered will approach sender approaches the end of recovery, prr_delivered will approach
RecoverFS and SndCnt will be computed such that prr_out approaches RecoverFS and SndCnt will be computed such that prr_out approaches
ssthresh. ssthresh.
skipping to change at page 20, line 17 skipping to change at line 641
amount of data delivered to the receiver. This Strong Packet amount of data delivered to the receiver. This Strong Packet
Conservation Bound is the most aggressive algorithm that does not Conservation Bound is the most aggressive algorithm that does not
lead to additional forced losses in some environments. It has the lead to additional forced losses in some environments. It has the
property that if there is a standing queue at a bottleneck with no property that if there is a standing queue at a bottleneck with no
cross traffic, the queue will maintain exactly constant length for cross traffic, the queue will maintain exactly constant length for
the duration of the recovery, except for +1/-1 fluctuation due to the duration of the recovery, except for +1/-1 fluctuation due to
differences in packet arrival and exit times. See Appendix A for a differences in packet arrival and exit times. See Appendix A for a
detailed discussion of this property. detailed discussion of this property.
Although the Strong Packet Conservation Bound is very appealing for a Although the Strong Packet Conservation Bound is very appealing for a
number of reasons, earlier measurements (in section 6 of [RFC6675]) number of reasons, earlier measurements (in Section 6 of [RFC6675])
demonstrate that it is less aggressive and does not perform as well demonstrate that it is less aggressive and does not perform as well
as [RFC6675], which permits bursts of data when there are bursts of as [RFC6675], which permits bursts of data when there are bursts of
losses. PRR-SSRB is a compromise that permits a sender to send one losses. PRR-SSRB is a compromise that permits a sender to send one
extra segment per ACK as compared to the Packet Conserving Bound when extra segment per ACK as compared to the Packet Conserving Bound when
the ACK indicates the recovery is in good progress without further the ACK indicates the recovery is in good progress without further
losses. From the perspective of a strict Packet Conserving Bound, losses. From the perspective of a strict Packet Conserving Bound,
PRR-SSRB does indeed open the window during recovery; however, it is PRR-SSRB does indeed open the window during recovery; however, it is
significantly less aggressive than [RFC6675] in the presence of burst significantly less aggressive than [RFC6675] in the presence of burst
losses. The [RFC6675] "half window of silence" may temporarily losses. The [RFC6675] "half window of silence" may temporarily
reduce queue pressure when congestion control does not reduce the reduce queue pressure when congestion control does not reduce the
congestion window entering recovery to avoid further losses. The congestion window entering recovery to avoid further losses. The
goal of PRR is to minimize the opportunities to lose the self clock goal of PRR is to minimize the opportunities to lose the self clock
by smoothly controlling inflight toward the target set by the by smoothly controlling inflight toward the target set by the
congestion control. It is the congestion control's responsibility to congestion control. It is the congestion control's responsibility to
avoid a full queue, not PRR. avoid a full queue, not PRR.
9. Examples 8. Examples
This section illustrates the PRR and [RFC6675] algorithms by showing This section illustrates the PRR and [RFC6675] algorithm by showing
their different behaviors for two example scenarios: a connection their different behaviors for two example scenarios: a connection
experiencing either a single loss or a burst of 15 consecutive experiencing either a single loss or a burst of 15 consecutive
losses. All cases use bulk data transfers (no application pauses), losses. All cases use bulk data transfers (no application pauses),
Reno congestion control [RFC5681], and cwnd = FlightSize = inflight = Reno congestion control [RFC5681], and cwnd = FlightSize = inflight =
20 segments, so ssthresh will be set to 10 at the beginning of 20 segments, so ssthresh will be set to 10 at the beginning of
recovery. The scenarios use standard Fast Retransmit [RFC5681] and recovery. The scenarios use standard Fast Retransmit [RFC5681] and
Limited Transmit [RFC3042], so the sender will send two new segments Limited Transmit [RFC3042], so the sender will send two new segments
followed by one retransmit in response to the first three duplicate followed by one retransmit in response to the first three duplicate
ACKs following the losses. ACKs following the losses.
Each of the diagrams below shows the per ACK response to the first Each of the diagrams below shows the per ACK response to the first
round trip for the two recovery algorithms when the zeroth segment is round trip for the two recovery algorithms when the zeroth segment is
lost. The top line ("ack#") indicates the transmitted segment number lost. The top line ("ack#") indicates the transmitted segment number
triggering the ACKs, with an X for the lost segment. The "cwnd" and triggering the ACKs, with an X for the lost segment. The "cwnd" and
"inflight" lines indicate the values of cwnd and inflight, "inflight" lines indicate the values of cwnd and inflight,
respectively, for these algorithms after processing each returning respectively, for these algorithms after processing each returning
ACK but before further (re)transmission. The "sent" line indicates ACK but before further (re)transmission. The "sent" line indicates
how much 'N'ew or 'R'etransmitted data would be sent. Note that the how much "N"ew or "R"etransmitted data would be sent. Note that the
algorithms for deciding which data to send are out of scope of this algorithms for deciding which data to send are out of scope of this
document. document.
RFC 6675 RFC 6675
a X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 a X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
c 20 20 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 c 20 20 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
i 19 19 18 18 17 16 15 14 13 12 11 10 9 9 9 9 9 9 9 9 9 9 i 19 19 18 18 17 16 15 14 13 12 11 10 9 9 9 9 9 9 9 9 9 9
s N N R N N N N N N N N N N s N N R N N N N N N N N N N
PRR PRR
skipping to change at page 21, line 30 skipping to change at line 703
a: ack#; c: cwnd; i: inflight; s: sent a: ack#; c: cwnd; i: inflight; s: sent
Figure 1 Figure 1
In this first example, ACK#1 through ACK#19 contain SACKs for the In this first example, ACK#1 through ACK#19 contain SACKs for the
original flight of data, ACK#20 and ACK#21 carry SACKs for the original flight of data, ACK#20 and ACK#21 carry SACKs for the
limited transmits triggered by the first and second SACKed segments, limited transmits triggered by the first and second SACKed segments,
and ACK#22 carries the full cumulative ACK covering all data up and ACK#22 carries the full cumulative ACK covering all data up
through the limited transmits. ACK#22 completes the fast recovery through the limited transmits. ACK#22 completes the fast recovery
episode, and thus completes the PRR episode. episode and thus completes the PRR episode.
Note that both algorithms send the same total amount of data, and Note that both algorithms send the same total amount of data, and
both algorithms complete the fast recovery episode with a cwnd both algorithms complete the fast recovery episode with a cwnd
matching the ssthresh of 10. [RFC6675] experiences a "half window of matching the ssthresh of 10. [RFC6675] experiences a "half window of
silence" while PRR spreads the voluntary window reduction across an silence" while PRR spreads the voluntary window reduction across an
entire RTT. entire RTT.
Next, consider an example scenario with the same initial conditions, Next, consider an example scenario with the same initial conditions,
except that the first 15 packets (0-14) are lost. During the except that the first 15 packets (0-14) are lost. During the
remainder of the lossy round trip, only 5 ACKs are returned to the remainder of the lossy round trip, only 5 ACKs are returned to the
skipping to change at page 22, line 24 skipping to change at line 736
i 19 19 4 4 4 i 19 19 4 4 4
s N N R R R s N N R R R
a: ack#; c: cwnd; i: inflight; s: sent a: ack#; c: cwnd; i: inflight; s: sent
Figure 2 Figure 2
In this specific situation, [RFC6675] is more aggressive because once In this specific situation, [RFC6675] is more aggressive because once
Fast Retransmit is triggered (on the ACK for segment 17), the sender Fast Retransmit is triggered (on the ACK for segment 17), the sender
immediately retransmits sufficient data to bring inflight up to cwnd. immediately retransmits sufficient data to bring inflight up to cwnd.
Earlier measurements (in section 6 of [RFC6675]) indicate that Earlier measurements (in Section 6 of [RFC6675]) indicate that
[RFC6675] significantly outperforms [RFC6937] PRR using only PRR-CRB, [RFC6675] significantly outperforms PRR [RFC6937] using only PRR-CRB
and some other similarly conservative algorithms that were tested, and some other similarly conservative algorithms that were tested,
showing that it is quite common for the actual losses to showing that it is quite common for the actual losses to
exceed the cwnd reduction determined by the congestion control exceed the cwnd reduction determined by the congestion control
algorithm. algorithm.
Under such heavy losses, during the first round trip of fast recovery Under such heavy losses, during the first round trip of fast
PRR uses the PRR-CRB to follow the packet conservation principle. recovery, PRR uses the PRR-CRB to follow the packet conservation
Since the total losses bring inflight below ssthresh, data is sent principle. Since the total losses bring inflight below ssthresh,
such that the total data transmitted, prr_out, follows the total data data is sent such that the total data transmitted, prr_out, follows
delivered to the receiver as reported by returning ACKs. the total data delivered to the receiver as reported by returning
Transmission is controlled by the sending limit, which is set to ACKs. Transmission is controlled by the sending limit, which is set
prr_delivered - prr_out. to prr_delivered - prr_out.
While not shown in the figure above, once the fast retransmits sent While not shown in the figure above, once the fast retransmits sent
starting at ACK#17 are delivered and elicit ACKs that increment the starting at ACK#17 are delivered and elicit ACKs that increment the
SND.UNA, PRR enters PRR-SSRB and increases the window by exactly 1 SND.UNA, PRR enters PRR-SSRB and increases the window by exactly 1
segment per ACK until inflight rises to ssthresh during recovery. On segment per ACK until inflight rises to ssthresh during recovery. On
heavy losses when cwnd is large, PRR-SSRB recovers the losses heavy losses when cwnd is large, PRR-SSRB recovers the losses
exponentially faster than PRR-CRB. Although increasing the window exponentially faster than PRR-CRB. Although increasing the window
during recovery seems to be ill advised, it is important to remember during recovery seems to be ill advised, it is important to remember
that this is actually less aggressive than permitted by [RFC6675], that this is actually less aggressive than permitted by [RFC6675],
which sends the same quantity of additional data as a single burst in which sends the same quantity of additional data as a single burst in
response to the ACK that triggered Fast Retransmit. response to the ACK that triggered Fast Retransmit.
For less severe loss events, where the total losses are smaller than For less severe loss events, where the total losses are smaller than
the difference between FlightSize and ssthresh, PRR-CRB and PRR-SSRB the difference between FlightSize and ssthresh, PRR-CRB and PRR-SSRB
are not invoked since PRR stays in the proportional rate reduction are not invoked since PRR stays in the Proportional Rate Reduction
mode. mode.
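
The three modes discussed in this section (proportional rate reduction, PRR-CRB, and PRR-SSRB) can be illustrated with a small per-ACK sketch. This is a minimal illustration rather than the normative algorithm: the function name prr_sndcnt, its parameter names, the choice to count in whole segments, and the rule that forces the first fast retransmit are assumptions based on one reading of the per-ACK computation defined earlier in this document.

```python
def prr_sndcnt(inflight, ssthresh, recover_fs, prr_delivered, prr_out,
               delivered_data, safe_ack, smss=1):
    """Return how much data (in units of smss) one ACK allows PRR to send.

    All names are hypothetical; quantities are counted in whole segments.
    prr_delivered is assumed to already include delivered_data.
    """
    if inflight > ssthresh:
        # Proportional Rate Reduction mode: pace sending so that prr_out
        # tracks prr_delivered * ssthresh / recover_fs, rounded up.
        target = (prr_delivered * ssthresh + recover_fs - 1) // recover_fs
        sndcnt = target - prr_out
    else:
        # PRR-CRB: bound sending by the data reported as delivered.
        sndcnt = max(prr_delivered - prr_out, delivered_data)
        if safe_ack:
            # PRR-SSRB: one extra segment per ACK while recovery progresses.
            sndcnt += smss
        # Never raise inflight above ssthresh.
        sndcnt = min(ssthresh - inflight, sndcnt)
    if prr_out == 0 and sndcnt == 0:
        # Assumed rule: force the first fast retransmit of the episode.
        sndcnt = smss
    return max(sndcnt, 0)
```

With values resembling the burst-loss scenario (inflight = 4, ssthresh = 10, prr_delivered = 3, prr_out = 2), an ACK that delivers one segment allows one segment under PRR-CRB and two segments when the ACK qualifies as a SafeACK.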
10. Adapting PRR to other transport protocols 9. Adapting PRR to Other Transport Protocols
The main PRR algorithm and reduction bounds can be adapted to any The main PRR algorithm and reduction bounds can be adapted to any
transport that can support [RFC6675]. In one major implementation transport that can support [RFC6675]. In one major implementation
(Linux TCP) PRR has been the fast recovery algorithm for its default (Linux TCP), PRR has been the fast recovery algorithm for its default
and supported congestion control modules since its introduction in and supported congestion control modules since its introduction in
2011 [First_TCP_PRR]. 2011 [First_TCP_PRR].
The SafeACK heuristic can be generalized as any ACK of a The SafeACK heuristic can be generalized as any ACK of a
retransmission that does not cause some other segment to be marked retransmission that does not cause some other segment to be marked
for retransmission. for retransmission.
11. Measurement Studies 10. Measurement Studies
For [RFC6937] a companion paper [IMC11] evaluated [RFC3517] and For [RFC6937], a companion paper [IMC11] evaluated [RFC3517] and
various experimental PRR versions in a large-scale measurement study. various experimental PRR versions in a large-scale measurement study.
At the time of publication, the legacy algorithms used in that study At the time of publication, the legacy algorithms used in that study
are no longer present in the code base used in that study, making are no longer present in the code base used in that study, making
such comparisons difficult without recreating historical algorithms. such comparisons difficult without recreating historical algorithms.
Readers interested in the measurement study should review section 5 Readers interested in the measurement study should review Section 5
of [RFC6937] and the IMC paper [IMC11]. of [RFC6937] and the IMC paper [IMC11].
12. Operational Considerations 11. Operational Considerations
12.1. Incremental Deployment 11.1. Incremental Deployment
PRR is incrementally deployable, because it utilizes only existing PRR is incrementally deployable, because it utilizes only existing
transport protocol mechanisms for data delivery acknowledgment and transport protocol mechanisms for data delivery acknowledgment and
the detection of lost data. PRR only requires only changes to the the detection of lost data. PRR only requires changes to the
transport protocol implementation at the data sender; it does not transport protocol implementation at the data sender; it does not
require any changes at data receivers or in networks. This allows require any changes at data receivers or in networks. This allows
data senders using PRR to work correctly with any existing data data senders using PRR to work correctly with any existing data
receivers or networks. PRR does not require any changes to or receivers or networks. PRR does not require any changes to or
assistance from routers, switches, or other devices in the network. assistance from routers, switches, or other devices in the network.
12.2. Fairness 11.2. Fairness
PRR is designed to maintain the fairness properties of the congestion PRR is designed to maintain the fairness properties of the congestion
control algorithm with which it is deployed. PRR only operates control algorithm with which it is deployed. PRR only operates
during a congestion control response episode, such as fast recovery during a congestion control response episode, such as fast recovery
or response to [RFC3168] ECN, and only makes short-term, per- or response to ECN [RFC3168], and only makes short-term, per-
acknowledgment decisions to smoothly regulate the volume of in-flight acknowledgment decisions to smoothly regulate the volume of in-flight
data during an episode such that at the end of the episode it will be data during an episode such that at the end of the episode it will be
as close as possible to the slow start threshold (ssthresh), as as close as possible to the slow start threshold (ssthresh), as
determined by the congestion control algorithm. PRR does not modify determined by the congestion control algorithm. PRR does not modify
the congestion control cwnd increase or decrease mechanisms outside the congestion control cwnd increase or decrease mechanisms outside
of congestion control response episodes. of congestion control response episodes.
12.3. Protecting the Network Against Excessive Queuing and Packet Loss 11.3. Protecting the Network Against Excessive Queuing and Packet Loss
Over long time scales, PRR is designed to maintain the queuing and Over long time scales, PRR is designed to maintain the queuing and
packet loss properties of the congestion control algorithm with which packet loss properties of the congestion control algorithm with which
it is deployed. As noted above, PRR only operates during a it is deployed. As noted above, PRR only operates during a
congestion control response episode, such as fast recovery or congestion control response episode, such as fast recovery or
response to ECN, and only makes short-term, per-acknowledgment response to ECN, and only makes short-term, per-acknowledgment
decisions to smoothly regulate the volume of in-flight data during an decisions to smoothly regulate the volume of in-flight data during an
episode such that at the end of the episode it will be as close as episode such that at the end of the episode it will be as close as
possible to the slow start threshold (ssthresh), as determined by the possible to the slow start threshold (ssthresh), as determined by the
congestion control algorithm. congestion control algorithm.
Over short time scales, PRR is designed to cause lower packet loss Over short time scales, PRR is designed to cause lower packet loss
rates than preceding approaches like [RFC6675]. At a high level, PRR rates than preceding approaches like [RFC6675]. At a high level, PRR
is inspired by the packet conservation principle, and, as much as is inspired by the packet conservation principle, and as much as
possible, PRR relies on the self clock process. By contrast, with possible, PRR relies on the self clock process. By contrast, with
[RFC6675] a single ACK carrying a SACK option that implies a large [RFC6675], a single ACK carrying a SACK option that implies a large
quantity of missing data can cause a step discontinuity in the pipe quantity of missing data can cause a step discontinuity in the pipe
estimator, which can cause Fast Retransmit to send a large burst of estimator, which can cause Fast Retransmit to send a large burst of
data that is much larger than the volume of delivered data. PRR data that is much larger than the volume of delivered data. PRR
avoids such bursts by basing transmission decisions on the volume of avoids such bursts by basing transmission decisions on the volume of
delivered data rather than the volume of lost data. Furthermore, as delivered data rather than the volume of lost data. Furthermore, as
noted above, PRR-SSRB is less aggressive than [RFC6675] (transmitting noted above, PRR-SSRB is less aggressive than [RFC6675] (transmitting
fewer segments or taking more time to transmit them), and it fewer segments or taking more time to transmit them), and it
outperforms [RFC6675] due to the lower probability of additional losses during outperforms [RFC6675] due to the lower probability of additional losses during
recovery. recovery.
13. Acknowledgements 12. IANA Considerations
This document is based in part on previous work by Janey C. Hoe (see
section 3.2, "Recovery from Multiple Packet Losses", of
[Hoe96Startup]) and Matt Mathis, Jeff Semke, and Jamshid Mahdavi
[RHID], and influenced by several discussions with John Heffner.
Monia Ghobadi and Sivasankar Radhakrishnan helped analyze the
experiments. Ilpo Jarvinen reviewed the initial implementation.
Mark Allman, Richard Scheffenegger, Markku Kojo, Mirja Kuehlewind,
Gorry Fairhurst, Russ Housley, Paul Aitken, Daniele Ceccarelli, and
Mohamed Boucadair improved the document through their insightful
reviews and suggestions.
14. IANA Considerations
This memo includes no request to IANA. This document has no IANA actions.
15. Security Considerations 13. Security Considerations
PRR does not change the risk profile for transport protocols. PRR does not change the risk profile for transport protocols.
Implementers that change PRR from counting bytes to segments have to Implementers that change PRR from counting bytes to segments have to
be cautious about the effects of ACK splitting attacks [Savage99], be cautious about the effects of ACK splitting attacks [Savage99],
where the receiver acknowledges partial segments for the purpose of where the receiver acknowledges partial segments for the purpose of
confusing the sender's congestion accounting. confusing the sender's congestion accounting.
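
The accounting risk can be illustrated with a short sketch. It is only an illustration under stated assumptions: the helper names, the 1460-byte SMSS, and the "naive" per-ACK segment credit are hypothetical and do not describe any particular implementation; sequence number wraparound is ignored.

```python
SMSS = 1460  # assumed segment size in bytes (hypothetical value)

def delivered_bytes(snd_una, ack_numbers):
    """Sum the bytes newly acknowledged by a sequence of cumulative ACKs."""
    total = 0
    for ack in ack_numbers:
        if ack > snd_una:
            total += ack - snd_una
            snd_una = ack
    return total

def delivered_segments_naive(snd_una, ack_numbers):
    """Naively credit one full segment for every ACK that advances snd_una."""
    count = 0
    for ack in ack_numbers:
        if ack > snd_una:
            count += 1
            snd_una = ack
    return count

# A receiver splits the acknowledgment of one segment into three ACKs.
split_acks = [487, 974, 1460]
byte_credit = delivered_bytes(0, split_acks)
segment_credit = delivered_segments_naive(0, split_acks) * SMSS
```

Byte counting credits exactly one segment of delivered data for the three partial ACKs, whereas the naive per-ACK segment count credits three segments, which would inflate the amount of data the sender believes it may transmit.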
16. Normative References 14. References
14.1. Normative References
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
DOI 10.17487/RFC1191, November 1990, DOI 10.17487/RFC1191, November 1990,
<https://www.rfc-editor.org/info/rfc1191>. <https://www.rfc-editor.org/info/rfc1191>.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, Selective Acknowledgment Options", RFC 2018,
DOI 10.17487/RFC2018, October 1996, DOI 10.17487/RFC2018, October 1996,
<https://www.rfc-editor.org/info/rfc2018>. <https://www.rfc-editor.org/info/rfc2018>.
skipping to change at page 26, line 28 skipping to change at line 915
[RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)",
STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022,
<https://www.rfc-editor.org/info/rfc9293>. <https://www.rfc-editor.org/info/rfc9293>.
[RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., [RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed.,
"CUBIC for Fast and Long-Distance Networks", RFC 9438, "CUBIC for Fast and Long-Distance Networks", RFC 9438,
DOI 10.17487/RFC9438, August 2023, DOI 10.17487/RFC9438, August 2023,
<https://www.rfc-editor.org/info/rfc9438>. <https://www.rfc-editor.org/info/rfc9438>.
17. Informative References 14.2. Informative References
[FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgment: [FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgment:
Refining TCP Congestion Control", ACM SIGCOMM SIGCOMM1996, Refining TCP Congestion Control", ACM SIGCOMM Computer
August 1996, Communication Review, vol. 26, no. 4, pp. 281-291,
DOI 10.1145/248157.248181, August 1996,
<https://dl.acm.org/doi/pdf/10.1145/248157.248181>. <https://dl.acm.org/doi/pdf/10.1145/248157.248181>.
[First_TCP_PRR] [First_TCP_PRR]
"Proportional Rate Reduction for TCP.", commit "Proportional Rate Reduction for TCP.", commit
a262f0cdf1f2916ea918dc329492abb5323d9a6c, August 2011, a262f0cdf1f2916ea918dc329492abb5323d9a6c, August 2011,
<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/ <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/ linux.git/
commit/?id=a262f0cdf1f2916ea918dc329492abb5323d9a6c>. commit/?id=a262f0cdf1f2916ea918dc329492abb5323d9a6c>.
[Flach2016policing] [Flach2016policing]
Flach, T., Papageorge, P., Terzis, A., Pedrosa, L., Cheng, Flach, T., Papageorge, P., Terzis, A., Pedrosa, L., Cheng,
Y., Al Karim, T., Katz-Bassett, E., and R. Govindan, "An Y., Karim, T., Katz-Bassett, E., and R. Govindan, "An
Internet-Wide Analysis of Traffic Policing", ACM Internet-Wide Analysis of Traffic Policing", SIGCOMM '16:
SIGCOMM SIGCOMM2016, August 2016. Proceedings of the 2016 ACM SIGCOMM Conference, pp.
468-482, DOI 10.1145/2934872.2934873, August 2016,
<https://doi.org/10.1145/2934872.2934873>.
[Hoe96Startup] [Hoe96Startup]
Hoe, J., "Improving the start-up behavior of a congestion Hoe, J., "Improving the Start-up Behavior of a Congestion
control scheme for TCP", ACM SIGCOMM SIGCOMM1996, August Control Scheme for TCP", SIGCOMM '96: Conference
1996. Proceedings on Applications, Technologies, Architectures,
and Protocols for Computer Communications, pp. 270-280,
[I-D.welzl-iccrg-pacing] DOI 10.1145/248157.248180, August 1996,
Welzl, M., Eddy, W., Goel, V., and M. Tüxen, "Pacing in <https://doi.org/10.1145/248157.248180>.
Transport Protocols", Work in Progress, Internet-Draft,
draft-welzl-iccrg-pacing, 3 March 2025,
<https://datatracker.ietf.org/doc/html/draft-welzl-iccrg-
pacing>.
[IMC11] Dukkipati, N., Mathis, M., Cheng, Y., and M. Ghobadi, [IMC11] Dukkipati, N., Mathis, M., Cheng, Y., and M. Ghobadi,
"Proportional Rate Reduction for TCP", Proceedings of the "Proportional Rate Reduction for TCP", IMC '11:
11th ACM SIGCOMM Conference on Internet Measurement Proceedings of the 2011 ACM SIGCOMM Conference on Internet
2011, Berlin, Germany, November 2011. Measurement Conference, pp. 155-170,
DOI 10.1145/2068816.2068832, November 2011,
<https://doi.org/10.1145/2068816.2068832>.
[Jacobson88] [Jacobson88]
Jacobson, V., "Congestion Avoidance and Control", SIGCOMM Jacobson, V., "Congestion Avoidance and Control",
Comput. Commun. Rev. 18(4), August 1988. Symposium proceedings on Communications architectures and
protocols (SIGCOMM '88), pp. 314-329,
DOI 10.1145/52325.52356, August 1988,
<https://doi.org/10.1145/52325.52356>.
[PACING] Welzl, M., Eddy, W., Goel, V., and M. Tüxen, "Pacing in
Transport Protocols", Work in Progress, Internet-Draft,
draft-welzl-iccrg-pacing-03, 7 July 2025,
<https://datatracker.ietf.org/doc/html/draft-welzl-iccrg-
pacing-03>.
[RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing [RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
TCP's Loss Recovery Using Limited Transmit", RFC 3042, TCP's Loss Recovery Using Limited Transmit", RFC 3042,
DOI 10.17487/RFC3042, January 2001, DOI 10.17487/RFC3042, January 2001,
<https://www.rfc-editor.org/info/rfc3042>. <https://www.rfc-editor.org/info/rfc3042>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001, RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>. <https://www.rfc-editor.org/info/rfc3168>.
skipping to change at page 27, line 41 skipping to change at line 986
[RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A [RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A
Conservative Selective Acknowledgment (SACK)-based Loss Conservative Selective Acknowledgment (SACK)-based Loss
Recovery Algorithm for TCP", RFC 3517, Recovery Algorithm for TCP", RFC 3517,
DOI 10.17487/RFC3517, April 2003, DOI 10.17487/RFC3517, April 2003,
<https://www.rfc-editor.org/info/rfc3517>. <https://www.rfc-editor.org/info/rfc3517>.
[RFC6937] Mathis, M., Dukkipati, N., and Y. Cheng, "Proportional [RFC6937] Mathis, M., Dukkipati, N., and Y. Cheng, "Proportional
Rate Reduction for TCP", RFC 6937, DOI 10.17487/RFC6937, Rate Reduction for TCP", RFC 6937, DOI 10.17487/RFC6937,
May 2013, <https://www.rfc-editor.org/info/rfc6937>. May 2013, <https://www.rfc-editor.org/info/rfc6937>.
[RHID] Mathis, M., Semke, J., and J. Mahdavi, "The Rate-Halving
Algorithm for TCP Congestion Control", Work in Progress,
August 1999, <https://datatracker.ietf.org/doc/html/draft-
mathis-tcp-ratehalving>.
[Savage99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, [Savage99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson,
"TCP congestion control with a misbehaving receiver", "TCP Congestion Control with a Misbehaving Receiver", ACM
SIGCOMM Comput. Commun. Rev. 29(5), October 1999. SIGCOMM Computer Communication Review, vol. 29, no. 5, pp.
71-78, DOI 10.1145/505696.505704, October 1999,
<https://doi.org/10.1145/505696.505704>.
[TCP-RH] Mathis, M., Mahdavi, J., and J. Semke, "The Rate-Halving
Algorithm for TCP Congestion Control", Work in Progress,
Internet-Draft, draft-mathis-tcp-ratehalving-00, 30 August
1999, <https://datatracker.ietf.org/doc/html/draft-mathis-
tcp-ratehalving-00>.
[VCC] Cronkite-Ratcliff, B., Bergman, A., Vargaftik, S., Ravi, [VCC] Cronkite-Ratcliff, B., Bergman, A., Vargaftik, S., Ravi,
M., McKeown, N., Abraham, I., and I. Keslassy, M., McKeown, N., Abraham, I., and I. Keslassy,
"Virtualized Congestion Control (Extended Version)", "Virtualized Congestion Control (Extended Version)",
SIGCOMM '16: Proceedings of the 2016 ACM SIGCOMM
Conference, pp. 230-243, DOI 10.1145/2934872.2934889,
August 2016, <http://www.ee.technion.ac.il/~isaac/p/ August 2016, <http://www.ee.technion.ac.il/~isaac/p/
sigcomm16_vcc_extended.pdf>. sigcomm16_vcc_extended.pdf>.
Appendix A. Strong Packet Conservation Bound Appendix A. Strong Packet Conservation Bound
PRR-CRB is based on a conservative, philosophically pure, and PRR-CRB is based on a conservative, philosophically pure, and
aesthetically appealing Strong Packet Conservation Bound, described aesthetically appealing Strong Packet Conservation Bound, described
here. Although inspired by the packet conservation principle here. Although inspired by the packet conservation principle
[Jacobson88], it differs in how it treats segments that are missing [Jacobson88], it differs in how it treats segments that are missing
and presumed lost. Under all conditions and sequences of events and presumed lost. Under all conditions and sequences of events
during recovery, PRR-CRB strictly bounds the data transmitted to be during recovery, PRR-CRB strictly bounds the data transmitted to be
equal to or less than the amount of data delivered to the receiver. equal to or less than the amount of data delivered to the receiver.
Note that the effects of presumed losses are included in the inflight Note that the effects of presumed losses are included in the inflight
calculation, but do not affect the outcome of PRR-CRB, once inflight calculation but do not affect the outcome of PRR-CRB once inflight
has fallen below ssthresh. has fallen below ssthresh.
This Strong Packet Conservation Bound is the most aggressive This Strong Packet Conservation Bound is the most aggressive
algorithm that does not lead to additional forced losses in some algorithm that does not lead to additional forced losses in some
environments. It has the property that if there is a standing queue environments. It has the property that if there is a standing queue
at a bottleneck that is carrying no other traffic, the queue will at a bottleneck that is carrying no other traffic, the queue will
maintain exactly constant length for the entire duration of the maintain exactly constant length for the entire duration of the
recovery, except for +1/-1 fluctuation due to differences in packet recovery, except for +1/-1 fluctuation due to differences in packet
arrival and exit times. Any less aggressive algorithm will result in arrival and exit times. Any less aggressive algorithm will result in
a declining queue at the bottleneck. Any more aggressive algorithm a declining queue at the bottleneck. Any more aggressive algorithm
skipping to change at page 28, line 49 skipping to change at line 1044
bottleneck in the forward path. In particular, when a packet is bottleneck in the forward path. In particular, when a packet is
"served" at the head of the bottleneck queue, the following events "served" at the head of the bottleneck queue, the following events
happen in much less than one bottleneck packet time: the packet happen in much less than one bottleneck packet time: the packet
arrives at the receiver; the receiver sends an ACK that arrives at arrives at the receiver; the receiver sends an ACK that arrives at
the sender; the sender processes the ACK and sends some data; the the sender; the sender processes the ACK and sends some data; the
data is queued at the bottleneck. data is queued at the bottleneck.
If SndCnt is set to DeliveredData and nothing else is inhibiting If SndCnt is set to DeliveredData and nothing else is inhibiting
sending data, then clearly the data arriving at the bottleneck queue sending data, then clearly the data arriving at the bottleneck queue
will exactly replace the data that was served at the head of the will exactly replace the data that was served at the head of the
queue, so the queue will have a constant length. If queue is drop queue, so the queue will have a constant length. If the queue is
tail and full, then the queue will stay exactly full. Losses or drop tail and full, then the queue will stay exactly full. Losses or
reordering on the ACK path only cause wider fluctuations in the queue reordering on the ACK path only cause wider fluctuations in the queue
size, but do not raise its peak size, independent of whether the data size but do not raise its peak size, independent of whether the data
is in order or out of order (including loss recovery from an earlier is in order or out of order (including loss recovery from an earlier
RTT). Any more aggressive algorithm that sends additional data will RTT). Any more aggressive algorithm that sends additional data will
overflow the drop tail queue and cause loss. Any less aggressive overflow the drop tail queue and cause loss. Any less aggressive
algorithm will under-fill the queue. Therefore, setting SndCnt to algorithm will under-fill the queue. Therefore, setting SndCnt to
DeliveredData is the most aggressive algorithm that does not cause DeliveredData is the most aggressive algorithm that does not cause
forced losses in this simple network. Relaxing the assumptions forced losses in this simple network. Relaxing the assumptions
(e.g., making delays more authentic and adding more flows, delayed (e.g., making delays more authentic and adding more flows, delayed
ACKs, etc.) is likely to increase the fine grained fluctuations in ACKs, etc.) is likely to increase the fine-grained fluctuations in
queue size but does not change its basic behavior. queue size but does not change its basic behavior.
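
The constant-queue argument can be checked with a toy model. The sketch below is an illustration under the simplifying assumptions stated in this appendix (a single drop-tail bottleneck, negligible delay elsewhere, one packet served and ACKed per step); the function simulate_queue and its parameters are hypothetical names introduced here.

```python
def simulate_queue(initial_queue, capacity, steps, extra_per_ack=0):
    """Toy drop-tail bottleneck: one packet is served and ACKed per step.

    The sender answers each ACK with DeliveredData + extra_per_ack packets.
    extra_per_ack = 0 models SndCnt = DeliveredData; a positive value models
    a more aggressive sender and a negative value a less aggressive one.
    Returns (queue length after each step, total packets dropped).
    """
    queue = initial_queue
    drops = 0
    lengths = []
    for _ in range(steps):
        if queue > 0:
            queue -= 1                 # head-of-queue packet is served
            delivered_data = 1         # its ACK reports one delivered packet
            sndcnt = max(delivered_data + extra_per_ack, 0)
            accepted = min(sndcnt, capacity - queue)
            drops += sndcnt - accepted  # overflow is dropped at the tail
            queue += accepted
        lengths.append(queue)
    return lengths, drops

conserving = simulate_queue(10, 10, 5)
aggressive = simulate_queue(10, 10, 5, extra_per_ack=1)
timid = simulate_queue(10, 10, 5, extra_per_ack=-1)
```

In this model the conserving sender keeps a full queue exactly full without drops, the more aggressive sender loses one packet per step, and the less aggressive sender lets the queue drain by one packet per step.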
Note that the congestion control algorithm implements a broader Note that the congestion control algorithm implements a broader
notion of optimal that includes appropriately sharing the network. notion of optimal that includes appropriately sharing the network.
Typical congestion control algorithms are likely to reduce the data Typical congestion control algorithms are likely to reduce the data
sent relative to the Packet Conserving Bound implemented by PRR, sent relative to the Packet Conserving Bound implemented by PRR,
bringing TCP's actual window down to ssthresh. bringing TCP's actual window down to ssthresh.
Acknowledgments
This document is based in part on previous work by Janey C. Hoe (see
"Recovery from Multiple Packet Losses", Section 3.2 of
[Hoe96Startup]), Matt Mathis, Jeff Semke, and Jamshid Mahdavi
[TCP-RH] and influenced by several discussions with John Heffner.
Monia Ghobadi and Sivasankar Radhakrishnan helped analyze the
experiments. Ilpo Jarvinen reviewed the initial implementation.
Mark Allman, Richard Scheffenegger, Markku Kojo, Mirja Kuehlewind,
Gorry Fairhurst, Russ Housley, Paul Aitken, Daniele Ceccarelli, and
Mohamed Boucadair improved the document through their insightful
reviews and suggestions.
Authors' Addresses Authors' Addresses
Matt Mathis Matt Mathis
Email: ietf@mattmathis.net Email: ietf@mattmathis.net
Neal Cardwell Neal Cardwell
Google, Inc. Google, Inc.
Email: ncardwell@google.com Email: ncardwell@google.com
Yuchung Cheng Yuchung Cheng
 End of changes. 104 change blocks. 
526 lines changed or deleted 270 lines changed or added
