IP Performance Metrics WG (ippm)
Monday, March 15 at 13:00-15:00
===============================

The meeting was chaired by the working group chairs, Will Leland and Matt Zekauskas, and was reported by Guy Almes, Bruce Siegell, and Rayadurgam Ravikanth.

AGENDA:

1. Introduction
2. Comparing two implementations of the delay and loss metrics
3. Empirical Bulk Transfer metrics
4. Loss patterns
5. Delay and loss update
6. Relating ITU and IPPM Metrics: Framework, Loss, and Delay
7. Future directions

IETF Home page: http://www.ietf.org/html.charters/ippm-charter.html
IPPM Home page: http://www.advanced.org/IPPM/

1. Overview and agenda bashing, Matt Zekauskas (Advanced Network & Services)

The connectivity draft now has an RFC number, RFC 2498. The round-trip delay draft is being reviewed by the PingER folks. The jitter draft is now without an author; it needs work and volunteers are welcome. There were no changes to the agenda.

2. Comparing two implementations of the delay and loss metrics, Henk Uijterwaal (RIPE NCC)

[See slides]

First, Henk reviewed the motivation for the study: we now have two implementations of the same metrics (one-way delay and loss) -- one by RIPE NCC and the other by Advanced. These are two completely separate implementations, which use different hardware. (There are general similarities; both are Intel boxes running a version of Unix and use GPS to synchronize clocks.) We wanted to measure the same path using both instruments, and see if the results agreed.

In each implementation's setup there are two test boxes with synchronized clocks, and one test box sends packets to the other. RIPE and Advanced exchanged boxes in October, 1998. The boxes at each site are placed on the same Ethernet segment, and have been collecting measurements in parallel. There are some differences, however: RIPE sends 100-byte packets on a Poisson schedule with an average rate of 3 per minute.
Advanced sends 40-byte packets on a Poisson schedule with an average rate of 2 per second.

Henk presented a slide of raw delays for each system from RIPE NCC (in Amsterdam) to Advanced (in Armonk, New York) from 1-March-1999. The two slides seemed to be consistent. Both plots:

- show 50 millisecond delays in light periods
- show about 200 millisecond delays in busy periods
- agree on the rises and falls

Next Henk showed data from a two-month period from the end of October to the end of December, 1998. The median, 2.5 percentile and 97.5 percentile over six-hour periods seem to agree. The distributions of the percentiles seem to agree. He tried a Kolmogorov-Smirnov test on the medians, and it shows a 62% likelihood that the curves originate from the same underlying distribution.

Finally, Henk reported on packet loss data. Despite differences in the measurement parameters, the results seem to agree. The conclusion is that the two separate implementations agree.

Q&A:

Did we consider using the same hardware, or swapping hardware? Because the two implementations use different GPS hardware and keep time differently (RIPE uses the Dave Mills in-kernel PLL to synchronize to a one pulse-per-second signal from their Motorola GPS hardware; Advanced uses a TrueTime card, and reads the time directly off the card), the implementations cannot directly use each other's hardware.

It was pointed out that one area of disagreement in the 1-March-1999 plot was in the minima -- it looks like there is a clock problem because there appears to be a constant slope to the line. Henk pointed out that he had just pulled recent data before leaving for the meeting, but he would check. [It was later found that the RIPE box did not have GPS locked that day; they keep auxiliary data to tell when the clock is suspect, and he did not check it before printing the plot. See the IPPM mailing list archives for more information.]
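
[Illustration, not from the talk: a minimal Python sketch of the two techniques mentioned in this item, a Poisson sending schedule and the two-sample Kolmogorov-Smirnov statistic. The function names poisson_schedule and ks_statistic are hypothetical, and this is not code from either the RIPE NCC or the Advanced implementation. The sketch computes only the K-S statistic (the largest gap between the two empirical distributions); it does not compute a significance level and makes no allowance for temporal correlation.]

```python
import bisect
import random


def poisson_schedule(rate_per_sec, duration_sec, rng):
    """Return send times in seconds whose gaps are exponentially distributed.

    Exponential gaps with mean 1/rate give a Poisson process with the
    requested average rate.
    """
    times = []
    t = rng.expovariate(rate_per_sec)
    while t < duration_sec:
        times.append(t)
        t += rng.expovariate(rate_per_sec)
    return times


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the maximum absolute difference between
    the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The maximum gap is always reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

[With these helpers, poisson_schedule(2.0, 3600.0, random.Random()) would give one hour of send times at an average of 2 packets per second, and poisson_schedule(3 / 60, 3600.0, random.Random()) the same at 3 per minute. ks_statistic could then be applied to, for example, the two series of six-hour medians.]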

A general question was raised: what types of statistical tests could be applied? Will Leland pointed out that there were two general questions: do the results come from the same distribution, and do the results make sense. IPPM needs better methodology for judging if two implementations measure the same thing. This is roughly equivalent to protocol interoperability.

A point about the Kolmogorov-Smirnov test was brought up -- it is designed for small data sets, not the large ones produced here, so its results should be considered carefully. In addition, you need to account for temporal correlation when running comparison tests.

Daniel Karrenberg also pointed out that we should do work on the interoperability of two different implementations; Guy Almes thought that was a good idea, but he wouldn't advocate doing it yet.

3. Bulk Transfer framework and Treno, Matt Mathis (PSC and NLANR/NCNE)

Treno is a kernel-independent implementation of the way that TCP should behave and thus can give a (mostly) kernel-independent throughput measurement. There are improvements underway and contributions are invited.

Q&A:

Is TCP specified precisely enough for this kind of metric to work? TCP is intentionally underspecified; the bulk transfer capacity framework draft discusses this issue in detail. TCP implementations might converge in the long term, but they also might diverge; some diversity is healthy. It was pointed out that a metric that answers the question "if I play by the rules, what flow capacity should I get" is useful.

4. Loss Patterns draft, Rajeev Koodli (Nokia Research)

The draft defines two sample metrics, loss distance and loss periods, based on the singleton definition of loss from the one-way loss draft.

Some brief comments on the current state of the draft: First, as written, the metric assumes that the one-way loss packets contain a sequence number that increments by one.
However, this is not specified by the singleton definitions, so the loss pattern draft needs to be fixed (it can derive the same information from the packet loss stream). Second, how should the metric handle packet reordering? It's not discussed in the one-way delay draft. There was a question about the loss pattern versus the arrival pattern. "Did it arrive within the threshold?" is independent of reordering; thus the pattern should be implicit in the one-way loss stream. Please look at the draft and comment on the mailing list.

Will Leland asked if we are choosing the right metric for loss patterns; our purpose is to create metrics that are useful. Matt Mathis pointed out that you want to know the rate you are testing at... from his point of view, you want to measure at the rate the transport protocol runs at. Loss patterns make a big difference in TCP performance.

5. One-way delay and loss drafts, Matt Zekauskas

The WG last call was extended to get ITU input. Garry Couch noted a clock terminology discrepancy. No direct mapping was found; the main point of confusion would seem to be the term "clock": ITU views "clock" as a frequency reference, while IPPM views "clock" as a time-of-day source. A loose terminology correspondence was established for other terms, just to give some point of reference, not as exact replacements, and those are now in the draft. The other differences from I.380 were mostly minor, and Will Leland would enumerate them in the next talk.

Bruce Siegell raised an issue about negative delay values -- the draft explicitly specifies that delay values are non-negative, but delay measurements could be negative and still be within error bounds. The idea is to not force individual values to be thrown out if they could be useful to create a distribution. This point was not judged to be controversial.

6. Relating ITU and IPPM Metrics: Framework, Loss, and Delay, Will Leland (Telcordia)

ITU and IETF are trying to communicate, but the drafts seem to suggest they are from different planets. This talk tries to articulate and explain some of the differences.

Nobody wants multiple standards for the same thing. But are ITU and IPPM producing documents for the same thing? Our goal is to have a clear statement of the area of overlap in concerns, and strive for consilience within that area of overlap. The ITU and IETF have common goals, but a different emphasis: the ITU strives to evaluate a service while the IETF measures the network; see the slide from Will's talk.

The T1A1.3 folks have taken pains to use some IETF vocabulary (e.g., host, link), but there are other areas where there is only approximate equivalence (e.g., ITU network section versus an IPPM cloud; focus on corresponding events versus the fate of a single packet). Other terms have no correspondence. For example, I.380 has a notion of an IP packet transfer reference event; IPPM has the "wire time" notion (again, see Will's slides for a longer list). In sum, the differences are in what is explicit.

The technical differences on loss and one-way delay involve notions that are missing from one or the other, but no notions are incompatible. In some cases there is a formal translation: a given I.380 IP packet transfer delay can be described using a particular Type-P. In other cases there is an empirical map. For example, the notion of an IP reference event is different from the IPPM wire time, but there may be a constant offset; I.380 uses corresponding events while IPPM uses the notion of "same packet". In some cases, however, there is no feasible mapping; IPPM simply has no notion of recording misdirected or spurious packets.

Will went on to describe in more detail what each of the two groups fails to mention. The summation, however, is "don't panic". Most differences in approach and emphasis serve the different intended uses of the metrics, but have no operational significance.
A few differences could be confusing:

- IPPM does not discuss aggregate statistics for finite-delay packets (so far, packets that are lost are included in the aggregates as infinite-delay packets); this results in different quantile measurements.
- I.380 views a packet that traversed a non-permissible measurement point as lost.
- I.380 doesn't discuss differences between actual reference-event time and measured time.
- I.380 implies reference-event time is in the IP layer of the host stack, while IPPM uses physical network ingress or egress.
- IPPM doesn't address the details of misdirected, spurious, etc. packets.

There was consensus that an RFC of this talk would be useful.

7. Futures, a discussion led by Will Leland

Should we do further work regarding potential incompatibilities between IPPM and I.380 metrics? One point made was that IPPM seems to focus on end-to-end, while I.380 focuses on what happens in the middle... also there were no known implementations of the I.380 metrics (in particular, with respect to definable but hard-to-detect concepts like misdirected packets).

Consensus: proceed with loss and delay without trying to align IPPM and I.380 further; we need normative references within IPPM. Further changes could be additional RFCs. Continue to work together with the ITU to ensure no major incompatibilities occur.

What about other documents? Statistical methodology on how to compare implementations is needed. Consensus was that we need a delay variation or jitter document. Phil Chimento stated he was willing to become document editor; people interested in helping should contact him (directly or through the chairs).

A plea for feedback on the loss patterns as well as the bulk transfer capacity drafts was made -- especially from people with TCP expertise.

A questioner asked if we need specific Quality of Service metrics. It was noted that QoS notions can be captured by specifying Type-P. However, additional restrictions may be necessary.
For example, don't do bulk transfer tests at high priority in an environment with priorities. There was no call for conceptual changes to the delay and loss notions, but an applicability document would be useful.

Some other groups (for example, BMWG) are assuming that IPPM will work on multicast performance measures; is that appropriate?

The meeting closed with the chairs noting that further discussion should be taken to the mailing list.
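
[Illustration, not from the meeting: a minimal Python sketch of the quantile difference noted under item 6, where lost packets are either counted in the aggregate as infinite-delay packets or left out so that only finite-delay packets remain. The delay values are made up for the example, and the helper name quantile and its simple "smallest value covering a fraction q of the samples" rule are assumptions, not definitions taken from the IPPM drafts or I.380.]

```python
import math


def quantile(values, q):
    """Empirical q-quantile: the smallest sample value such that at least
    a fraction q of the samples are less than or equal to it."""
    ordered = sorted(values)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


# Hypothetical one-way delays in milliseconds; math.inf marks a lost packet.
delays_ms = [50.0, 52.0, 55.0, 60.0, math.inf, math.inf]

# Aggregate over all packets, with lost packets counted as infinite delay.
p75_with_losses = quantile(delays_ms, 0.75)

# Aggregate over finite-delay packets only.
finite_only = [d for d in delays_ms if math.isfinite(d)]
p75_finite_only = quantile(finite_only, 0.75)
```

[On this made-up sample the 75th percentile is infinite when the two lost packets are included and 55.0 ms when they are excluded, which is the kind of disagreement between quantile measurements that the first bullet under item 6 refers to.]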