Benchmarking Methodology WG Minutes
WG Chair: Kevin Dubray
Minutes reported by Kevin Dubray.

The BMWG met at the 44th IETF on Monday, March 15, 1999. More than 42 people attended. The chair presented the agenda:

1. Agenda/Administration
2. Terminology for Cell/Call Benchmarking
2.1 Terminology for Frame Relay Benchmarking
3. IP Multicast Benchmarking Methodology

1. Agenda/Administration

The agenda was approved as presented.

The chair announced that the RFC Editor had re-issued RFC 1944, "Benchmarking Methodology for Network Interconnect Devices," as RFC 2544. RFC 2544 corrects an incorrect address reference in RFC 1944.

[The chair should have mentioned that the Firewall Benchmarking Terminology draft was awaiting assignment to an IESG reviewer. AD Alvestrand was attempting to escalate.]

The chair announced that he had received several queries on the status of the LAN Switch Benchmarking Methodology draft. He informed the group that editorial reinforcements should be arriving soon and that a revised draft could be expected in the next few weeks.

2. Terminology for Cell/Call Benchmarking

Jeff Dunn was introduced to discuss the Cell/Call Benchmarking Terminology draft and its newly issued Frame Relay companion. Jeff introduced the drafts' co-author, Cynthia Martin.

Jeff gave a presentation outlining decisions made since the previous IETF; namely, the previous draft was split into two separate drafts: an ATM-centric draft and a Frame Relay-centric draft. The presentation (see the BMWG slides in the IETF Proceedings) outlined the motivation, focus, goals, workplans, modifications, and input related to each draft's effort.

During the presentation, there was discussion of some of the content. One discussion addressed the motivation with respect to the ATM work. Jeff reiterated that the overall motivation was to measure ATM as a link-layer entity and to assess its impact on higher-layer entities. Moreover, the effort would attempt a layer-by-layer analysis and consider control-issue impairments.

Dunn proposed that the draft hold off addressing LANE and similarly postpone addressing MPOA until MPOA further matures. Andy Malis agreed. Harald Alvestrand reinforced that it is generally a good thing to have discrete, well-bounded efforts so that expeditious progress can be demonstrated.

Dunn stated that the cell/ATM work was done with the B-ISDN model in mind to serve as a foundation. There were no dissenting voices. Malis did mention that he thought there was relevant work-in-progress to consider: GFR, Guaranteed Frame Rate. GFR was proposed to help address IP-centric application requirements. Dunn responded that the work was indeed relevant, but restated the notion of working to the B-ISDN model; that is, to specify a test at a given layer and communicate its impact on IP-related performance.

The presentation proposed considering metrics relating to ILMI. A question was posed as to why ILMI should be a concern. Dunn replied that ILMI, as a conditional requirement, has a cascading impact on overall network behavior.

Jeff mentioned that it might not be a bad idea to model the associated methodologies after ISO 9646 in order to communicate distinct sets of tests and measurements.

A statement was made on a related slide suggesting Pass/Fail criteria for benchmarks. The goal, it was communicated, cannot be to present DUT acceptance criteria. Conformance testing is outside the bounds of the BMWG; it would be better to describe mechanisms that ensure correct and repeatable results.
A comment was made that the metrics defined in section 2 of the cell/call draft, while instrumenting the ATM layer, didn't necessarily demonstrate how errors at the ATM layer impacted the IP layer. Dunn stated that this was exactly the nature of input he was seeking from the BMWG mailing list: "AAL Reassembly time is a good metric for me because Reassembly time impacts IP application foo in this way." Jeff made an appeal to change the wording of the draft's title from "cell/call" to "ATM" to better reflect the draft's ATM nature. There appeared to be little dissension.

2.1 Terminology for Frame Relay Benchmarking

Dunn started this presentation by stating that the frame relay slides looked very much like his ATM slides. (The slides can be found in the IETF Proceedings.) Andy Malis suggested that Jeff talk to the Frame Relay service MIB folks if he wanted a different look and feel. Moreover, Andy suggested that FR-13, a service level agreement document, be considered as a foundation for establishing related nomenclature.

With the chairperson dancing around the podium, Jeff concluded his presentation with a request for help by way of a replacement author or co-author. Harald Alvestrand polled the group for interest levels in different areas. Many hands were raised, and Harald concluded that there were many possible volunteers.

3. IP Multicast Benchmarking Methodology

This discussion occurred in two parts: Hardev Soor and Debbie Stopp presented the areas in the draft for which they were responsible; Ralph Daniels presented his area. Both the Soor/Stopp and Daniels slides can be referenced via the IETF Proceedings.

It was noted that references to RFC 1944 in the draft should be replaced with references to RFC 2544, its successor.

On the Multicast Test Setup slide, a discussion opened up on the topic of the "Magic Pattern" (a tag or pattern that forms the basis of an auditable event or other filtering function). With regard to test frame scrutiny, Jeff Dunn questioned whether the discussed discriminating heuristic was too exacting. Hardev and Debbie replied that exacting scrutiny is needed in order to determine whether the packets were forwarded correctly, and whether the right number of packets were received based on the counting criteria specified (e.g., data PDUs vs. control-plane PDUs). A follow-on question asked whether it was necessary to specify the exact tag or trigger pattern embedded in the test frame. Hardev said he didn't believe the specification of the tag format was really important, but thought it was important to mandate a trigger/counter mechanism; the actual design of that mechanism was an implementation decision.

It was asked which addresses in the Class D range were better suited for testing. Debbie explained that existing RFCs identify reserved addresses and proper ranges. Alvestrand offered that it would be prudent to use locally-scoped addresses for testing.

On the topic of Mixed Class Throughput, it was stated that the tested ratio of multicast to unicast packets should be specified in the test results. It was mentioned that this was a test method where specifying triggering mechanisms would be helpful. Another suggestion was that source and destination addresses could be useful. Another person commented that identifying the triggers or discriminators can be useful, but knowing what to do with those triggers is equally important. Cynthia Martin suggested that an appendix addressing these issues might be helpful.
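By way of illustration only (not part of the draft or the discussion), a minimal sketch of the trigger-based counting idea raised above might look like the following; the trigger pattern, frame handling, and names are assumptions made purely for illustration:

   # Sketch: classify captured test frames by an embedded "magic pattern"
   # and tally multicast vs. unicast data PDUs. All constants are
   # hypothetical; a real tester would define its own tag format.
   MAGIC = b"\xBE\xEF\xCA\xFE"          # hypothetical trigger pattern in the payload

   def classify(frames):
       counts = {"multicast": 0, "unicast": 0, "other": 0}
       for frame in frames:             # frame: raw bytes of a captured Ethernet frame
           if MAGIC not in frame:       # not a tagged test data PDU
               counts["other"] += 1
               continue
           if frame[0] & 0x01:          # group bit of the destination MAC set
               counts["multicast"] += 1
           else:
               counts["unicast"] += 1
       return counts

Reporting the multicast-to-unicast ratio alongside such counts is the kind of detail the Mixed Class Throughput discussion suggested including in the test results.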
One person raised an issue regarding the level of "multi in multicast" in this test. The ensuing discussion centered on whether the test was device-oriented, system-oriented, or network-oriented. Another area touched on in the discussion was the utility of this style of test in a clinical scenario compared to a live network scenario. It was ultimately agreed that the issue would not be solved in this forum that day.

With respect to multicast latency, it was suggested that the draft not present a latency value for each multicast group address, but rather a latency value per source/destination port pair. It was further suggested that latency be reported per group per port. It was also suggested that one need not create a separate metric to report mean latency, as the mean could be derived from the individual latency measurements.

A few folks questioned why encapsulation metrics show up in a multicast benchmarking draft; it was claimed that encapsulation is not a multicast-related issue per se. A counter-argument stated that multicast packets are sometimes tunneled to traverse non-multicast-capable devices. A brief discussion on current router feature sets followed. In the end, the Area Director offered that if an item is not deemed applicable, the group can always decline to address it.

With regard to the Group Leave Latency metric, a question was posed as to what rate the test data traffic should be offered to the DUT. It was suggested that tests of this type use the no-loss forwarding rate determined by the throughput test. Another comment reflected on the need to minimize tester error: for example, in this test a negative latency result might not reflect a "bad" thing, and the granularity of a test device's clock may lead to incorrect conclusions about the DUT. It was thought the methodology should include checks to ensure correct test execution and sanity in the test results.

In addressing his portion of the presentation, Ralph Daniels opened by mentioning that the authors each took a set of metrics for which to define test methods. Because of this, Ralph urged reviewers of the draft to keep the division of labor in mind and to identify any inconsistencies between the methodological definitions.

With respect to Scaled Group Forwarding, a question was asked about how much to increment the number of groups to be tested. It was suggested that the draft offer guidance regarding the scaling steps. With regard to the output of the SGF metric, Ralph questioned whether folks wanted to see RFC 1242 throughput or a forwarding rate reported as a result. There was consensus that the forwarding rate was the preferred vehicle.

On the topic of Join Delay, there was some discussion about the relationship of the test data stream to the metric as suggested by the slide: why is the test looking at the first frame of test data traffic, and is this correct? Another person maintained that the delay is independent of when the data frame is sent.

In discussing issues surrounding Group Capacity, it was thought that the test device's receiving ports must join all the groups registered on the DUT/SUT. It was also thought the test device should be robust enough to handle the DUT leaving the multicast group(s) it joined in multi-cycle tests. In response to Daniels' question, it was thought that the Group Capacity metric was a DUT-based (as opposed to port-based) metric.
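As an illustration of the point made in the latency discussion above, that the mean need not be a separate metric but can be derived from the raw measurements, a minimal sketch follows; the data layout, names, and sample values are assumptions invented purely for illustration:

   # Sketch: latencies recorded per (multicast group, destination port) pair;
   # the mean is derived from the raw samples rather than measured separately.
   # 239/8 is the administratively (locally) scoped multicast range, in line
   # with the suggestion to use locally-scoped addresses for testing.
   samples = {
       ("239.1.1.1", "port-1"): [110.2, 112.7, 109.9],   # microseconds
       ("239.1.1.1", "port-2"): [115.0, 114.3, 116.1],
       ("239.1.1.2", "port-1"): [108.4, 111.0, 110.5],
   }

   for (group, port), values in samples.items():
       mean = sum(values) / len(values)
       print(f"group {group}, {port}: mean latency {mean:.1f} us "
             f"({len(values)} samples)")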
Goals for the next period:

1. Address outstanding issues in the ATM terminology draft.
2. Solicit help for the Frame Relay terminology draft.
3. Demonstrate progress on the LAN Switch methodology draft.
4. Progress the Multicast Methodology draft.