Minutes of the Audio/Video Transport Working Group
Reported by Steve Casner

1. Introduction and status

The AVT working group had not planned to meet in Munich during a period of dormancy while implementation and testing of RTCP scaling mechanisms were underway. However, in response to several requests, one AVT session was scheduled to discuss a new problem with the RTCP scaling plus open issues for several new payload formats that were submitted since the last meeting.

The next goal for AVT is to get RFCs 1889 and 1890, the Real-time Transport Protocol and the companion RTP profile for audio/video conferencing, revised for advancement to Draft Standard by year's end. The profile has been revised in draft-ietf-avt-new-profile-01.txt and .ps. The main RTP specification needs to incorporate RTCP scaling changes that are still being refined; however, an interim first draft of the other revisions, including extensions for layered encodings, should be published as soon as possible.

As of the Munich meeting, the RTP payload format for H.263 video had been approved for publication as a Proposed Standard, and the payload format for redundant audio was close to approval. (Since the meeting, both have been published as RFCs 2190 and 2198.) The drafts on compression of IP/UDP/RTP headers and a revision to the RTP payload format for MPEG2 in RFC 2038 have been posted for IESG Last Call before publication as Proposed Standards.

2. RTCP "timer reconsideration"

Jonathan Rosenberg gave a brief review of the problem that a flood of RTCP packets can occur if a large number of participants simultaneously join a session. This problem and the solution of RTCP "timer reconsideration" have been discussed at the previous two IETF meetings and are explained in the recently posted Internet-Draft draft-ietf-avt-reconsider-00.ps.
However, a concern raised at the last meeting was that the timer reconsideration algorithm exhibits a "plateau effect" wherein no RTCP packets are sent for a period after the algorithm stops the flood. New simulations show that the plateau effect is significantly reduced as the joins are spread out in time and disappears when the join rate is no more than 500 users/sec. Since perfectly synchronized joins are very unlikely, the plateau effect is not considered to be a problem.

There is an analogous problem of an RTCP BYE packet flood on simultaneous leaves. Unlike the initial RTCP packets at the time of joining, BYE packets can't be delayed because the application is terminating. Some partial remedies were discussed at the last meeting.

The newly discovered problem is a relatively minor one related to the BYE flood: if many participants leave a session at once, other participants that remain in the session may be falsely timed out. Some of the remaining participants will have their RTCP interval expire earlier than others. These participants will reduce their estimate of the number of participants and consequently also reduce their RTCP interval. Since the other remaining participants may still have a long time to wait before their previously calculated RTCP interval expires, they might not send any RTCP packets while multiple shorter RTCP intervals elapse for the participants that have noticed the drop. The long-waiting participants will therefore be timed out (considered inactive) by those that have adjusted to shorter intervals. Even if the BYE flood is slowed to the normal 5% RTCP bandwidth, simulations show that a single missed packet can cause a timeout.

An extension to the timer reconsideration algorithm, dubbed "reverse reconsideration", recomputes the RTCP timer whenever the group size estimate decreases (due to a BYE).
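
As an illustration of reverse reconsideration, the following minimal sketch rescales a participant's pending RTCP timer in proportion to the drop in the group size estimate. The minutes say only that the timer is recomputed when the estimate decreases; the proportional rule used here follows the form later published in RFC 3550, section 6.3.4, and may differ from the variant discussed at the meeting. The function and variable names are illustrative assumptions.

```python
def reverse_reconsider(tc, tp, tn, members, pmembers):
    """Rescale RTCP timer state after the group size estimate drops.

    tc       -- current time (seconds)
    tp       -- time the last RTCP packet was sent
    tn       -- time the next RTCP packet is scheduled
    members  -- new (smaller) group size estimate
    pmembers -- previous group size estimate

    Returns the adjusted (tp, tn). Assumption: proportional rescaling
    in the style of RFC 3550 section 6.3.4, not taken from the minutes.
    """
    if pmembers <= 0 or members >= pmembers:
        # The estimate did not decrease, so nothing is rescaled.
        return tp, tn
    scale = members / pmembers
    new_tn = tc + scale * (tn - tc)   # pull the next send time closer
    new_tp = tc - scale * (tc - tp)   # shrink the elapsed interval to match
    return new_tp, new_tn


if __name__ == "__main__":
    # A participant that still had 50 s to wait sees the group shrink
    # from 1000 to 100 members and now waits only 5 s.
    print(reverse_reconsider(tc=100.0, tp=90.0, tn=150.0,
                             members=100, pmembers=1000))
```

With a rule of this kind, a long-waiting participant shortens its own wait as soon as it notices the departures, rather than staying silent for the remainder of its old interval.
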
This significantly reduces the amount of time during which the group size estimate may be wrong, but there is still a problem that the estimate can drop to zero. This problem can be eliminated, as suggested in an INFOCOM97 paper by Sharma, Estrin, Floyd and Jacobson on a similar problem, using a filter to slow the rate of decrease of the estimate. Further analysis to determine the right kind of filter is underway.

Jon Crowcroft suggested keeping a bit of history from interval to interval and continuously estimating the group size to use as a predictor. However, a predictor might not work well with unexpected sharp transitions in group size. He also suggested that the BYE flood might be managed by having participants send BYEs with a limited scope and then having the participants that remain act as proxies to retransmit the BYEs spread out in time. A proxy election protocol would be needed. This starts to get pretty complicated.

Dave Oran suggested that there could be a server thread that takes responsibility for sending a delayed BYE after the application is closed, just as TCP implementations must save some state when a connection closes to shut down gracefully. Steve Casner responded that if some implementations can do this, great, but we can't assume all will. Similarly, it is important to note that although we want to define a timer reconsideration algorithm to go in the appendix of the RTP spec, different implementations can implement different timer reconsideration algorithms and the whole system will still work.

The group was asked if there were any objections to the proposal to add the timer reconsideration algorithm to the RTP spec. There were none.

2.1 Large-scale tests of the RTCP scaling mechanisms

The timer reconsideration algorithm has been simulated, but we would have more confidence if it was tested on a large scale with real implementations.
If that is not practical, testing on a large-scale simulation/emulation testbed network would be a good backup for the initial simulations. The audience was queried about any planned tests, but no plans were reported at this time.

Jonathan Rosenberg is implementing a test program called rtpbomb that can emulate a large number of participants. It may be used in combination with real participants to test timer reconsideration and other aspects of RTCP scaling. His analysis suggests that most of the phenomena in the algorithm are linear with group size, so a smaller test should still be useful to predict behavior of a larger group.

3. Other RTP Issues/Questions

Steve Casner brought up three other issues for the group to consider, but there were few comments:

- During the discussion of the global multicast address allocation scheme in the MBONED working group, Van Jacobson asserted that applications should not depend upon sequential multicast addresses being allocated for multiple groups carrying the different layers of a layered coding scheme. The layered encoding extension to RTP proposed by Speer and McCanne does assume sequential addresses, so a revision may be necessary.

- AVT has received a Request for Proposals from the DAVIC group. It would be difficult for the group as a whole to respond, but individual participants are invited to do so.

- Should AVT set a policy of allocating no more static payload types such that dynamic payload types should be used instead?

Scott Petrack responded to the last point to say that we should consider none of the assigned payload types to be static, but rather default assignments. When a session control protocol is in use, if there is a need for more than the 32 dynamic payload types already set aside, the default values could be reassigned. The group seemed to be in general agreement with this proposal, but it should be discussed on the mailing list as well.

4. New RTP payload format proposals

It is expected that additional payload formats for RTP will continue to be developed as new encodings are developed and as RTP is employed in new applications. At this meeting, payload formats were introduced for H.263+ and BT-656 video, MPEG4 and QuickTime multimedia, and DTMF tone signalling in audio. In addition, "meta" payload formats that specify methods for repair of packet losses were discussed, including the question of how much error correction is appropriate.

4.1 H.263+ video payload format

As noted in the introduction, the payload format specification for H.263 video has been published as a Proposed Standard. However, enhancement of the encoding itself continues under the label H.263+ for improved loss resilience and to utilize increased processing power to achieve higher quality. As agreed in previous AVT meetings, a separate RTP payload format is to be devised to support the enhanced encoding. In fact, two different approaches were proposed in draft payload format specifications sent to the AVT mailing list just before the meeting. Stephan Wenger, who is the primary author of one and a co-author on the other, gave an overview of the H.263+ encoding enhancements and the tradeoffs between the two payload formats.

The first approach, from TU-Berlin and U-Bremen, uses a very simple payload header but includes the H.263+ picture header in each packet except for "follow-on packets" when app-level fragmentation is done. This allows for the use of almost all of the optional modes defined in the H.263+ encoding scheme. The second approach, from Intel, follows more closely the design of the existing H.263 payload format in which selected fields from the picture header are incorporated into a set of payload format headers for different modes. This approach reduces overhead to support small packet sizes but precludes use of some encoding options deemed to be rarely used.

Wenger stated the intention of the authors of both approaches to resolve the differences and produce one merged proposal. He suggested that the merged proposal might include both approaches. Steve Casner countered that this might not be a good result because it increases complexity and reduces interoperability -- putting in all the options when a decision can't be made is often the result of a committee design. Christian Huitema commented that some encoding functions were left out when the H.261 payload format was designed because they were not applicable to packet transport, and asked if a similar analysis could be made here. Wenger replied that H.263+ has been designed to work with packet loss and that some of the optional modes were added expressly to deal with packet loss.

Wenger requested input soon from the working group regarding the tradeoffs between the two approaches because the recommendations for mode combinations to use are being prepared now by the ad-hoc group he leads. The two specifications should be posted as Internet-Drafts for wider exposure.

4.2 MPEG4 payload format

There is no draft payload format specification for MPEG4 yet because Gerard Fernando would like feedback on some design questions first. He gave a brief overview of the scope and structure of MPEG4, which is a framework for integrating different kinds of natural and synthetic media streams built around audio/visual objects called AVIOs. There are two primary questions:

- How can the MPEG4 payload format refer to existing and forthcoming payload formats for the encodings of individual media streams (AVIOs) rather than trying to redo them all?

- How should the "scene description data", which composes the individual AVIOs into the complete presentation, be transmitted? If this data is lost, the whole scene is lost, so adequate loss resilience mechanisms must be employed. Can the scene description data be decomposed over both time and "space" -- time because a transmission may be joined after the start, and space because media are sent in separate streams and some receivers might not tune into all of them?

MPEG4 is still being developed, with standardization due in January 1999. The MPEG4 committee would like to work with AVT to ensure that packetization considerations are included in the design; they view RTP/UDP multiplexing as an appropriate means of multiplexing elementary streams. This is clearly beneficial from AVT's point of view as well. One aspect of the use of separate media streams in RTP is the ability to apply different network QoS and reliability mechanisms as needed; the VMIF part of MPEG4 is trying to formalize the selection of stream-specific QoS in a transport-independent and network-independent way.

Fernando and Casner are to explore what formal or informal liaison procedures should be followed in this case, both for information transfer and to enable reference to RTP in the MPEG4 specification.

4.3 BT-656 video payload format

Dermot Tynan presented a proposal, draft-tynan-rtp-bt656-00.txt, for carrying ITU-R BT.656-3 uncompressed video over RTP. BT656 is studio-quality digital video sampled according to BT.601-5 (formerly CCIR601) at 13.5 or 18 MHz. At the normal, lower rate, each scan line contains 720 samples occupying 1440 bytes in the 4:2:2 chrominance encoding. At the "high definition" rate, each line contains 1144 samples for NTSC or 1152 samples for PAL. The payload format consists of a simple header with bit fields to indicate NTSC/PAL, sampling rate, framing and scan line information, followed by one scan line of samples.

Steve Casner expressed the concern that the packet size might exceed the MTU for some networks. Although at the lower sampling rate an IP/UDP/RTP packet will fit within a 1500-byte MTU, at the higher sampling rate it would not.
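
The MTU concern can be checked with simple arithmetic. The sketch below assumes an IPv4 header without options (20 bytes), a UDP header (8 bytes) and the fixed RTP header (12 bytes), and assumes that the higher-rate lines also use 2 bytes per sample, as the 720-sample, 1440-byte figure implies for the lower rate. The size of the BT.656 payload header is not given in these minutes, so it is left as a parameter defaulting to zero; all names are illustrative.

```python
IP_HEADER = 20         # IPv4 header without options (assumption)
UDP_HEADER = 8
RTP_HEADER = 12        # fixed RTP header, no CSRC list (assumption)
BYTES_PER_SAMPLE = 2   # 4:2:2: 720 samples occupy 1440 bytes


def packet_size(samples_per_line, payload_header=0):
    """Bytes on the wire for one scan line carried in one packet."""
    return (IP_HEADER + UDP_HEADER + RTP_HEADER + payload_header
            + samples_per_line * BYTES_PER_SAMPLE)


def fits_mtu(samples_per_line, mtu=1500, payload_header=0):
    """True if one scan line fits in a single unfragmented packet."""
    return packet_size(samples_per_line, payload_header) <= mtu


if __name__ == "__main__":
    for label, samples in [("lower rate", 720),
                           ("high rate NTSC", 1144),
                           ("high rate PAL", 1152)]:
        print(label, packet_size(samples), fits_mtu(samples))
```

Under these assumptions a lower-rate line needs 1480 bytes plus the payload header, leaving 20 bytes of headroom within a 1500-byte MTU, while a high-rate line needs well over 2300 bytes.
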
Don Hoffman pointed out that if the packet size does exceed the MTU such that IP fragmentation occurs, integrated services packet classification won't work on fragments other than the first so those packets won't get the desired QoS. Christian Huitema claimed it is important to have application-level fragmentation so that services such as forward error correction will apply when a fragment is lost. It should not cost much to be able to indicate that a packet contained part of a scan line, perhaps by giving the starting sample number. Tynan agreed to consider this addition.

4.4 QuickTime payload format

Alagu Periyannan just asked to call the working group's attention to the recently posted draft "RTP Payload Format for QuickTime Media Streams", draft-ietf-avt-qt-rtp-00.txt, and in particular to the list of open issues in section 4 of that draft. The motivation in developing this format was to carry all the payloads in QuickTime without having to define an RTP payload format for each. Steve Casner pointed out that the tradeoff of this technique is that there is some amount of additional constant description info that must be carried in each packet but would not be needed with separate formats.

Philipp Hoschka asked if it wouldn't be better to define individual payload formats because the encodings might also be used outside of QuickTime. Periyannan replied that where individual payload formats are defined, they should be used in preference to this format. However, several of the encodings used with QuickTime are not standardized, such as Apple Video, Cinepak, and several proprietary codecs.

4.5 DTMF audio payload format

Jonathan Rosenberg presented Henning Schulzrinne's proposed audio payload format to carry DTMF (tone dialing) signals as defined in draft-ietf-avt-dtmf-00.txt and .ps. This payload format might be used for calling across the Internet through IP telephony gateways to control an answering machine or other device.
Low data rate speech codecs may not reproduce the DTMF tones faithfully enough to work properly at the far end. This payload format essentially provides a very low rate encoding specialized for DTMF tones.

The draft defines a primary format in which each DTMF digit is represented by 32 bits to control frequency, amplitude and duration. Redundancy can be provided using the mechanism specified in RFC 2198. This introduces 64 bits of overhead, but the data rate is so low that this probably does not matter. Alternatively, a more compact representation of each digit would allow adding redundancy within the DTMF payload format and still fitting each digit into 32 bits. The selection between these techniques is an open issue on which input from the working group is sought.

Steve Casner noted that the more compact format would require the RTP clock rate to be different than that of the normal audio payload format within which the DTMF packets are interspersed, and suggested that this is a significant enough disadvantage to prefer the larger format. Scott Petrack agreed that the size difference was probably not significant.

4.6 Forward Error Correction payload format

Two "meta" payload formats on forward error correction (that is, independent of media type and format) have recently been posted: draft-budge-media-error-correction-00.txt by Budge, et al., and draft-ietf-avt-fec-00.txt by Rosenberg and Schulzrinne. The authors of the first draft did not attend, but Jonathan Rosenberg presented the second draft which builds on ideas from the first, and compared the two drafts. Both schemes are based on the idea of sending additional FEC packets which are the XOR of multiple packets from the original packet stream.

It should be noted that the scheme presented by Rosenberg differs from that described in the second draft.
An RTP header extension (X bit) is no longer used; instead, the payload type is changed to indicate an FEC packet, and an additional format-specific header is inserted before the XOR of the covered payloads, as in the Budge scheme.

One drawback of the Budge draft was that the timestamp and marker bit of lost packets could not always be recovered. The proposed remedy is for these fields in the FEC packets to be the XOR of the corresponding fields from the original packets covered by the FEC packet. This has its own drawback that the timestamp will vary erratically, perturbing the jitter feedback calculation unless the FEC packets are excluded. Carsten Bormann also pointed out that when RTP header fields are perturbed from their usual increments, it can have a negative impact on the efficiency of RTP header compression.

A second drawback of the Budge scheme was that the payload type was changed in the original packets to be the FEC payload type, which means that receivers without FEC capability could not receive just the original packets and ignore the FEC packets. In Rosenberg's scheme, the original packets are unmodified, including the payload type.

To extend the range of packet patterns that could be covered by an FEC without having to predefine specific schemes, Rosenberg proposes to carry a bit mask to indicate the pattern explicitly. The RTP sequence number of FEC packets in Rosenberg's proposal is the minimum of the sequence numbers of the packets covered by the FEC to serve as a base for the mask. Steve Casner claimed that this is not acceptable because it will prevent the FEC packets from passing header validation and duplicate suppression algorithms in RTP packet processing. Rosenberg said alternative sequence number schemes had been considered, but the cost is additional overhead.
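
The XOR-plus-mask idea can be sketched as follows. This is a toy illustration, not the packet format of either draft: the (base, mask, parity) tuple, the function names, and the zero-padding of shorter payloads are all assumptions made for the example, and sequence number wraparound and recovery of the timestamp and marker bit are ignored.

```python
def xor_bytes(a, b):
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_fec(packets, covered):
    """Build a hypothetical (base, mask, parity) FEC record.

    packets -- dict mapping sequence number to payload bytes
    covered -- sequence numbers protected by this FEC packet
    Bit i of the mask is set when sequence number base + i is covered.
    """
    base = min(covered)
    mask = 0
    parity = b""
    for seq in covered:
        mask |= 1 << (seq - base)
        parity = xor_bytes(parity, packets[seq])
    return base, mask, parity


def covered_seqs(base, mask):
    """Expand the bit mask back into the covered sequence numbers."""
    return [base + i for i in range(mask.bit_length()) if (mask >> i) & 1]


def recover(fec, received):
    """Recover the single missing covered payload, or return None.

    XOR parity can repair exactly one loss per FEC packet; with zero
    or several losses there is nothing it can reconstruct.
    """
    base, mask, parity = fec
    seqs = covered_seqs(base, mask)
    missing = [s for s in seqs if s not in received]
    if len(missing) != 1:
        return None
    payload = parity
    for s in seqs:
        if s in received:
            payload = xor_bytes(payload, received[s])
    return missing[0], payload


if __name__ == "__main__":
    sent = {10: b"abcd", 11: b"efgh", 13: b"ijkl"}
    fec = make_fec(sent, [10, 11, 13])
    got = {10: sent[10], 13: sent[13]}   # packet 11 was lost
    print(recover(fec, got))
```

Because the mask names the covered packets explicitly, the sender can protect an arbitrary pattern (here 10, 11 and 13) without a predefined scheme. A recovered payload may carry trailing zero padding when the covered payloads differ in length, so a real format would also need to protect the length.
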

Christian Huitema argued that if a general FEC payload format is to be defined, it should support other schemes with better performance and only a marginal increase in complexity compared to that of "n+1" schemes like XOR. For example, it is possible to specify a scheme that adds two FEC packets to eight original packets which will allow recovery from a loss of any two of the ten packets. This is important because simulations have shown that one packet of redundancy was not enough. Rosenberg agreed that the FEC proposals need further refinement.

4.7 Applicability of error correction

In addition to the redundant audio and FEC payload formats already mentioned, retransmission and interleaving are two more loss resilience schemes that might be employed with RTP. Colin Perkins reviewed these four methods to compare the tradeoffs in latency, bandwidth overhead and processing overhead (details are available at http://www.cs.ucl.ac.uk/staff/c.perkins/slides/). It is an important question to consider when these schemes are applicable to particular applications or to particular network conditions.

Colin presented some network loss statistics showing that single packet losses predominated by a factor of four over two-packet loss bursts. The probability drops rapidly such that long burst losses are rare. On the other hand, in a large conference, most packets will be lost by at least one receiver, therefore a retransmission scheme may need to resend almost every packet.

The redundant transmission scheme has low latency and low bandwidth overhead, but potentially high processor overhead if the low-rate redundant encoding is computationally complex. A retransmission scheme is likely to impose much longer delay and will incur the overhead of control traffic in addition to the duplicate data, but provides exact repair and can correct more than single-packet losses.
There is a synergy between redundancy and retransmission in that redundancy can cover most of the errors, leaving retransmission to pick up the remainder for those receivers that care more about low loss than low delay.

Interleaving disperses the effects of loss, but does not eliminate them. It has low overhead, but high latency. FEC may have a similarly high latency or a high bandwidth overhead, depending on the size of the pattern covered, but its major advantage is media and format independence. For interactive applications, either redundant transmission or a low-latency FEC would seem most suitable, while for broadcast-style applications interleaving works well.

An important open question is what constitutes a sensible operating point for real-time media transmission? How much loss should applications try to cope with before declaring that the user should try again at some less congested time? There is little congestion control in many real-time media applications, and extreme loss resilience measures would just worsen congestion. That would not be network friendly. If fair congestion avoidance mechanisms are deployed in routers, then applications that don't implement congestion control may be penalized. Therefore, any loss resilience schemes that are defined for RTP should consider not just loss performance but also impact on the network.

5. Revision of RTP MIB specification

The final topic of the meeting was an update by Mark Baugher on the RTP MIB design and implementation plans. An interim draft of the MIB was sent to the mailing list before the meeting to provide an opportunity for feedback; a real Internet-Draft will be posted in October.

A more complete description of the MIB was given in Memphis; the changes since then were to fix errors identified by Fred Baker in his review of the MIB, to add reporting of receiver feedback, to support RTP operation over different underlying protocols, and to remove from the specification those parts that won't be included in initial implementations and hence won't be validated yet. In particular, support for RTP translators was removed.

Receiver feedback is reported through additional tables that are created only upon request from the network manager as a side-effect of creating an entry in the Session Table. This avoids the state explosion that might occur if all receiver feedback were always monitored for all sessions.

There are a few details remaining to be worked out, but implementation of a network management application using the RTP MIB is underway to evaluate how well this MIB works for managing real-time applications on networks that weren't designed for them. To make this really workable, the management application needs to handle multicast routing in addition to RTP. The next stage of MIB development will be based on the results of the implementation.

6. Next meeting

Steve Casner closed the meeting saying that AVT will meet again at the next IETF in December. As stated above, a goal for the working group is to get the RTP spec revised for Draft Standard by then. In addition, there is outstanding work on several topics presented at this meeting which should be discussed on the mailing list so that completed drafts are ready by the next meeting.