Minutes of the Audio/Video Transport Working Group

Reported by Colin Perkins and Steve Casner

The audio/video transport working group held two full sessions in Oslo. In addition, a small sub-group took part in a telephone conference with the MPEG committee, who were meeting in Vancouver. There were 14 new drafts (revision -00) in AVT at this meeting, in addition to 7 revised drafts, which made it a challenge to cover them all. The first session covered the RTP specification and payload format documents already under discussion. The second session introduced new proposals for header compression, multiplexing and some additional payload formats.

The meeting started with a review of the status of documents in process. The RTP payload for PureVoice (QCELP) audio has completed last call and is awaiting publication. The drafts for the RTP MIB, SSRC sampling, the generic FEC payload format and guidelines for writers of payload formats were in working group last call (with the exception of the MIB, these have now been sent to the IESG for IETF last call).

The revision of RTP and the audio/video profile is proceeding, and Steve Casner presented the new drafts here. The recent changes to RTP in the latest draft (draft-ietf-avt-rtp-new-04.txt) consist primarily of clarifications in the text, which are listed in the Changes section. A number of open issues for the RTP spec were discussed:

  - Should the SSRC collision algorithm be changed to specify that one SHOULD follow the new network source address, to accommodate mobile endpoints which change IP addresses? No real consensus.

  - Replace the pseudo code for the collision algorithm with C code? Yes.

  - RTP says ports MUST be distinct for the streams of a layered coding, but SDP says this is illegal if the multicast addresses differ because the mapping is potentially ambiguous. We will extend SDP to make this legal for an equal number of addresses and ports and a one-to-one mapping.
  - Should avg_rtcp_size be changed to exclude received RTCP packets (i.e., just average over the packets you send)? Leave things as is, and ignore the potential for unfairness, since there's no consensus that we can find a better algorithm without significant complexity.

The RTP specification itself is believed to be almost complete now.

Recent changes to the audio/video profile (draft-ietf-avt-profile-new-06.txt) include:

  - The reference to the MIME registration draft has been reworded to make it non-normative.

  - Fixed numbers in wrong columns of table 1.

  - Removed G726-16/24/40, G727, and SX since no packetisation is defined.

  - Clarified G722 and VDVI packetisation.

  - Didn't add tables for GSM-HR and GSM-EFR since the ETSI TIPHON vice-chair clarified that TS 101 318 is a referenceable specification.

Again, there are a number of open issues:

  - The sample clock rate for G722 is 8kHz but should be 16kHz. This is a historical mistake - do we wish to correct it? Consensus was that it's not a good idea to change this now, since implementations exist with the old rate. It will be left at 8kHz, but a note will be inserted giving reasons for the error.

  - Should the CN payload type be changed from 19 to 13 as used in the IMTC VoIP spec (13 used to be VSC and is reserved)? Scott Petrack noted that there may be people using VSC, and volunteered to contact them for clarification on the status of this codec.

The companion documents to the RTP specification now all exist, and are in a reasonable state. The MIME registration draft (draft-ietf-avt-rtp-mime-00.txt) and SDP bandwidth modifiers draft (draft-ietf-avt-rtcp-bw-00.txt) were briefly presented by Steve Casner and generated little comment, although it was noted that there are some overlapping MIME type registrations (e.g., audio/MPEG vs audio/MPA and vnd.qcelp vs audio/qcelp) which will have to be clarified before this draft is complete.
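
For readers unfamiliar with the avg_rtcp_size question raised above: the value is a running average of RTCP packet sizes that feeds the RTCP transmission interval calculation. The following is a minimal Python sketch, not text from the draft. It assumes the 1/16 smoothing gain used in the RTP specification's example algorithm, and the function names and the "send"/"recv" tagging are hypothetical. It contrasts the current behaviour (average over sent and received packets) with the alternative that was discussed (average over sent packets only).

```python
# Sketch only: names and the ("send" | "recv", size) tagging are
# assumptions made for this illustration.
RTCP_SIZE_GAIN = 1.0 / 16.0  # smoothing gain assumed from the RTP spec


def update_avg_rtcp_size(avg, packet_size, gain=RTCP_SIZE_GAIN):
    """Fold one RTCP packet size (in octets) into the running average."""
    return avg + (packet_size - avg) * gain


def average_over(packets, initial, include_received=True):
    """Run the average over a sequence of (direction, size) pairs.

    include_received=True models the current rule (sent and received
    packets both count); False models the alternative discussed, in
    which only packets this participant sends are averaged.
    """
    avg = float(initial)
    for direction, size in packets:
        if direction == "recv" and not include_received:
            continue
        avg = update_avg_rtcp_size(avg, size)
    return avg


if __name__ == "__main__":
    trace = [("send", 260), ("recv", 100)]
    print(average_over(trace, 100, include_received=True))
    print(average_over(trace, 100, include_received=False))
```

A participant that sends larger-than-typical RTCP packets sees a larger average, and hence a longer interval, under the sent-only variant, which is one way to view the fairness question; the meeting chose to leave the rule unchanged.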

The major remaining requirement for advancement of RTP and the audio/video profile to draft standard is an interoperability statement. Colin Perkins outlined the interoperability requirements placed on protocols advancing to draft standard status (see RFC2026) and presented the first version of an interoperability statement (draft-ietf-avt-rtp-interop-00.txt) for the RTP spec. An interoperability statement for the audio/video profile is also required, but work on it has yet to begin. It was noted that this is not a set of conformance tests: all that is required to demonstrate interoperability for this purpose is a claim from the implementors that implementations X and Y interoperate for a set of features. Implementors are encouraged to study this draft, to test their implementations, and to report back the results so that we can advance the specifications.

Of course, such testing is a difficult business, and it is advantageous if there is a common framework and guidelines for those conducting tests. Two drafts were presented which aim to provide such a framework: Colin Perkins presented draft-ietf-avt-rtptest-00.txt which deals with RTP, and Jonathan Rosenberg presented draft-ietf-avt-rtcptest-01.txt dealing with RTCP. Once again, we note that these are NOT conformance tests - merely suggested testing strategies. We plan to merge these two testing drafts into one, to be published eventually as an informational RFC. The drafts should be useful for testing in the near term.

Following the discussion of RTP, Mark Baugher presented the RTP MIB (draft-ietf-avt-rtp-mib-05.txt). Working group last call was issued on this draft in April, but several comments were received. Some of these had to do with clarification of wording, but a more significant concern was the need to walk an entire table to locate a desired entry in some cases.
This shortcoming was addressed by adding optional index tables for each of the primary tables to allow looking up the desired entry and getting an index for that entry in the primary table. The new index tables are optional because some applications, such as unicast H.323, will only have single entries in the primary tables. Comments on the changes were generally favourable, although Dave Thaler requested additional wording in the draft to give guidance to implementors about when the optional tables should be implemented. The authors plan to say that implementations operating in multicast mode SHOULD implement these tables. The MIB is referenced from H.341, which was accepted as an ITU-T standard in May. The ITU was satisfied to proceed even though publication of the MIB as an RFC is not yet complete, given that the changes proposed above do not affect the H.323 usage of the MIB.

Scott Petrack presented the payload for DTMF and other related tones (draft-ietf-avt-tones-00.txt). The current draft has five named groups of tones, each of which would be registered as a MIME subtype for binding to a dynamic payload type. Applications which support only a subset of a group would use an a=fmtp attribute to indicate that subset. Concerns were expressed about "polluting" the MIME namespace and about having two mechanisms (name and fmtp) for expressing the subset of the total set of tones. Colin Perkins suggested that the payload format be identified by a single MIME subtype, and that subsets of the tones could then be specified using an a=fmtp attribute with either a named subset or an explicit list of tones. It was agreed that this change will be made. It was also noted that the DTMF payload format specifies that a period covered by an encoded tone may overlap in time with a period of audio encoded by other means. This is likely to occur at the onset of a tone and is necessary to avoid possible errors in the interpretation of the reproduced tone at the remote end.
Implementations have not been expected to handle this kind of overlap with other codecs, so it was requested that a statement be added to the draft that implementations supporting the DTMF payload type must be prepared to handle the overlap. A new draft with these two changes will be sent to the list, at which point working group last call will be issued.

The generic FEC draft (draft-ietf-avt-fec-06.txt) was presented by Jonathan Rosenberg. The change this time is to the means by which FEC streams are signalled in SDP. The new approach carries the address used for the FEC stream in an a=fmtp attribute rather than on an additional c= line, because of potential ambiguity with layered encodings using multiple c= lines. Anders Klemets noted that this needs extension for use with RTSP, where the address on the c= line is zero and the source URL is carried on the a=control line instead. Anders agreed to send text on this point to be incorporated into a revised draft, after which last call will be requested.

Reha Civanlar presented a report on the interim meeting held in New York on the 25th of April to discuss transport of MPEG-4 over IP. The items of discussion at that meeting were: the payload format, timing and buffering models, and multiplexing. Two payload formats are proposed. The first is specified in draft-ietf-avt-rtp-mpeg4-01.txt, which has not changed since the last meeting. This is a simple mapping of MPEG-4 SL packets onto RTP, assuming smart (transport aware) SL packetisation and with additional protection being provided by the standard RTP mechanisms (draft-ietf-avt-fec-06.txt, for example, possibly multiplexed with the data using RFC2198). The alternative, draft-guillemot-genrtp-01.txt, implements more advanced error protection in the RTP payload format itself and requires additional information from the MPEG-4 Compression Layer to indicate which parts of the data need protection.

The unresolved issues were noted as being:

  - Should there be an MPEG-4 specific definition of error correction and interleaving, instead of using generic RTP techniques?

  - Is a fragmentation mechanism for SL packets on the RTP layer needed?

  - Is a grouping mechanism for SL packets on the RTP layer needed?

  - Does SL packet classification information have to be conveyed end to end either in the SL or RTP header?

  - Should a mechanism for the repetition of ADU header information be available as part of the A/V compression algorithms (as assumed by draft-ietf-avt-rtp-mpeg4-01.txt) or should it be built into the payload format (as assumed by draft-guillemot-genrtp-01.txt)?

The discussion of the timing and buffering model was productive, and resulted in a revision to the usage of RTP timestamps in both proposals to align them with MPEG-4. The discussion of multiplexing was noted as not reaching a solution; in particular, it is unclear how to reconstruct the timing of a multiplexed stream, a notion which is not present in RTP.

Following this summary of the interim meeting, Stefan Wesner presented the latest revision, draft-guillemot-genrtp-01.txt. As noted previously, this payload includes more complex error protection mechanisms based on RFC2198, draft-ietf-avt-fec-06.txt, and the Reed Solomon and interleaving drafts previously presented. These are included by integrating them into the new payload, with subtle modifications (which make them incompatible with the originals). It was suggested that the separate mechanisms should be used in combination rather than incorporating ideas from them into this payload format, but the authors countered that using the mechanisms separately incurs too much overhead. The issue was not resolved.

On the evening after the second AVT session, this payload format discussion continued in a telephone conference between a dozen AVT participants and a similar sub-group of the MPEG committee meeting in Vancouver, led by Carsten Herpel.
The main areas of concern regarding draft-guillemot-genrtp-01.txt were that the additional work needed to complete those areas of the MPEG-4 standard required to implement it was not being undertaken, and that the protocol is too complex. The concerns regarding the simpler proposal, draft-ietf-avt-rtp-mpeg4-01.txt, were that it provides insufficient error protection (or, if sufficient protection is provided, the overheads are excessive). The discussion continued for some time, with no consensus being reached. In the end, it was decided to progress both payload formats to experimental status, and in the light of implementation experience to progress one or both of them further in future. As a result, the guillemot draft will become a working group work item, assuming there are no objections.

In the telephone conference, it was also agreed that there will be a third MPEG-4 packetisation format consisting of multiple elementary streams multiplexed into an MPEG-4 FlexMux packet which is then carried over RTP with the RTP timestamp set to the delivery time. This specification is still to be written. This method is motivated by the need for some applications to easily gateway multiplexed MPEG-4 streams carried over IP into an MPEG-2 Transport Stream, which will use FlexMux. This parallels the alternative packetisation mode for Transport Streams in the MPEG-2 payload format.

The final presentation on the first day, made by Akimichi Ogawa, related to the payload for DV audio and video (draft-ietf-avt-dv-video-00.txt). This payload format is essentially complete, and an implementation is progressing to verify this. When the implementation is complete, a new draft will be published and last call will be issued. This payload format recommends that audio and video be sent in separate streams when the audio format is simple. In that case, the audio data would be extracted from the DV format and transmitted as L16 format when the data is 16-bit linear.
Since the audio may also be encoded as 12-bit non-linear, a new payload format for that was introduced in draft-kobayashi-dv-audio12-00.txt. This format is relatively straightforward. However, for both audio formats, a problem was identified: it may be necessary to define some SDP a=fmtp parameters to specify what analog preemphasis was used.

The second AVT session began with a presentation of a taxonomy of multicast feedback (draft-hnrs-rmt-avt-feedback-00.txt) by Jonathan Rosenberg. This led into a discussion of a possible new RTP profile for unicast RTCP, led by Steve Casner. Some motivations for producing a new profile are:

  - to reduce the amount of (S,G) multicast routing state induced by many receivers sending low-rate RTCP traffic;

  - to avoid distributing RTCP identity and feedback information to other receivers, for privacy or competitive reasons;

  - because the network might only support single source multicast.

There are two main problems when designing a unicast feedback scheme: avoiding implosion of feedback (which means that the receivers must be told how often to send RTCP), and preventing packet bombing (by preventing receivers from being told to send RTCP at a high rate or to an innocent bystander host). The current all-multicast model uses distributed control to preclude sabotage by a single entity; a bad guy can only slow the rate. With a single source controlling when receivers can send feedback, we are more open to attack if that source can be spoofed.

The initial proposal was for receivers to unicast their RTCP reports to the source, which forwards them to the group via multicast. The source can, if desired, remove and/or aggregate SDES and/or RR information to allay privacy concerns. The source must be authenticated, probably by including its address in an authenticated session description, else we are open to packet bombing.
BYE packets must be ignored because they could be spoofed, causing an underestimate of the number of participants and an excessive RTCP transmission rate.

A second proposal is for the source to send only a new RTCP packet type giving the number of receivers, with the receivers unicasting RTCP to the source as before. Receivers have to limit the rate at which the group membership is allowed to decrease, to prevent a spoofer from being able to force an implosion.

Open issues include: What about sessions with multiple sources? How does it interact with adjustable RTCP bandwidth fractions? The final question is, of course, should we do this? Deployment of better routing (which doesn't suffer from (S,G) state problems) will take a year or more, but so, probably, will deployment of a profile such as this. Would this add value in the long run? No consensus was reached in the session; discussion is encouraged on the mailing list.

The major item on the second day was header compression and multiplexing, starting with a presentation by Lars-Erik Jonsson on Rocco, a new header compression scheme for cellular links (draft-degermark-crtp-cellular-00.txt). This presentation noted that the problem with RTP header compression is not compression efficiency, but the packet loss rate due to context damage and the loss length distribution (often 7-8 packets in a row, due to the long round-trip time of the cellular links and the need to resynchronize decompressor state). An alternative solution was presented (draft-jonsson-robust-hc-00.txt) along with supporting results. This work clearly solves a real problem with the RTP header compression scheme on lossy long-RTT links, but it is presently unclear whether AVT is the correct group to work on this, or whether the general header compression framework proposed should be progressed in another group. It was also noted that IPR exists on this work, and the licensing terms are not yet clear. The WG chairs were requested to check on this with Ericsson.
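
The loss-run effect noted in the header compression presentation can be illustrated with some rough arithmetic. The Python sketch below assumes a deliberately simplified model, not taken from either draft: once the decompressor context is damaged, every arriving packet is discarded until a repair arrives one link round-trip later. The function name and the 150 ms round-trip time and 20 ms packet interval used in the example are illustrative assumptions only.

```python
# Sketch only: a simplified model of context damage on a long-RTT link.
# The function name and the example figures are assumptions for this
# illustration and do not come from the drafts discussed.

def packets_discarded_after_context_loss(rtt_ms, packet_interval_ms):
    """Estimate how many packets are discarded while the context is invalid.

    Assumes packets arrive every packet_interval_ms and that all of them
    are discarded for one full round-trip time (rtt_ms) after the damage
    is detected. Both arguments are whole milliseconds.
    """
    if rtt_ms < 0 or packet_interval_ms <= 0:
        raise ValueError("rtt_ms must be >= 0 and packet_interval_ms > 0")
    # Integer ceiling division: a partially elapsed interval still
    # costs one packet.
    return -(-rtt_ms // packet_interval_ms)


if __name__ == "__main__":
    # Hypothetical cellular link: 150 ms round trip, one packet per 20 ms.
    print(packets_discarded_after_context_loss(150, 20))
```

Under this model, a single loss on the link costs a whole run of packets, and the run grows with the round-trip time, which is why the loss length distribution rather than compression efficiency was identified as the problem.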

The next presentation was by Irfan Ali and related to PPP multiplexing (draft-pazhyannur-avt-pppmux-00.txt). This proposal allows one to pack multiple IP packets, especially compressed RTP packets, into a single PPP frame, to amortize header overheads. This work was also presented in the PPPEXT working group, and was adopted as a work item by that group since it is not specific to AVT.

Tmima Koren presented tunnelled compressed RTP (draft-wing-avt-tcrtp-00.txt), a scheme for extending RTP header compression to allow it to be used end-to-end across an IP tunnel, potentially with multiple packets multiplexed into a single IP packet to reduce the tunnel overhead. The difficulty with using CRTP across a long path is increased delay, loss and misordering. TCRTP makes assumptions about the maximum number of consecutive packets lost and the degree of misordering (to be achieved by network engineering and/or QoS), then uses redundant transmission of header fields when the compression deltas change to achieve a high probability of recovery using the "twice" algorithm. It was agreed that the TCRTP and Rocco work was important, and should be pursued.

Khalil El-Khatib presented another Multiplexing Scheme for RTP Flows between Access Routers (draft-ietf-avt-multiplexing-rtp-00.txt). This proposal met with considerable criticism. In particular, the notion that the entry access router will know which access router will be the exit implies a requirement for application level routing which doesn't (and perhaps shouldn't) exist. The proposal also did not handle packet loss well and did not support RTCP. The consensus of the group was that this proposal needs significant work before it can proceed.

We noted that at the last IETF meeting the proposed strategy for developing an RTP multiplexing scheme was to adopt GeRM as a starting point, try implementing it in the application scenarios that had been proposed, and see whether it works or what changes might be needed.
Some limited feedback on this issue has been received from the MPEG community, but none from other sources. Anyone having opinions on GeRM's suitability or lack thereof is encouraged to provide feedback to the list.

Mathias Kretschmer presented an MPEG-2 AAC audio payload for RTP (draft-kretschmer-mpeg2aac-01.txt). The important aspects of this payload format are the redundancy scheme and priority vector, both of which are designed to increase error resilience. It was agreed to make this a work item of AVT.

Reha Civanlar presented an RTP payload format for Real-Time Tele-Pointers (draft-civanlar-rtp-pointer-00.txt). This is simple, encoding x and y coordinates and button press indicators. After discussion we noted that encoding position as a fraction of the display (rather than absolute pixel values as currently specified), using a 90kHz clock (to synchronise with video) and allowing for 3 buttons were important changes to make. In addition, it is unclear how to deal with more advanced pointing devices such as the wheel mice now available. This payload format could become significantly more complicated if extended to be a general mouse event transmission mechanism; comments tended toward keeping it simple. This document will be adopted as an AVT work item.

Gunnar Hellström presented an RTP Payload for Text Conversation (draft-hellstrom-avt-rtp-text-00.txt), needed for carrying the ITU T.140 standard over H.323. This is a straightforward payload format, with few complex interactions. The open issues identified are: Should the recommendations for use of the payload format be part of the main specification or an appendix? (An appendix.) How is RTCP used with this payload format to indicate the amount of FEC to add? Should this be specified? AVT's processing of this payload format spec must be complete by February 2000 for it to be referenced by the relevant ITU standards.
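
The changes suggested for the tele-pointer payload above (position as a fraction of the display, a 90kHz clock, three buttons) can be sketched as follows. This Python fragment is an illustration only: the 16-bit fraction scale, the 3-bit button mask and all function names are assumptions made for the example, and it does not reproduce the field layout of the draft.

```python
# Sketch only: the field widths and names below are hypothetical and
# are not taken from draft-civanlar-rtp-pointer-00.txt.
RTP_CLOCK_HZ = 90_000   # 90kHz clock, chosen to synchronise with video
FRACTION_MAX = 0xFFFF   # assumed 16-bit scale for a display fraction


def to_fraction(pixel, extent):
    """Map a pixel coordinate in [0, extent - 1] to 0..FRACTION_MAX."""
    if extent < 2 or not 0 <= pixel < extent:
        raise ValueError("pixel must lie within a display of >= 2 pixels")
    return round(pixel * FRACTION_MAX / (extent - 1))


def from_fraction(fraction, extent):
    """Map a display fraction back to a pixel on the receiver's display."""
    return round(fraction * (extent - 1) / FRACTION_MAX)


def rtp_timestamp(seconds):
    """Convert elapsed seconds to a 32-bit timestamp on the 90kHz clock."""
    return int(seconds * RTP_CLOCK_HZ) & 0xFFFFFFFF


def button_mask(left, middle, right):
    """Pack three button states into an assumed 3-bit mask."""
    return (int(left) << 2) | (int(middle) << 1) | int(right)


if __name__ == "__main__":
    x = to_fraction(639, 640)        # right edge of a 640-pixel display
    print(from_fraction(x, 1280))    # same relative spot on a wider display
    print(rtp_timestamp(0.5), button_mask(True, False, True))
```

Sending a fraction rather than a pixel value lets sender and receiver use displays of different sizes, which is the motivation given in the discussion.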

Finally, John Stewart presented a brief overview of an RTP payload format for shared virtual worlds (draft-stewart-avt-00.txt). This uses RTP for transport of orientation info, gestures, clicking, etc. The motivation for developing this payload format was simplicity compared to the more complex mechanisms employed by the DIS/HLA community, but Christian Huitema expressed concern about "a flurry" of solutions for the same problem. This work is further in the future than the other payload formats, which are expected to be completed fairly quickly.

Steve Casner summarized the meeting by saying that several of the payload format proposals would be added as AVT work items, subject to confirmation on the mailing list. Similarly, the proposals for multiplexing and header compression require further discussion on the mailing list to determine which should become work items and be developed into standards.