Audio/Video Transport Working Group Minutes
Reported by Colin Perkins and Steve Casner
The audio/video transport working group held two full sessions in Washington DC. The first session discussed the ongoing revision to the RTP specification and audio/video profile, together with a number of payload formats. The second session dealt with transport of MPEG-4 and multiplexing. In addition, a design team met to discuss the transport of MPEG-4.
The meeting started with a review of work in progress and the status of the working group's documents. The RTP payload format for generic FEC (draft-ietf-avt-fec-08.txt) has been approved for publication as a proposed standard, and the guidelines for writers of RTP payload format specifications (draft-ietf-avt-rtp-format-guidelines-04.txt) have been approved for publication as a BCP. At the time of writing, neither of these RFCs has been issued. The RTP MIB (draft-ietf-avt-rtp-mib-07.txt) has been sent to the IESG for consideration as a proposed standard. The draft on sampling group membership in RTP (draft-ietf-avt-rtpsample-05.txt) was published as an Experimental RFC after the meeting. In addition, the payload format for DTMF and related tones (draft-ietf-avt-tones-02.txt) was noted as having completed working group last call; it will be submitted for publication after a final revision is posted.
An update on the RTP specification and audio/video profile was presented by Steve Casner.
The recent changes to the RTP specification include:
- Collision/loop detection algorithm nesting syntax changed to pseudo-C
- Added a SHOULD to keep packets from the new address on a third-party collision, for applications that may change addresses (e.g., where some endpoints are mobile)
- Reverse reconsideration algorithm SHOULD be performed, to possibly reduce the delay before sending the first SR packet after starting to send data
- Timing out a participant is based on inactivity for a number of RTCP report intervals calculated using the receiver RTCP bandwidth fraction, even for active senders
There are three open issues with this specification:
- It currently says that sources which change their source transport address MUST change SSRC; is this inconsistent with "SHOULD follow new address"? It was agreed to change MUST to MAY.
- Should the length of compound RTCP packets with information about N sources be divided by N for the average? Analysis since the meeting indicates this is not a problem.
- It is necessary to clarify that a mixer mixing N sources may use N times the single-source RTCP bandwidth.
The recent changes to the audio/video profile include:
- CN (comfort noise) static payload type changed back from 19 to 13 for consistency with ITU H.225.0 and VoIP Forum documents
- RTP clock rate for G.722 changed back to 8000 even though the sampling rate is 16000 (a note of explanation was added)
- Noted that the MPEG audio RTP clock rate is always 90000 regardless of sampling rate
Once again, there are a number of minor open issues:
- Should the profile recommend the use of the H263-1998 payload format over the old payload format for H.263? It was decided yes.
- Did the AIFF-C list of channels change from what is in RFC 1890? No, but the sets used for DAT and DV video are different and so may require separate specification in a payload format.
- Change title of Section 3 to "IANA Considerations".
- Fix bungled edit of CN from 13 to 19 in Table 5.
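The participant-timeout change listed above can be illustrated with a short Python sketch. It assumes the deterministic interval computation and default constants (5 s minimum interval, 25% of the RTCP bandwidth for senders, timeout multiplier of 5) of the revised specification as it was eventually published in RFC 3550; the function names are hypothetical, and the randomization and reconsideration steps are omitted.

```python
# Sketch only: constants assumed from the revised RTP specification
# (later published as RFC 3550); randomization/reconsideration omitted.
RTCP_MIN_TIME = 5.0              # minimum deterministic interval, seconds
RTCP_SENDER_BW_FRACTION = 0.25
RTCP_RCVR_BW_FRACTION = 1 - RTCP_SENDER_BW_FRACTION
TIMEOUT_MULTIPLIER = 5           # "M" in the specification


def deterministic_interval(members, senders, rtcp_bw, we_sent, avg_rtcp_size):
    """Deterministic RTCP interval Td in seconds.

    rtcp_bw is the session RTCP bandwidth in octets/second and
    avg_rtcp_size the average compound RTCP packet size in octets.
    """
    n = members
    if senders <= members * RTCP_SENDER_BW_FRACTION:
        if we_sent:
            rtcp_bw *= RTCP_SENDER_BW_FRACTION
            n = senders
        else:
            rtcp_bw *= RTCP_RCVR_BW_FRACTION
            n -= senders
    return max(avg_rtcp_size * n / rtcp_bw, RTCP_MIN_TIME)


def timed_out(now, last_heard, members, senders, rtcp_bw, avg_rtcp_size):
    """True if a participant silent since last_heard should be timed out.

    The interval is always computed with the receiver fraction
    (we_sent=False), even when the participant is an active sender.
    """
    td = deterministic_interval(members, senders, rtcp_bw, False, avg_rtcp_size)
    return now - last_heard > TIMEOUT_MULTIPLIER * td
```

With 1000 members of which 10 are senders, 500 octets/s of RTCP bandwidth and 100-octet reports, a sender's own deterministic interval is 8 s, while the timeout applied to every participant is 5 x 264 s.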
The question of how to deal with proposed enhancements to the comfort noise payload format, to include spectral shape parameters, was raised. This work is ongoing in ITU-T SG16 but will take time to complete, and we don't want to wait for it before we advance RTP. There are a number of options:
- Leave the current comfort noise payload format as is, and define a new payload format for the extension
- Enhance this format, so that if it is longer than 1 byte it includes extra spectral information
- Remove this payload format from the profile, so as not to hold the profile up, but leave the payload type assigned (with a note added to the profile that payload type 13 is reserved for a comfort noise payload format which will be specified later)
The chairs proposed taking this latter option, but this raised a number of questions about the status of the revised comfort noise draft (which is in use in a number of products) and the ability of the profile to reference it. After some discussion, a hybrid of these options was chosen: the comfort noise payload format will be extracted from the audio/video profile and submitted as a proposed standard RFC, referenced by the profile. Steve Casner will do this as part of editing the profile. The ITU-T will extend this in the same manner as any other payload format is extended, with the expectation that it will cycle at proposed standard.
The MIME registration for the payload types defined in the audio/video profile has been updated (draft-ietf-avt-rtp-mime-01.txt). The changes include clarification of the mapping from MIME parameters to SDP "a=fmtp" attributes, and the addition of the RED (redundant audio) payload and a "type" parameter for MPEG. It was noted that the payload format for generic FEC, which is already in the hands of the RFC Editor, did not include a MIME registration for the name "parityfec"; Jonathan Rosenberg volunteered to write a separate draft for this.
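Of the comfort noise options listed above, extending the existing format so that a longer payload carries spectral information would let an old receiver keep working by reading only the first byte. A minimal Python sketch of such a receiver follows. It assumes a layout in which the low 7 bits of the first byte carry the noise level magnitude in dBov and any further bytes are opaque spectral parameters; that layout and the names are assumptions for illustration, not text from the profile or the ITU-T work.

```python
from dataclasses import dataclass


@dataclass
class ComfortNoise:
    level_dbov: int   # noise level, e.g. -40 means -40 dBov
    spectral: bytes   # hypothetical extension parameters (may be empty)


def parse_comfort_noise(payload: bytes) -> ComfortNoise:
    """Parse a 1-byte payload or a longer, hypothetically extended one."""
    if not payload:
        raise ValueError("empty comfort noise payload")
    # Assumed layout: low 7 bits of byte 0 hold the level magnitude in dBov.
    level = payload[0] & 0x7F
    # A legacy receiver would stop after byte 0; an extended one keeps the rest.
    return ComfortNoise(level_dbov=-level, spectral=bytes(payload[1:]))
```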
This meeting marked the completion of working group last call on the set of drafts that have been prepared for advancement of the RTP specification and A/V profile to Draft Standard status. The working group accepted the changes that had been made in the drafts posted for last call, as well as the few changes proposed in response to comments received during the last call. With those changes, the working group agreed that the following drafts should be submitted for IESG Last Call:
- draft-ietf-avt-rtp-new-05.txt (Draft standard)
- draft-ietf-avt-profile-new-06.txt (Draft standard)
- draft-ietf-avt-rtp-mime-00.txt (Proposed standard)
- draft-ietf-avt-rtcp-bw-00.txt (Proposed standard)
This will be done immediately. The need for interoperability statements from implementors was noted. Implementors, both academic and commercial, are strongly encouraged to study draft-ietf-avt-rtp-interop-02.txt and draft-ietf-avt-profile-interop-00.txt, and to submit a statement of interoperability to the group. Scott Bradner (the transport area co-director) noted that it is okay for a neutral third party to present the results, hiding the identity of the vendors if they do not wish to make this information available. The chairs will set up a web page summarizing the results of interoperability tests conducted.
The next topic was a potential new task for AVT: a unicast RTCP-based retransmission scheme, presented by Matthew Podolsky. The motivation for this work (draft-podolsky-avt-rtprx-00.txt) is that a number of proprietary retransmission schemes are in use, hindering interoperability, and no standard currently exists. Why unicast? Because unicast retransmission schemes are widely used and well understood, unlike multicast retransmission schemes, which are difficult to make work in a scalable and timely manner. Further, reliable multicast is outside the scope of this working group.
The proposal is for a new RTCP packet type, the "multipurpose ACK", which can be used to ACK or NACK a packet with a particular sequence number and up to 16 surrounding packets. This is flexible, specifying a basic framework for requests which can be modified or extended by a protocol subtype field. A number of items were raised for discussion:
- Is it necessary to include the SSRC in retransmission requests? This is good for compatibility with the other RTCP packet types, and allows NACKs for multiple sessions to be grouped in one request. Anders Klemets noted that this allows one to send multiple NACKs in a single packet, covering multiple unicast RTP sessions. This may be of use when working with an RTSP server, since the server can ensure that the SSRCs used in each session are unique. It is unclear how useful this is in practice.
- Is it best to include a protocol subtype field, or to rely on the RTCP packet type field and out-of-band signalling? The feeling of the group was that the RTCP packet type was the appropriate place to demultiplex, and there was no need to include an additional subtype. Future versions of this draft should specify behaviour profiles, for example ACK- or NACK-based retransmission.
- Should the bit mask of other packets which are being ACKed/NACKed cover preceding or succeeding packets? It was suggested to do as in the FEC format, with an offset from the first sequence number.
- How does this work with silence suppression? How long must one wait for the next packet before deciding it is lost, rather than silence?
It was noted that, in order to make this work, it is necessary to eliminate the current limitations on minimum time between RTCP feedback. This may have adverse effects on the scalability and congestion control of the protocol, and probably requires the definition of a new RTP profile to specify these timers and any other changes that may be required.
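The base-plus-bitmask idea discussed above, with the mask expressed as an offset from the first sequence number as suggested for the FEC format, can be sketched in Python. The layout (bit i of a 16-bit mask marks sequence number base + 1 + i) and the function names are assumptions for illustration, not the encoding defined in draft-podolsky-avt-rtprx-00.txt; sequence arithmetic wraps modulo 2^16, as RTP sequence numbers do.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits and wrap around


def encode_nacks(lost):
    """Group lost sequence numbers into (base, mask) pairs.

    Hypothetical layout: bit i of the 16-bit mask marks base + 1 + i as lost,
    so one pair covers the base packet and up to 16 packets after it.
    `lost` must be listed in transmission order.
    """
    pairs = []
    for seq in lost:
        if pairs:
            base, mask = pairs[-1]
            offset = (seq - base) % SEQ_MOD
            if 1 <= offset <= 16:
                pairs[-1] = (base, mask | (1 << (offset - 1)))
                continue
        pairs.append((seq % SEQ_MOD, 0))
    return pairs


def decode_nacks(pairs):
    """Expand (base, mask) pairs back into a list of lost sequence numbers."""
    lost = []
    for base, mask in pairs:
        lost.append(base)
        for i in range(16):
            if mask & (1 << i):
                lost.append((base + 1 + i) % SEQ_MOD)
    return lost
```

For example, losses at 65534, 65535, 2 and 40 fit into two entries: the first three share one base across the sequence number wrap, and 40 is too far away and starts a new entry.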
Mark Handley noted that there is an urgent need to deploy congestion-controlled RTP to prevent problems in the wide area network. If the group is going to work on a new unicast RTP profile including retransmission, it would be advisable to also include congestion control. In a discussion after the AVT session, the transport area directors agreed that it was appropriate for AVT to undertake this work, but that congestion management must be included. Since the ECM working group won't have output right away, Mark Handley and Sally Floyd agreed to write an informational document that the work on retransmission can refer to.
The remainder of the first day was spent in a discussion of various RTP payload formats. The first was an update on the payload format for Real Time Pointers, presented by Reha Civanlar. The new draft resolves the issues raised at the previous meeting, and also adds the ability to convey an indication of the pointer icon (the mapping from code-points to icons is to be specified out of band). This is now ready for working group last call.
The RTP payload formats for DV audio (draft-ietf-avt-dv-audio-00.txt) and video (draft-ietf-avt-dv-video-01.txt) were presented by Katsushi Kobayashi. Changes relating to the audio format include:
- addition of 20 and 24 bit linear sampling modes, in addition to 12 bit non-linear
- change MIME encoding from audio/NL12 to audio/DAT12 and add a MIME registration template
- remove pseudo-code, since it was obvious
- define an SDP fmtp parameter for analogue pre-emphasis; this may be required for L16 audio within DV as well, and the working group generally agreed that adding an fmtp parameter to the existing L16 payload format was acceptable
The DV video payload has seen the addition of support for the D-7 and D-9 "professional" formats, a simplification of the SDP description, and the move of IEEE1394-specific timestamp issues to an appendix, since the D-7 and D-9 formats assume a different interface.
The MIME registration has yet to be done. Two interoperable implementations exist (from the WIDE project and KTH). These drafts are believed ready for working group last call.
The payload format for MPEG-2 AAC was presented by Jim Snyder (standing in for Mathias Kretschmer, who was ill). Changes have been made relating to:
- priority vectors
- fragmentation/grouping
- a type field for the repair information
This payload format has not yet been implemented, but this is in progress (probably not done by the next meeting, but by the one after). The draft is not yet ready for last call, since the authors need to do more development and testing, and desire feedback from the group (especially relating to MPEG-4).
Ross Finlayson presented an update to the more loss-tolerant payload format for MP3 audio (draft-ietf-avt-rtp-mp3-01.txt). This payload format is more robust to packet loss than RFC2250, but implies more knowledge of MP3. Recent additions to the draft include a specification of how to handle streams containing a mix of layer I or II frames with layer III frames (an uncommon situation), and pseudo-code to convert between "MP3 frames" and "ADU frames" as required by this format. A plea was made for the group to check the pseudo-code, since it is somewhat complex. More information, including a performance comparison with RFC2250, is available from http://www.live.com/rtp-mp3/.
Dave Singer noted that this is a great idea, but doesn't go far enough: maybe it should include support for interleaving too? His group has demonstrated improved performance with interleaving, reusing the MPEG sync byte so no overhead is added (though doing this precludes mixing in layer I or II frames). The addition of interleaving will be considered before the draft is progressed.
An extension to the payload format for MPEG-2 transport streams (draft-ietf-avt-rtp-mp2t-00.txt) was presented by Humphrey Liu.
The issue here is that MP2T is a packet-based transport format which can contain multiple programs, with an aggregate bandwidth as high as 50-60Mbps (roughly 40k packets per second). The 90kHz RTP timestamp clock defined in RFC2250 does not have sufficient resolution at these rates (only about two ticks per packet), so this draft recommends a 27MHz clock, as defined by MPEG-2. In fact, RTP with MP2T does not provide much value compared to UDP for single-program transport streams (at lower data rates, for which 90kHz is appropriate), but the RTP timestamp does add value for multiple-program transport streams, given sufficient resolution. Why is so much resolution needed? For faster clock recovery; for calculation of a piece-wise CBR bitrate, interpolating the timestamp for every MPEG packet multiplexed into the RTP-encapsulated transport stream; and to regenerate the exact timing for re-insertion of each packet into a native MPEG-2 transport stream. No change to the actual payload format in RFC2250 is required, just a change in the RTP timestamp clock rate. For applications which use SDP, this can be accomplished by specifying a clock rate of 27000000 in the a=rtpmap attribute.
The payload format for text conversation (draft-ietf-avt-rtp-text-02.txt) was presented by Ian Rytina. The issues resolved since Oslo include:
- What type of redundancy should be used (FEC or RED)? Agreed to leave it open, but recommend RED.
- Why do we need this format when IRC and telnet chat are available? Compatibility with existing products, and a better fit with SIP and H.323.
- MIME registration? Done.
- Clock frequency? Irrelevant for text, but 1kHz is now specified.
- Order of redundant data? Must be inserted in age order, with one redundant block from each transmitted packet, in order to correctly decode all cases.
- Avoid double specification of the fill character for missing data.
- Transmission during silence periods: send an empty primary in the redundancy format for the last packet before silence.
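The age-ordering rule for redundant text can be illustrated with a small Python model. It assumes two generations of redundancy, with each packet carrying its redundant blocks oldest first and the primary block last (the position RFC 2198 gives the primary encoding). Packets are identified by a simple index instead of RTP sequence numbers and timestamps, and the `missing` marker stands in for whatever fill character the final specification chooses; all names are hypothetical.

```python
def build_packets(blocks, generations=2):
    """Return one packet per primary block: [oldest redundant, ..., primary].

    Packet i carries blocks i-generations .. i in age order; slots before
    the first block are filled with empty strings.
    """
    packets = []
    for i in range(len(blocks)):
        packet = []
        for age in range(generations, -1, -1):
            j = i - age
            packet.append(blocks[j] if j >= 0 else "")
        packets.append(packet)
    return packets


def recover_text(received, generations=2, missing="?"):
    """Rebuild the text from the packets that arrived.

    `received` maps packet index -> packet as built above; lost packets are
    simply absent. Blocks lost from every copy are replaced by `missing`.
    """
    if not received:
        return ""
    text = {}
    for i, packet in received.items():
        for pos, block in enumerate(packet):
            j = i - (generations - pos)   # index of the block in this slot
            if j >= 0:
                text.setdefault(j, block)
    return "".join(text.get(j, missing) for j in range(max(received) + 1))
```

In this model any two consecutive packets can be lost without losing text, because each missing block travels again in a later packet; a longer gap leaves `missing` markers.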
This payload format is now ready for working group last call: the main issue is the ordering of redundant data, which is somewhat subtle and should be verified to be correct. Jonathan Rosenberg noticed that the MIME type is text/t140, but SDP doesn't have a text media type (on the m= line). Is it necessary to include a section on SDP usage? Steve Casner noted that this should not be necessary, because the RTP MIME registration specifies a mapping to SDP.
Allison Mankin presented some new work on an RTP payload format for HDTV, motivated by the start of commercial HDTV broadcasts in the US and the proliferation of formats. The bandwidth of HDTV streams ranges from 176Mbps to 1.49Gbps uncompressed, and compresses down to around 20-40Mbps for broadcast. These media formats are being standardised by the ATSC (http://www.atsc.org) and DVB (http://www.dvb.org), but with major differences in transmission technology (8VSB vs COFDM) and payload (e.g., DVB uses MPEG audio, ATSC uses Dolby AC-3). The payload for compressed HDTV is mostly covered by RFC2250. The uncompressed format is related to BT.656, with the addition of 4:2:0 colour sub-sampling and a 1080-line mode (versus 525/625 lines for BT.656). One question is whether the BT.656 payload format can simply be extended for HDTV, but here again the RTP timestamp resolution may be an issue. A payload format is also needed for AC-3 audio. This is an early "heads up" on the work, which is ongoing, and further drafts are expected.
The major issue discussed on the second day was the transport of MPEG-4 media. Colin Perkins started this by summarising the design team meeting held earlier in the week. There are three proposals for discussion:
- draft-ietf-avt-rtp-mpeg4-02.txt
- draft-guillemot-genrtp-01.txt
- draft-jnb-mpeg4av-rtp-00.txt
It is also expected that there will be a fourth proposal for a packetization based on MPEG FlexMux.
A number of open issues were identified with these proposals:
- Use of MPEG-4 systems vs elementary streams.
  o MPEG-4 has a complete system model, but some applications just desire to use the codecs. We accept that we need to generate payload formats for both cases, even though this has the potential for interoperability problems later (compare issues with MPEG-2).
  o We also noted that MPEG-4 encompasses codecs which may have an existing RTP payload format, but we can't preclude the use of MPEG-4-specific packetization if the system model is to be maintained. In addition, for error resilience it is desirable to packetize in a media-aware manner; this does not imply a choice between systems or elementary streams.
- Multiplexing multiple streams.
  o Five multiplexing options were identified: GeRM, FlexMux, "PPP Mux" with CRTP (see later), don't multiplex, and don't multiplex but compress headers. We noted that we may need a FlexMux format, but nothing else requires special consideration.
- Grouping within a stream.
  o Why group? To amortize header overhead and to aid error resilience (for example, duplicate and group picture headers with each packet, or group FEC with media data).
  o There is disagreement over the importance of grouping, and the mechanism to be used: draft-guillemot-genrtp-01.txt has support for grouping access units (i.e., ADUs) and sub-access units (e.g., unequal FEC); the other proposals perform no additional grouping, assuming anything needed will be done in the encoder.
  o Further experimental testing is needed to determine whether additional error resilience is worth the extra complexity.
- Fragmentation.
  o It is necessary to fragment a codec bit-stream in a media-aware manner to achieve error resilience.
  o We believe that the choice of fragmentation is a matter for the encoder, and that MPEG-4 should be able to fragment accordingly (a payload format should be able to transport oversize fragments, but this need not be efficient).
- Error protection.
  o We considered two forms of error protection: within the payload and across packets.
  o draft-guillemot-genrtp-01.txt uses unequal FEC within the payload (using a "typed segment" abstraction to be generic; this abstraction doesn't exist in MPEG, although the information is available in an ES-specific manner). The other proposals assume the MPEG bitstream is robust enough "as is". Further experimental testing is needed.
  o We may also apply any of the existing error protection mechanisms across packets (parity FEC, for example). draft-guillemot-genrtp-01.txt also duplicates these functions as part of its unequal FEC scheme.
  o Some MPEG-4 elementary streams MUST be reliably delivered (e.g., control streams and streaming media such as Java class files). We reached no conclusion on how to do this; we have focused on audio and video to date.
- Timing model.
  o MPEG-4 and RTP have different timing models. If it is desired to synchronize MPEG-4 data with data using a native RTP packetization, we must align the models (capture time vs composition time). We believe the text in draft-ietf-avt-rtp-mpeg4-02.txt describing this alignment is correct, but it needs edits for clarity. This is an issue for all the proposals.
- Byte alignment.
  o MPEG-4 systems has the potential to handle non-byte-aligned bit-streams (audio and video codecs do not produce these, but other codecs might), but the streams it produces are byte aligned. The ES packetizations may similarly have to include the capability to add padding bits.
Following this summary, brief presentations were made on the individual proposals.
- Andrea Basso presented draft-ietf-avt-rtp-mpeg4-02.txt, noting that text has been added describing the correspondence between the RTP and MPEG-4 timing models (this actually affects all the proposals). Implementations of this payload format exist from AT&T and Nokia, and interoperability results were presented at the recent MPEG meeting in Melbourne.
- Paul Christ presented an overview of draft-guillemot-genrtp-01.txt and the motivations for this approach. The draft has not been changed since the Oslo meeting, but there is a plan to clean up the format and resubmit before Adelaide. An implementation is underway: the mapping and de-mapping is done and under test, and the unequal error protection is under development (see http://www-ks.rus.uni-stuttgart.de/PROJ/GP for details).
- Yoshihiro Kikuchi presented draft-jnb-mpeg4av-rtp-00.txt. This is a new draft, which comprises a direct mapping of elementary streams to RTP, for applications like IP phones, mobile, and H.323. The MPEG-4 systems model is not used; session/stream management is done via H.245 or SIP. A standards track payload format is needed for ITU interaction, since an H.323 audio codepoint will be specified in February 2000. MPEG-4 video is mapped directly; audio is mapped via the LATM encapsulation. The draft also specifies back-channel signaling using RTCP.
The discussion of MPEG-4 was concluded by Colin Perkins, who outlined the chairs' view on future activity in this area:
- It was noted that the MPEG-4 audio and video codec elementary streams can be packetized in RTP in a manner similar to other codecs. This path is well understood and makes sense to standardize. Standardization is also required for these payload formats to be referenced by the ITU. Hence we propose to adopt draft-jnb-mpeg4av-rtp-00.txt as an AVT work item, for eventual submission on the standards track.
- Multiplexed MPEG-4 media is to be treated in a similar manner to earlier bundled MPEG transport. We will therefore consider a FlexMux payload format, if one is submitted.
- We do not believe we fully understand the issues involved in the transport of the complete MPEG-4 system over RTP. Such payload formats should therefore be submitted for publication as experimental RFCs, whilst we gain implementation experience.
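For elementary streams packetized like other codecs, the grouping and fragmentation trade-offs from the design team summary can be sketched in Python: small access units (AUs) are grouped so they share one header, and an oversize AU is split across several packets. This is a hypothetical illustration under simplifying assumptions; it ignores the per-AU headers, marker bits and media-aware fragment boundaries that a real payload format would need, and `max_payload` is an assumed per-packet payload budget.

```python
def packetize(access_units, max_payload):
    """Group small AUs into packets; fragment AUs larger than max_payload.

    Returns a list of packets, each a list of byte strings (whole AUs or
    fragments). Hypothetical sketch: no AU headers or marker bits.
    """
    packets, current, size = [], [], 0
    for au in access_units:
        if len(au) > max_payload:
            # Oversize AU: flush the open packet, then fragment this AU.
            if current:
                packets.append(current)
                current, size = [], 0
            for off in range(0, len(au), max_payload):
                packets.append([au[off:off + max_payload]])
            continue
        if size + len(au) > max_payload:
            # This AU does not fit: close the packet and start a new one.
            packets.append(current)
            current, size = [], 0
        current.append(au)
        size += len(au)
    if current:
        packets.append(current)
    return packets
```

With the usual 40 octets of IPv4, UDP and RTP headers (20 + 8 + 12), grouping two 100-octet AUs into one packet costs one header instead of two.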
Following the chairs' proposal, draft-ietf-avt-rtp-mpeg4-02.txt and draft-guillemot-genrtp-01.txt will be progressed to experimental RFC status. The chairs asked for, and received, consensus that this is a viable approach.
As a postscript to the MPEG-4 discussion, Dave Singer provided an overview of draft-singer-rtp-qtfile-01.txt, the QuickTime file format and hint track specification. This was presented for information and comments; it is not intended to be published on the standards track.
The next subject was RTP multiplexing. The first presentation in this area was by Bruce Thompson on extensions to CRTP and RTP multiplexing using tunnels. The original work was draft-wing-avt-tcrtp-00, which was submitted to the Oslo meeting. This has been broken into three parts: IP tunneling, PPP multiplexing, and enhancements to CRTP.
Three independent enhancements to CRTP are described in draft-koren-avt-crtp-enhance-00.txt. The first is the ability to include a delta-timestamp in COMPRESSED_UDP packets to restore the state at the decompressor in the event of packet loss. This makes CRTP more robust to packet loss, and also allows state to be refreshed and sent multiple times efficiently without the delay of error feedback. Concern was expressed about the need to send the absolute value of the IPv4 ID, which could be addressed by using the COMPRESSED_NON_TCP packet instead, except that it does not include the delta-T. The second extension is to add a NonRTP stream flag to the FULL_HEADER packet, to notify the decompressor that the compressor will never send COMPRESSED_RTP packets for a particular context, so that the decompressor does not have to maintain context for the RTP header. The third extension is to define a reject packet, allowing a decompressor to reject the use of a new compression context when it is out of resources.
An overview of how this enhanced CRTP can be used for tunneling to provide a transparent and efficient multiplexing solution was also provided.
RTP packets are encapsulated end-to-end for multiplexing, using:
- compression (CRTP, with enhancements for robustness to loss)
- multiplexing (PPP multiplexing, draft-ietf-pppext-pppmux-00.txt, which is ongoing in the PPPEXT working group)
- IP tunneling for PPP (L2TP, draft-ietf-pppext-l2tphc-03.txt, not posted), which removes the session ID and tunnel ID from L2TP, and the UDP header, for efficiency, using a negotiated IP protocol ID
- CRTP negotiation (RFC2509)
The application runs RTP with no knowledge of the tunnel/compression: CRTP and muxing take place at layer 2, independent of the application. It was noted that it is also possible to compress away the IP header on a per-link basis, if desired. An informational document is to be produced, describing how these pieces fit together to provide the complete solution. Steve Casner noted that this approach to the entire muxing problem is clean and moves the multiplexing down to the IP layer, where it is more appropriate, keeping the RTP semantics tidy. Input from the group was solicited on the changes for CRTP.
The final presentation was by Colin Perkins, for Alexander Tulai who was unable to attend, on a dynamic Nx64 payload format (draft-tulai-avt-dynamic-nx64-00.txt) for multiplexing PCM-coded audio streams (e.g., transport of a T1 line). The consensus of the group was that this draft is too narrowly focused, but a payload format for general circuit emulation may make sense.
The meeting concluded with a discussion of the working group charter. The main goal of the group has been the development of RTP; once this is done, we have concluded most of our work. We also have steps in the charter for MPEG-4 and multiplexing, but we expect to complete these and our other remaining actions over the course of the next couple of meetings. This will leave only the continuing evolution of payload formats; these don't necessarily need AVT to meet every time, since much of this work can be done on the mailing list.
There are a number of options for the future of the group. One possibility is that we go into hibernation, meeting occasionally when there is sufficient work. It was suggested that we could separate the RTP protocol itself from the payload question: payloads don't necessarily have to be done in the RTP working group (although AVT, or its successor, may review such payload formats), and we could recharter AVT to do extensions of the core RTP protocol/profile only. Another option would be to recharter to study new profiles for unicast RTP retransmission and/or congestion control. Input from the working group is solicited.