Minutes of the Audio/Video Transport Working Group

Reported by Colin Perkins.

The audio/video transport working group met twice at the 43rd IETF in Orlando. The main agenda items were discussion of RTP multiplexing, and the update of RTP and the audio/video profile for advancement to draft standard status. In addition, a number of RTP payload formats and the RTP MIB were discussed.

The meeting opened with a review of the working group status, and documents completed since the last meeting. These include the payload formats for H.263+ video (RFC2429), BT.656 video (RFC2431) and JPEG video (RFC2435), together with AT&T's Error Resilient Video Transmission Technique (RFC2448).

The RTP MIB (draft-ietf-avt-rtp-mib-03.txt) was presented by Mark Baugher. Changes since the previous draft have been motivated by comments and review by Bert Wijnen and the ITU SG16. The main differences are as follows: renamed MIB definitions to RTP-MIB and module to RTPMIB; changed OID structure to current IETF conventions; several clarifications made to DESCRIPTION clauses; use of "noSuchInstance" rather than "noSuchObject" in rtpRcvrRTT; changed media counters to be 64 bits; separated host and monitor compliance sections; and use of InterfaceIndexOrZero instead of redefining InterfaceIndex. A number of open issues remain to be resolved: the use of multiple MIB agents in switches, the optional nature of the interface index, and permitting easier selection of session entries by session address. It is expected that these will be resolved within the next few weeks, at which time a new draft will be issued ready for working group last call. A reference implementation of the -02 draft is available, and work is progressing on the latest version. The RTP MIB will be referenced by H.341, the ITU's H-series management specification.

The RTP payload format for DTMF and other telephone tones (draft-ietf-avt-dtmf-01.txt, draft-ietf-avt-tones-00.txt) was presented by Scott Petrack.
The justification for using a new payload format is to reproduce the tones better than low-rate codecs can; to apply redundancy differently for tones and speech; to separate detection of tones from their interpretation; and to preserve the sounds associated with particular signals. Two payload formats are defined: a named signal event payload and a tone frequency format payload. The named signal event payload includes a named event (dialtone, busy, call-waiting, off-hook, etc), together with the volume and duration of that event. The tone frequency format sends a representation of the actual set of tones to be played, rather than their purpose (name), enabling a wider range of tones and the playout of foreign tones. Both formats may be conveyed in a single packet, using RFC2198 redundancy, for robustness. Mark Handley expressed some concern with the means by which this is done: in particular it is unclear how to play out these tones when a fraction of the tone is lost. The two drafts are to be merged within the next couple of months, and it is the desire of the authors to move the result to the standards track soon after. Some concern was expressed that this draft is not receiving sufficient exposure in bodies such as the ITU which may be able to provide relevant feedback - it must be ensured that these bodies have a chance to comment before this document can progress in the IETF.

The discussion of RTP multiplexing started with a presentation by Jonathan Rosenberg on issues in RTP multiplexing (draft-ietf-avt-muxissues-00.txt). As noted in this draft, there are two scenarios which are of interest: end-to-end and mid-network multiplexing. The scenario chosen will affect the choices made in designing the multiplexing scheme. These choices affect the delineation, identification, synchronisation and dynamism of the multiplexed data.
There are three means by which data in a multiplexed packet may be delineated: by explicit length indicators (which have maximum flexibility but the largest overhead); implicitly based on payload type (many codecs produce fixed length blocks); or implicitly by out-of-band signalling (which has the least bandwidth overhead, but requires signalling and makes changing encodings or packetisation difficult). If all multiplexed packets are of the same duration this problem becomes much simpler.

Multiplexed data may be identified by an explicit ID, where the number of bits used depends on the number of simultaneous calls and the desired ID reuse latency. This ID may be the RTP SSRC in some cases. Alternatively the data may be channelised, where each user's data gets a slot in the packet. This latter approach requires out-of-band signalling and, if silence suppression is used, a bitmask to indicate which channels are active. The overheads of the two schemes vary with the number of channels being multiplexed, whether channels are active continually, and with the rate of change of the set of multiplexed streams.

If each frame of multiplexed data within a packet has the same timestamp, the individual timestamps may be elided, and replaced by the single timestamp in the multiplexed packet's RTP header. If the timestamps are close, offsets relative to the outer timestamp may be used rather than complete timestamps (saving some number of bits). Alternatively, if users have uncorrelated frame start times it is necessary to send the complete timestamp per user.

The dynamism of codecs also affects the multiplexing format chosen. If codecs never change there is no need to include a payload type indication within the multiplexed stream, and out-of-band signalling may be used.
Similarly, if codecs change rarely, out-of-band signalling may still be appropriate, depending on how often codecs may change (synchronisation between the signalling of a payload type change and the media stream may be complex). If codecs change frequently, some form of in-band payload type indication is most appropriate - this need not necessarily be a complete RTP PT value: if the set of allowable codecs is small, a mapping table may be used instead. The marker bit may need to be transmitted depending on the use of the multiplexed stream.

It was noted that this is a very large space, and a number of solutions to the multiplexing problem are possible. This group has a number of solutions presented to it, yet the precise problem definition for each of these has not been enunciated. It may help to focus the discussion if the question "why are you multiplexing?" is clearly answered, and if we derive a number of scenarios which require common solutions. Essentially, we need to focus on requirements. Trying to do generic optimisation using a multiplexer is futile.

The first multiplexing proposal (an update to draft-ietf-avt-mux-rtp-00.txt) was presented by Barani Subbiah. The stated goals of this proposal are to achieve the best possible fit with cellular/PSTN applications and to derive a payload format suitable for use in a switched IP telephone network. They are aiming for a simple format with a fixed header suitable for hardware implementation, providing a compromise between bandwidth saving (in addition to the outer RTP header, this proposal averages two bytes overhead per multiplexed stream) and complexity. The sequence number (2 bits) is new since the last draft; this means a reduction in the length field size to 5 bits. Concern was expressed that a 5 bit length field is insufficient for some audio codecs which it may be desirable to support. The use of the 2 bit sequence number was questioned, since 4 packet losses are possible -- a longer field should probably be used.
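
As an illustration only (not taken from the draft or the presentation), the following sketch shows one reading of the concern about the 2 bit sequence number: the field wraps after 4 packets, so a receiver that infers loss from the gap between sequence numbers cannot tell a burst of 4 lost packets from no loss at all. The function names are hypothetical, and the minutes do not state the unit of the 5 bit length field, so only its largest value is shown.

```python
# Sketch under stated assumptions: the receiver infers loss purely from the
# gap between consecutive sequence numbers, which is an assumption made here
# for illustration and not a description of the draft's receiver behaviour.

SEQ_BITS = 2                       # sequence number width noted in the minutes
SEQ_MOD = 1 << SEQ_BITS            # 4 distinct sequence values
LENGTH_BITS = 5                    # length field width noted in the minutes
MAX_LENGTH = (1 << LENGTH_BITS) - 1  # largest value a 5 bit field can carry


def apparent_loss(prev_seq: int, next_seq: int) -> int:
    """Packets a receiver would infer were lost between two arrivals."""
    return (next_seq - prev_seq - 1) % SEQ_MOD


def loss_table(max_burst: int = 5) -> dict:
    """Map each actual burst length to the loss the receiver would infer."""
    table = {}
    for actually_lost in range(max_burst + 1):
        prev_seq = 0
        # The sender keeps counting through the lost packets.
        next_seq = (prev_seq + actually_lost + 1) % SEQ_MOD
        table[actually_lost] = apparent_loss(prev_seq, next_seq)
    return table


if __name__ == "__main__":
    for actual, inferred in loss_table().items():
        print(f"actually lost {actual}, receiver infers {inferred}")
    print(f"largest length value in {LENGTH_BITS} bits: {MAX_LENGTH}")
```

Under these assumptions a burst of 4 losses is indistinguishable from no loss, and a burst of 5 looks like a single loss; each additional sequence number bit doubles the burst length that can be detected.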
A transition bit is included to signal a change in the end-to-end flow parameters, allowing one state change per RTT. It was noted that if all packets within an RTT are lost, the state change will not be noted at the receiver. It was also noted that payloads such as DTMF interspersed with voice can cause changes in payload type more often than once per RTT.

Tohru Hoshi presented draft-tanigawa-rtp-multiplex-01.txt. This is a simple proposal where multiple RTP packets are concatenated for transmission in a single UDP packet. The default packetisation interval specified for a codec in the audio/video profile is used such that no length indication is necessary (or the interval can be signalled out-of-band, if a non-default interval is desired). The draft now includes a section describing the efficiency gains from using this proposal according to various metrics. Call setup signalling is also defined.

The next multiplexing proposal discussed was GeRM (draft-ietf-avt-germ-00.txt), presented by Mark Handley. The goal of this proposal is to transparently multiplex a number of RTP streams. It operates using difference coding between the headers of packets to be multiplexed together. Clearly this will work better if the packet headers are similar (this can be achieved between cooperating gateways, although the traffic pattern will affect performance) but it will still work if the end points do not cooperate, and will perform no worse than simply concatenating packets. The GeRM protocol is well suited for scenarios where a mix of RTP packets is to be multiplexed, such as may be encountered in the transport of MPEG-4 streams, or for use between a pair of cooperating gateways multiplexing a large number of similar streams. It achieves considerable flexibility, at the expense of complex parsing and greater bandwidth overhead than other, less general, protocols.

The final multiplexing presentation was produced by Dean Willis during the meeting, so no draft exists.
The assumptions here are that large numbers of streams are being carried between end-point pairs, fast interfaces with minimal serialisation delay are used, and mixed codecs with silence suppression exist. The goal is to increase overall "network efficiency" by re-packing packets to increase the total MTU and reduce the number of packets sent. Two alternatives were considered:

  - RTP level: complex, no benefit to non-RTP traffic, issues with RTCP
  - UDP level: brute force level simplicity, aggregate UDP flows between end-points, allows IPsec at multiplexing level, transparent to the application, allows mid-network multiplexing, less efficient

The UDP level approach is favoured.

Following the presentations a considerable amount of discussion ensued. Concern was expressed that multiplexing is being used as an "RTP switching" solution, with application level routing: it was noted that IP has a number of perfectly reasonable routing algorithms already, and it is unnecessary to re-invent these within RTP.

Many people expressed concern that the problem to be solved by multiplexing has not been clearly stated: is it to reduce the number of packets sent? the number of bytes sent? to perform application level RTP routing? etc. It is unclear that RTP multiplexing is the correct solution here: a generic UDP multiplexing protocol (as in the final proposal) may be more suitable in some cases.

Carsten Bormann succinctly stated that if we are to define an RTP multiplexing scheme, it should be an absolute requirement to preserve the integrity of the RTP information. If it does not, then it is not RTP multiplexing, it is a new protocol.

Concern was expressed that multiplexing streams with different transport level addressing into one is not clearly handled by these proposals. In some cases, the SSRC is assumed to provide a unique stream ID, which is not necessarily the case across multiple streams. The handling of RTCP data by a number of these proposals is also unclear.
The proposals need to be extended to address these issues.

Some of the proposals specify a particular form of signalling in addition to the payload format. These payload formats should be independent of the signalling to be used. The proposals may want to express signalling requirements, but should not tie the payload format to a specific scheme.

Since it is unclear whether a single protocol can satisfy multiple aims, and which of the five proposals currently submitted to the group will go forward, the authors and other working group participants are requested to submit application scenarios in which multiplexing is to be applied. Within those scenarios, assumptions about traffic can be made explicit. We'll choose three or four scenarios and ask the authors to use simulation or analysis to quantify the performance of their proposals under those scenarios, in order to facilitate a fair comparison.

The next subject was the transport of MPEG-4 streams within RTP. A number of AVT members participated in the MPEG meeting in Atlantic City in October, leading to the formation of an ad-hoc group within MPEG to discuss MPEG transport using IP. The work conducted in that group to date was presented by Reha Civanlar. A number of alternatives for the transport of MPEG-4 streams in an IP network were considered:

  - directly on UDP
  - RTP followed by a full MPEG-4 SyncLayer packet
  - MPEG-4 SyncLayer packets mapped onto RTP packets
  - MPEG-4 elementary streams over RTP with natural payload formats

The preferred approach would be the last of these, but since the ES interface is not a normative part of MPEG-4 this may not be feasible. The approach chosen is, therefore, to map the MPEG-4 SyncLayer packets onto RTP packets, such that the common pieces of the header reside in the RTP header, with a small payload header providing the MPEG-4 specific features.
A single payload format is used for MPEG-4 streams transported within RTP, and the MPEG-4 model is maintained (although not the precise packet format). In this approach, an RTP multiplexing scheme is needed to fulfill the role of FlexMux in MPEG-4. The GeRM proposal seems to be a good fit for this.

An internet draft detailing this work is in preparation. Those who wish to participate in this work are encouraged to join the ad-hoc group's mailing list: send email to 4onIP-sys-request@fzi.de in order to subscribe.

Christine Guillemot presented an RTP generic payload with scalable and flexible error recovery (draft-guillemot-genrtp-00.txt). This draft takes a somewhat different view of the problem of transporting MPEG-4 content and is based on carrying elementary streams in a generic manner. The motivation for this is to transport many types of stream whilst avoiding having to define a payload format for each, and to allow finer control of error correction with a set of different FEC mechanisms and the possibility of grouping AUs in a single packet. The aims of this work are to factorize the common features instead of developing specific formats for each codec/type of elementary stream, and to be able to identify repeated data so that the network adaptive layer can remove this if desired. The adaptation layer can add FEC to entire packets or to portions of a stream within packets (adding redundancy in a similar manner to RFC2198).

Concern was expressed that adding FEC to portions of a packet adds a lot of extra complexity, and unless this FEC is much smaller than that which would otherwise be present this complexity may not be justified. Some concern was expressed that this document includes a number of payload formats (redundancy, FEC, fragmentation and grouping) which may be better separated. This clearly depends on the details of the stream which is being packetised.
It is unclear that this format is suitable as a generic RTP payload independent of MPEG-4; however, it may work well as a general purpose transport for MPEG-4 elementary streams.

Steve Casner described the changes made to the main RTP specification since the last meeting. This is now stable, and unless major problems are found is believed ready for last call for draft standard. The changes since the last meeting are described in draft-ietf-avt-rtp-new-02.txt and include:

  - SSRC sampling moved to separate draft (draft-ietf-avt-rtpsample-01.txt)
  - Keep only unconditional reconsideration
  - IANA considerations section added; no longer suggest experimental registration of values
  - Y2036 (in)consequences explained
  - convert to MUST, SHOULD, MAY

A plea was made for help checking:

  - Section 0: resolution of open issues
  - Section 6.2: RTCP transmission interval
  - Section 6.3: RTCP send and receive rules
  - Appendix A: does the code work?
  - Appendix B: changes from RFC1889

It was noted that the group must document "at least two independent and inter-operable implementations from different code bases" of "all of the options and features of the specification" in order to advance to draft standard status (RFC2026). Colin Perkins volunteered to produce a draft detailing those options and features as a checklist for vendors to check compliance, and Jonathan Rosenberg volunteered to produce a draft detailing tests for the timer reconsideration algorithms. Since the meeting, Jonathan has done a careful check of the code in Appendix A and found several problems to be fixed.

The changes to the RTP profile (draft-ietf-avt-profile-new-04.txt) since the last meeting are less advanced: a clearer statement of the new policy of no more static assignments, and the addition of change bars. It is still necessary to complete the update with MUST, SHOULD, MAY, etc and to add text to allow the default of 5% RTP bandwidth to be overridden.
The registration of RTP payload format names as MIME types is still not complete: Philipp Hoschka volunteered to work on this, and to work out the details of the process. It is hoped that this may just be a statement we can put in the profile to specify how the registration is done, without changing the MIME registration process, but this is not yet clear.

The working group has agreed in previous IETF meetings that any additional RTCP SDES items should be defined in separate RFCs rather than adding them to the base RTP spec. This is in part to minimize changes at the transition from Proposed to Draft Standard but also because we did not want implementors to infer that all applications should include all the SDES items. Accordingly, Peter Parnes has written draft-parnes-rtp-sdes-00.txt to propose the addition of new SDES items Nickname, Homepage, Personal_image and Active_media. Steve Casner presented this proposal since the author could not attend.

This proposal is similar to the set of potential new SDES items discussed at the 41st IETF in Los Angeles, though Organisation was included previously and Active_media is new here. Comments are requested from the group as to whether this is the right set of new SDES items to define at this point, or whether some others should be added or some of these deleted. The status of this RFC would be Experimental rather than Proposed Standard.

Steve Casner repeated a concern expressed previously about the inclusion of URLs in an SDES packet which may be sent to a large multicast group, since simultaneous retrieval of these by many receivers can cause implosion problems. This draft specifies that retrieval should either be done only in response to direct user action or, if automated, should be delayed by a random interval (after receipt of the RTCP packet). Is this specification sufficient?
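
For illustration only, the randomised delay mentioned above can be sketched as follows. The minutes do not say how the interval should be chosen; the uniform distribution, the `max_delay_s` parameter and both function names are assumptions made here, not content of draft-parnes-rtp-sdes-00.txt.

```python
import random

# Hypothetical sketch: each receiver waits a uniformly distributed random
# time after receiving the RTCP packet before fetching the URL, so that the
# requests from a large group are spread out rather than simultaneous.


def retrieval_delay(max_delay_s: float, rng: random.Random) -> float:
    """Random wait in seconds, drawn uniformly from [0, max_delay_s]."""
    return rng.uniform(0.0, max_delay_s)


def peak_requests_per_second(receivers: int, max_delay_s: float,
                             seed: int = 0) -> int:
    """Largest number of retrievals falling into any one-second bucket."""
    rng = random.Random(seed)
    buckets = {}
    for _ in range(receivers):
        second = int(retrieval_delay(max_delay_s, rng))
        buckets[second] = buckets.get(second, 0) + 1
    return max(buckets.values())


if __name__ == "__main__":
    # With no delay every receiver fetches in the same second; spreading the
    # fetches over a window lowers the peak seen by the server.
    print(peak_requests_per_second(10000, 0.0))
    print(peak_requests_per_second(10000, 60.0))
```

Whether such a delay is sufficient depends on the group size relative to the window, which is the question raised above.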
Mark Handley asserted that we shouldn't try to turn RTCP into a general data transfer mechanism, but did favor adding Organization since current practice is to include that information in the Name item. The consensus of the group was that it was reasonable to make extensions to RTCP, but the additional information must be optional, and should not be required for operation of the application. Extension RFCs should include an applicability statement for each item. Further comments on this draft will be requested on the mailing list to establish consensus for proceeding.

The SSRC sampling algorithms (draft-ietf-avt-rtpsample-01.txt) were presented by Jonathan Rosenberg. These have been moved out of the main RTP specification because of the IPR issues (Lucent patent on the binning algorithm). The changes to this document are:

  - Uniformity of SSRC values usage: recommend hashing SSRC value, because some broken implementations do not choose uniformly distributed SSRC values
  - New section on performance of sampling in terms of coefficient of variation added
  - An explicit statement of the IPR issues and licensing terms needs to be added

Comments are requested.

The document giving guidelines for writers of RTP payload formats (draft-ietf-avt-rtp-format-guidelines-01.txt) was noted as being essentially complete. One more revision will be produced to include some comments recently received, so others wishing to make comments should do so as soon as possible. That revision will be last called for BCP status.

The payload format for PureVoice(TM) audio (draft-mckay-qcelp-01.txt) was noted as still being in working group last call, pending resolution of a question regarding the proposed scheme for encryption of the payload data in a non-standard manner. A compromise solution has been worked out in offline discussions with the authors, so a revision of the draft is expected soon, after which last call can be completed if there are no further objections.
The Generic FEC payload (draft-ietf-avt-fec-04.txt) was presented by Jonathan Rosenberg. Changes include the removal of Reed-Solomon coding (to a separate draft) and mask extension. The examples and code have been tested and bug-fixed, and the issues with encryption during key changes have been resolved. This document is essentially ready for last call as proposed standard; one more revision will be produced to get the MUST, SHOULD, etc sorted. Comments are sought.

The Reed-Solomon draft (draft-ietf-avt-reedsolomon-00.txt) was also presented by Jonathan Rosenberg. Help is required with this: anyone with expertise in the Reed-Solomon algorithm who is interested in seeing this work progress should contact Jonathan, else the draft will not be updated.

A proposal to use the RFC2198 redundancy format as a transport for interleaved audio (draft-ietf-avt-interleaving-00.txt) was presented by Colin Perkins. This may eventually be merged into that document, although this is at present undecided. Comments are sought.

The meeting concluded with a brief presentation on the proposed charter revision. It was noted that this needs clarification regarding MPEG-4 payload formats, but is otherwise satisfactory. The revised charter will be sent for IESG approval in the near future.