rfc8845.original.v2v3.xml   rfc8845.form.xml 
<?xml version='1.0' encoding='utf-8'?> <?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent"> <!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" submissionType="IETF" category="std" consensus="yes" number="XXXX" obsoletes="" updates="" xml:lang="en" sortRefs="true" symRefs="true" tocInclude="true" version="3"> <rfc xmlns:xi="http://www.w3.org/2001/XInclude" submissionType="IETF" category="std" consensus="yes" number="0000" obsoletes="" updates="" xml:lang="en" sortRefs="true" symRefs="true" tocInclude="true" version="3" ipr="trust200902" docName="draft-ietf-clue-framework-25">
<!-- xml2rfc v2v3 conversion 2.45.2 --> <!-- xml2rfc v2v3 conversion 2.45.2 -->
<front> <front>
<title abbrev="CLUE Framework">Framework for Telepresence Multi-Streams</title> <title abbrev="CLUE Framework">Framework for Telepresence Multi-Streams</title>
<seriesInfo name="RFC" value="XXXX"/> <seriesInfo name="RFC" value="0000"/>
<!--[rfced] As the short title in the running header for this document
was "CLUE Telepresence Framework", might the document title be better
as:
Controlling Multiple Streams for Telepresence (CLUE) Framework <!--[rfced] As the short title in the running header for this document was
"CLUE Telepresence Framework", might the document title be better as follows?
Original:
Framework for Telepresence Multi-Streams
Perhaps:
Controlling Multiple Streams for Telepresence (CLUE) Framework
--> -->
<author fullname="Mark Duckworth" initials="M." role="editor" surname="Duckworth"> <author fullname="Mark Duckworth" initials="M." role="editor" surname="Duckworth">
<organization>Polycom</organization> <organization/>
<address> <address>
<postal> <postal>
<street>Andover, MA 01810</street> <city>Andover</city><region>MA</region><code>01810</code>
<street>United States of America</street> <country>United States of America</country>
</postal> </postal>
<email>mark.duckworth@polycom.com</email> <email>mrducky73@outlook.com</email>
</address> </address>
</author> </author>
<!-- [rfced] We have received a bounce message for
mark.duckworth@polycom.com. Please let us know how Mark's contact information
should be updated.
Original:
Mark Duckworth (editor)
Polycom
Andover, MA 01810
USA
Email: mark.duckworth@polycom.com
<!-- [rfced] Mark, we have updated your email address as requested.
Please let us know if there are any other updates to your contact
information.
--> -->
<author fullname="Andrew Pepperell" initials="A." surname="Pepperell"> <author fullname="Andrew Pepperell" initials="A." surname="Pepperell">
<organization>Acano</organization> <organization>Acano</organization>
<address> <address>
<postal> <postal>
<street>Uxbridge, England</street> <city>Uxbridge</city>
<street>United Kingdom</street> <country>United Kingdom</country>
</postal> </postal>
<email>apeppere@gmail.com</email> <email>apeppere@gmail.com</email>
</address> </address>
</author> </author>
<author fullname="Stephan Wenger" initials="S." surname="Wenger"> <author fullname="Stephan Wenger" initials="S." surname="Wenger">
<organization abbrev="Vidyo">Vidyo, Inc.</organization> <organization abbrev="Vidyo">Vidyo, Inc.</organization>
<address> <address>
<postal> <postal>
<street>433 Hackensack Ave.</street> <street>433 Hackensack Ave.</street>
<street>Hackensack, N.J. 07601</street> <city>Hackensack</city><region>NJ</region><code>07601</code>
<street>United States of America</street> <country>United States of America</country>
</postal> </postal>
<email>stewe@stewe.org</email> <email>stewe@stewe.org</email>
</address> </address>
</author> </author>
<date month="November" year="2017"/> <date month="June" year="2020"/>
<workgroup>CLUE WG</workgroup> <area>ART</area>
<workgroup>CLUE</workgroup>
<!-- [rfced] Please insert any keywords (beyond those that appear in <!-- [rfced] Please insert any keywords (beyond those that appear in
the title) for use on https://www.rfc-editor.org/search. the title) for use on https://www.rfc-editor.org/search.
--> -->
<keyword>example</keyword> <keyword>example</keyword>
<abstract> <abstract>
<t> <t>
This document defines a framework for a protocol to enable devices This document defines a framework for a protocol to enable devices
in a telepresence conference to interoperate. The protocol enables in a telepresence conference to interoperate. The protocol enables
communication of information about multiple media streams so a communication of information about multiple media streams so a
sending system and receiving system can make reasonable decisions sending system and receiving system can make reasonable decisions
about transmitting, selecting, and rendering the media streams. about transmitting, selecting, and rendering the media streams.
This protocol is used in addition to SIP signaling and Session Description Protocol (SDP) This protocol is used in addition to SIP signaling and Session Description Protocol (SDP)
negotiation for setting up a telepresence session.</t> negotiation for setting up a telepresence session.</t>
</abstract> </abstract>
</front> </front>
<middle> <middle>
<section anchor="section-1" numbered="true" toc="default"> <section anchor="s-1" numbered="true" toc="default">
<name>Introduction</name> <name>Introduction</name>
<t> <t>
Current telepresence systems, though based on open standards such Current telepresence systems, though based on open standards such
as RTP <xref target="RFC3550" format="default"/> and SIP <xref target="RFC3261" format="default"/>, cannot easily interoperate with as RTP <xref target="RFC3550" format="default"/> and SIP <xref target="RFC3261" format="default"/>, cannot easily interoperate with
each other. A major factor limiting the interoperability of each other. A major factor limiting the interoperability of
telepresence systems is the lack of a standardized way to describe telepresence systems is the lack of a standardized way to describe
and negotiate the use of multiple audio and video streams and negotiate the use of multiple audio and video streams
comprising the media flows. This document provides a framework for comprising the media flows. This document provides a framework for
protocols to enable interoperability by handling multiple streams protocols to enable interoperability by handling multiple streams
in a standardized way. The framework is intended to support the in a standardized way. The framework is intended to support the
skipping to change at line 103 skipping to change at line 104
The basic session setup for the use cases is based on SIP <xref target="RFC3261" format="default"/> The basic session setup for the use cases is based on SIP <xref target="RFC3261" format="default"/>
and SDP offer/answer <xref target="RFC3264" format="default"/>. In addition to basic SIP &amp; SDP and SDP offer/answer <xref target="RFC3264" format="default"/>. In addition to basic SIP &amp; SDP
offer/answer, signaling that is ControLling mUltiple streams for offer/answer, signaling that is ControLling mUltiple streams for
tElepresence (CLUE) specific is required to exchange the tElepresence (CLUE) specific is required to exchange the
information describing the multiple media streams. The motivation information describing the multiple media streams. The motivation
for this framework, an overview of the signaling, and the information for this framework, an overview of the signaling, and the information
required to be exchanged are described in subsequent sections of required to be exchanged are described in subsequent sections of
this document. Companion documents describe the signaling details this document. Companion documents describe the signaling details
<xref target="RFCYYY3" format="default"/>, the data model <xref target="RFCYYY1" format="default"/>, and the protocol <xref target="RFCYYY2" format="default"/>.</t> <xref target="RFCYYY3" format="default"/>, the data model <xref target="RFCYYY1" format="default"/>, and the protocol <xref target="RFCYYY2" format="default"/>.</t>
</section> </section>
<section anchor="section-2" numbered="true" toc="default"> <section anchor="s-2" numbered="true" toc="default">
<name>Terminology</name> <name>Terminology</name>
<section anchor="section-2.1" numbered="true" toc="default"> <section anchor="s-2.1" numbered="true" toc="default">
<name>Requirements Language</name> <name>Requirements Language</name>
<t> <t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as "MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP&nbsp;14 <xref target="RFC2119" format="default"/> <xref target="RFC8174" format="default"/> described in BCP&nbsp;14 <xref target="RFC2119" format="default"/> <xref target="RFC8174" format="default"/>
when, and only when, they appear in all capitals, as shown here. when, and only when, they appear in all capitals, as shown here.
</t> </t>
</section> </section>
<section anchor="section-2.2" numbered="true" toc="default"> <section anchor="s-2.2" numbered="true" toc="default">
<name>Definitions</name> <name>Definitions</name>
<t> <t>
This document occasionally refers to the term "CLUE". CLUE is an acronym for "ControLling mUltiple streams for This document occasionally refers to the term "CLUE". CLUE is an acronym for "ControLling mUltiple streams for
tElepresence", which is the name of the IETF working group in which tElepresence", which is the name of the IETF working group in which
this document and certain companion documents have been developed. this document and certain companion documents have been developed.
Often, CLUE-* refers to something that has been designed by Often, CLUE-* refers to something that has been designed by
the CLUE working group; for example, this document may be called the CLUE working group; for example, this document may be called
the CLUE-framework. the CLUE-framework.
</t> </t>
<t> <t>
The terms defined below are used throughout this document and The terms defined below are used throughout this document and
in companion documents. Capitalization is used in order to easily identify a defined term.</t> in companion documents. Capitalization is used in order to easily identify a defined term.</t>
<!--[rfced] We had the following questions about the "Definitions" section: <!--[rfced] We had the following questions about the "Definitions"
section:
a) We see the following text: a) We see the following text:
"Capitalization is used in order to easily identify a defined term." "Capitalization is used in order to easily identify a defined term."
However, we see a number of cases throughout where a term from the However, we see a number of cases throughout where a term from the
"Definitions" section (e.g., media stream) is used in lowercase form. "Definitions" section (e.g., media stream) is used in lowercase form.
As this is true for nearly every term in the list, we ask that you As this is true for nearly every term in the list, we ask that you
please review the use of capitalization of each of these terms please review the use of capitalization of each of these terms
throughout the document and let us know if/how we may make them throughout the document and let us know if/how we may make them
skipping to change at line 306 skipping to change at line 308
<dt>Video Capture (VC):</dt> <dt>Video Capture (VC):</dt>
<dd>Media Capture for video. Denoted as VCn in the <dd>Media Capture for video. Denoted as VCn in the
example cases in this document.</dd> example cases in this document.</dd>
<dt>Video Composite:</dt> <dt>Video Composite:</dt>
<dd>A single image that is formed, normally by an RTP <dd>A single image that is formed, normally by an RTP
mixer inside an MCU, by combining visual elements from separate mixer inside an MCU, by combining visual elements from separate
sources.</dd> sources.</dd>
</dl> </dl>
</section> </section>
</section> </section>
<section anchor="section-4" numbered="true" toc="default"> <section anchor="s-4" numbered="true" toc="default">
<name>Overview and Motivation</name> <name>Overview and Motivation</name>
<t> <t>
This section provides an overview of the functional elements This section provides an overview of the functional elements
defined in this document to represent a telepresence or defined in this document to represent a telepresence or
multistream system. The motivations for the framework described multistream system. The motivations for the framework described
in this document are also provided.</t> in this document are also provided.</t>
<t> <t>
Two key concepts introduced in this document are the terms "Media Pro vider" and "Media Consumer". A Media Provider represents the Two key concepts introduced in this document are the terms "Media Pro vider" and "Media Consumer". A Media Provider represents the
entity that sends the media and a Media Consumer represents the entity that sends the media and a Media Consumer represents the
entity that receives the media. A Media Provider provides Media in entity that receives the media. A Media Provider provides Media in
skipping to change at line 337 skipping to change at line 339
telepresence devices, such as Endpoints and MCUs, would perform as telepresence devices, such as Endpoints and MCUs, would perform as
both Media Providers and Media Consumers, the former being both Media Providers and Media Consumers, the former being
concerned with those devices' transmitted media and the latter concerned with those devices' transmitted media and the latter
with those devices' received media. In a few circumstances, a with those devices' received media. In a few circumstances, a
CLUE-capable device includes only Consumer or Provider CLUE-capable device includes only Consumer or Provider
functionality, such as recorder-type Consumers or webcam-type functionality, such as recorder-type Consumers or webcam-type
Providers.</t> Providers.</t>
<t> <t>
The motivations for the framework outlined in this document The motivations for the framework outlined in this document
include the following:</t> include the following:</t>
<ol spacing="normal" type="1"> <ol spacing="normal" type="(%d)">
<li>Endpoints in telepresence systems typically have multiple Media <li>Endpoints in telepresence systems typically have multiple Media
Capture and Media Render devices, e.g., multiple cameras and Capture and Media Render devices, e.g., multiple cameras and
screens. While previous system designs were able to set up calls screens. While previous system designs were able to set up calls
that would capture media using all cameras and display media on all that would capture media using all cameras and display media on all
screens, for example, there was no mechanism that could associate screens, for example, there was no mechanism that could associate
these Media Captures with each other in space and time, in a cross-vendor interoperable way.</li> these Media Captures with each other in space and time, in a cross-vendor interoperable way.</li>
<!--[rfced] Please note the discrepancy in how devices are treated in <!--[rfced] Please note the discrepancy in how devices are treated in
these two snippets. Should these types of uses be made more these two snippets. Should these types of uses be made more
uniform? If so, please let us know how to update. uniform? If so, please let us know how to update.
skipping to change at line 371 skipping to change at line 373
--> -->
<li>The mere fact that there are multiple capturing and rendering <li>The mere fact that there are multiple capturing and rendering
devices, each of which may be configurable in aspects such as zoom, devices, each of which may be configurable in aspects such as zoom,
leads to the difficulty that a variable number of such devices can leads to the difficulty that a variable number of such devices can
be used to capture different aspects of a region. The Capture be used to capture different aspects of a region. The Capture
Scene concept allows for the description of multiple setups for Scene concept allows for the description of multiple setups for
those multiple capture devices that could represent sensible those multiple capture devices that could represent sensible
operation points of the physical capture devices in a room, chosen operation points of the physical capture devices in a room, chosen
by the operator. A Consumer can pick and choose from those by the operator. A Consumer can pick and choose from those
configurations based on its rendering abilities and then inform the configurations based on its rendering abilities and then inform the
Provider about its choices. Details are provided in <xref target="section-7" format="default"/>.</li> Provider about its choices. Details are provided in <xref target="s-7" format="default"/>.</li>
<li>In some cases, physical limitations or other reasons disallow <li>In some cases, physical limitations or other reasons disallow
the concurrent use of a device in more than one setup. For the concurrent use of a device in more than one setup. For
example, the center camera in a typical three-camera conference example, the center camera in a typical three-camera conference
room can set its zoom objective to capture either the middle room can set its zoom objective to capture either the middle
few seats only or all seats of a room, but not both concurrently. The few seats only or all seats of a room, but not both concurrently. The
Simultaneous Transmission Set concept allows a Provider to signal Simultaneous Transmission Set concept allows a Provider to signal
such limitations. Simultaneous Transmission Sets are part of the such limitations. Simultaneous Transmission Sets are part of the
Capture Scene description and are discussed in <xref target="section-8" format="default"/>.</li> Capture Scene description and are discussed in <xref target="s-8" format="default"/>.</li>
<li>Often, the devices in a room do not have the computational <li>Often, the devices in a room do not have the computational
complexity or connectivity to deal with multiple encoding options complexity or connectivity to deal with multiple encoding options
simultaneously, even if each of these options is sensible in simultaneously, even if each of these options is sensible in
certain scenarios, and even if the simultaneous transmission is certain scenarios, and even if the simultaneous transmission is
also sensible (i.e., in case of multicast media distribution to also sensible (i.e., in case of multicast media distribution to
multiple endpoints). Such constraints can be expressed by the multiple endpoints). Such constraints can be expressed by the
Provider using the Encoding Group concept, which is described in <xref target="section-9" format="default"/>.</li> Provider using the Encoding Group concept, which is described in <xref target="s-9" format="default"/>.</li>
<li>Due to the potentially large number of RTP streams required for <li>Due to the potentially large number of RTP streams required for
a Multimedia Conference involving potentially many Endpoints, each a Multimedia Conference involving potentially many Endpoints, each
of which can have many Media Captures and media renderers, it has of which can have many Media Captures and media renderers, it has
become common to multiplex multiple RTP streams onto the same become common to multiplex multiple RTP streams onto the same
transport address, so as to avoid using the port number as a transport address, so as to avoid using the port number as a
multiplexing point and the associated shortcomings such as multiplexing point and the associated shortcomings such as
NAT/firewall traversal. The large number of possible permutations NAT/firewall traversal. The large number of possible permutations
of sensible options a Media Provider can make available to a Media of sensible options a Media Provider can make available to a Media
Consumer makes a mechanism desirable that allows it to narrow down Consumer makes a mechanism desirable that allows it to narrow down
the number of possible options that a SIP offer/answer exchange has the number of possible options that a SIP offer/answer exchange has
skipping to change at line 438 skipping to change at line 440
--> -->
The The
Media Provider and Media Consumer may use information in CLUE Media Provider and Media Consumer may use information in CLUE
messages to reduce the complexity of SIP offer/answer messages. messages to reduce the complexity of SIP offer/answer messages.
Also, there are aspects of the control of both Endpoints and MCUs Also, there are aspects of the control of both Endpoints and MCUs
that dynamically change during the progress of a call, such as that dynamically change during the progress of a call, such as
audio-level-based screen switching, layout changes, and so on, audio-level-based screen switching, layout changes, and so on,
which need to be conveyed. Note that these control aspects are which need to be conveyed. Note that these control aspects are
complementary to those specified in traditional SIP-based complementary to those specified in traditional SIP-based
conference management, such as Binary Floor Control Protocol (BFCP). An exemplary call flow can be conference management, such as Binary Floor Control Protocol (BFCP). An exemplary call flow can be
found in <xref target="section-5" format="default"/>.</li> found in <xref target="s-5" format="default"/>.</li>
</ol> </ol>
<t> <t>
Finally, all this information needs to be conveyed, and the notion Finally, all this information needs to be conveyed, and the notion
of support for it needs to be established. This is done by the of support for it needs to be established. This is done by the
negotiation of a "CLUE channel", a data channel negotiated early negotiation of a "CLUE channel", a data channel negotiated early
during the initiation of a call. An Endpoint or MCU that rejects during the initiation of a call. An Endpoint or MCU that rejects
the establishment of this data channel, by definition, does not the establishment of this data channel, by definition, does not
support CLUE-based mechanisms, whereas an Endpoint or MCU that support CLUE-based mechanisms, whereas an Endpoint or MCU that
accepts it is indicating support for CLUE as specified in this accepts it is indicating support for CLUE as specified in this
document and its companion documents.</t> document and its companion documents.</t>
</section> </section>
<section anchor="section-5" numbered="true" toc="default"> <section anchor="s-5" numbered="true" toc="default">
<name>Description of the Framework/Model</name> <name>Description of the Framework/Model</name>
<t> <t>
The CLUE framework specifies how multiple media streams are to be The CLUE framework specifies how multiple media streams are to be
handled in a telepresence conference.</t> handled in a telepresence conference.</t>
<t> <t>
A Media Provider (transmitting Endpoint or MCU) describes specific A Media Provider (transmitting Endpoint or MCU) describes specific
aspects of the content of the media and the media stream encodings aspects of the content of the media and the media stream encodings
it can send in an Advertisement; and the Media Consumer responds to it can send in an Advertisement; and the Media Consumer responds to
the Media Provider by specifying which content and media streams it the Media Provider by specifying which content and media streams it
wants to receive in a Configure message. The Provider then wants to receive in a Configure message. The Provider then
skipping to change at line 669 skipping to change at line 671
not use CLUE, then the CLUE-capable device falls back to behavior not use CLUE, then the CLUE-capable device falls back to behavior
that does not require CLUE.</t> that does not require CLUE.</t>
<t> <t>
As for the media, Provider and Consumer have an end-to-end As for the media, Provider and Consumer have an end-to-end
communication relationship with respect to (RTP-transported) media; communication relationship with respect to (RTP-transported) media;
and the mechanisms described herein and in companion documents do and the mechanisms described herein and in companion documents do
not change the aspects of setting up those RTP flows and sessions. not change the aspects of setting up those RTP flows and sessions.
In other words, the RTP media sessions conform to the negotiated In other words, the RTP media sessions conform to the negotiated
SDP whether or not CLUE is used.</t> SDP whether or not CLUE is used.</t>
</section> </section>
<section anchor="section-6" numbered="true" toc="default"> <section anchor="s-6" numbered="true" toc="default">
<name>Spatial Relationships</name> <name>Spatial Relationships</name>
<t> <t>
In order for a Consumer to perform a proper rendering, it is often In order for a Consumer to perform a proper rendering, it is often
necessary (or at least helpful) for the Consumer to have received necessary (or at least helpful) for the Consumer to have received
spatial information about the streams it is receiving. CLUE spatial information about the streams it is receiving. CLUE
defines a coordinate system that allows Media Providers to describe defines a coordinate system that allows Media Providers to describe
the spatial relationships of their Media Captures to enable proper the spatial relationships of their Media Captures to enable proper
scaling and spatially sensible rendering of their streams. The scaling and spatially sensible rendering of their streams. The
coordinate system is based on a few principles:</t> coordinate system is based on a few principles:</t>
<ul spacing="normal"> <ul spacing="normal">
skipping to change at line 728 skipping to change at line 730
Y increases from the front of the room to the back of the room; Y increases from the front of the room to the back of the room;
Z increases from low to high (i.e., floor to ceiling).</t> Z increases from low to high (i.e., floor to ceiling).</t>
<t> <t>
Cameras in a scene typically point in the direction of increasing Cameras in a scene typically point in the direction of increasing
Y, from front to back. But there could be multiple cameras Y, from front to back. But there could be multiple cameras
pointing in different directions. If the physical space does not pointing in different directions. If the physical space does not
have a well-defined front and back, the provider chooses any have a well-defined front and back, the provider chooses any
direction for X, Y, and Z consistent with right-handed direction for X, Y, and Z consistent with right-handed
coordinates.</t> coordinates.</t>
</section> </section>
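A minimal sketch, in Python, of how a Provider without a well-defined front and back might check that the axis directions it has chosen are consistent with right-handed coordinates. The helper names (`cross`, `dot`, `is_right_handed`) are hypothetical and are not part of the CLUE framework or data model; the check relies only on the standard identity that, in a right-handed system, the cross product of the X and Y directions points along Z.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Return the cross product a x b."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a: Vec3, b: Vec3) -> float:
    """Return the dot product of a and b."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def is_right_handed(x_axis: Vec3, y_axis: Vec3, z_axis: Vec3) -> bool:
    """True if (x_axis, y_axis, z_axis) form a right-handed triple.

    Hypothetical helper: the axes are given as direction vectors in some
    reference frame of the room chosen by the implementer.
    """
    return dot(cross(x_axis, y_axis), z_axis) > 0.0
```

For example, with Y running front to back and Z running floor to ceiling, flipping only the Z direction would make the triple left-handed, and the check would fail.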
<section anchor="section-7" numbered="true" toc="default"> <section anchor="s-7" numbered="true" toc="default">
<name>Media Captures and Capture Scenes</name> <name>Media Captures and Capture Scenes</name>
<t> <t>
This section describes how Providers can describe the content of This section describes how Providers can describe the content of
media to Consumers.</t> media to Consumers.</t>
<section anchor="section-7.1" numbered="true" toc="default"> <section anchor="s-7.1" numbered="true" toc="default">
<name>Media Captures</name> <name>Media Captures</name>
<t> <t>
Media Captures are the fundamental representations of streams that Media Captures are the fundamental representations of streams that
a device can transmit. What a Media Capture actually represents is a device can transmit. What a Media Capture actually represents is
flexible:</t> flexible:</t>
<ul spacing="normal"> <ul spacing="normal">
<li>It can represent the immediate output of a physical source (e.g., <li>It can represent the immediate output of a physical source (e.g.,
camera, microphone) or 'synthetic' source (e.g., laptop computer, DVD player).</li> camera, microphone) or 'synthetic' source (e.g., laptop computer, DVD player).</li>
<li>It can represent the output of an audio mixer or video composer.</li> <li>It can represent the output of an audio mixer or video composer.</li>
<li>It can represent a concept such as 'the loudest speaker'.</li> <li>It can represent a concept such as 'the loudest speaker'.</li>
skipping to change at line 782 skipping to change at line 784
Advertisement unique identity. The identity may be referenced Advertisement unique identity. The identity may be referenced
outside the Capture Scene that defines it through an MCC.</li> outside the Capture Scene that defines it through an MCC.</li>
<li>A Media Capture may be associated with one or more CSVs.</li> <li>A Media Capture may be associated with one or more CSVs.</li>
<li>A Media Capture has exactly one set of spatial information.</li> <li>A Media Capture has exactly one set of spatial information.</li>
<li>A Media Capture can be the source of at most one Capture <li>A Media Capture can be the source of at most one Capture
Encoding.</li> Encoding.</li>
</ul> </ul>
<t> <t>
Each Media Capture can be associated with attributes to describe Each Media Capture can be associated with attributes to describe
what it represents.</t> what it represents.</t>
<section anchor="section-7.1.1" numbered="true" toc="default"> <section anchor="s-7.1.1" numbered="true" toc="default">
<name>Media Capture Attributes</name> <name>Media Capture Attributes</name>
<t> <t>
Media Capture Attributes describe information about the Captures. Media Capture Attributes describe information about the Captures.
A Provider can use the Media Capture Attributes to describe the A Provider can use the Media Capture Attributes to describe the
Captures for the benefit of the Consumer of the Advertisement Captures for the benefit of the Consumer of the Advertisement
message. All these attributes are optional. Media Capture message. All these attributes are optional. Media Capture
Attributes include: Attributes include:
</t> </t>
<ul spacing="normal"> <ul spacing="normal">
<li>Spatial information, such as point of capture, point on line <li>Spatial information, such as point of capture, point on line
of capture, and area of capture (all of which, in combination, of capture, and area of capture (all of which, in combination,
define the capture field of, for example, a camera).</li> define the capture field of, for example, a camera).</li>
<li>Other descriptive information to help the Consumer choose <li>Other descriptive information to help the Consumer choose
between captures (e.g., description, presentation, view, between captures (e.g., description, presentation, view,
priority, language, person information, and type).</li> priority, language, person information, and type).</li>
</ul> </ul>
<t> <t>
The subsections below define the Capture attributes.</t> The subsections below define the Capture attributes.</t>
<section anchor="section-7.1.1.1" numbered="true" toc="default"> <section anchor="s-7.1.1.1" numbered="true" toc="default">
<name>Point of Capture</name> <name>Point of Capture</name>
<t> <t>
The Point of Capture attribute is a field with a single Cartesian The Point of Capture attribute is a field with a single Cartesian
(X, Y, Z) point value that describes the spatial location of the (X, Y, Z) point value that describes the spatial location of the
capturing device (such as a camera). For an Audio Capture with capturing device (such as a camera). For an Audio Capture with
multiple microphones, the Point of Capture defines the nominal midpoint of the microphones.</t> multiple microphones, the Point of Capture defines the nominal midpoint of the microphones.</t>
</section> </section>
<section anchor="section-7.1.1.2" numbered="true" toc="default"> <section anchor="s-7.1.1.2" numbered="true" toc="default">
<name>Point on Line of Capture</name> <name>Point on Line of Capture</name>
<t> <t>
The Point on Line of Capture attribute is a field with a single The Point on Line of Capture attribute is a field with a single
Cartesian (X, Y, Z) point value that describes a position in space Cartesian (X, Y, Z) point value that describes a position in space
of a second point on the axis of the capturing device, in the of a second point on the axis of the capturing device, in the
direction it is pointing; the first point is the Point of direction it is pointing; the first point is the Point of
Capture (see above).</t> Capture (see above).</t>
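As an illustration of how the two points relate, the following minimal Python sketch derives a pointing direction from a Point of Capture and a Point on Line of Capture. The function name and the tuple representation of points are assumptions made for this example only; they are not part of the CLUE data model.

```python
import math

def capture_direction(point_of_capture, point_on_line):
    """Return the unit vector pointing from the Point of Capture
    toward the Point on Line of Capture (hypothetical helper)."""
    dx, dy, dz = (b - a for a, b in zip(point_of_capture, point_on_line))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0:
        # Assumption: two identical points cannot define a direction.
        raise ValueError("the two points must differ to define a direction")
    return (dx / length, dy / length, dz / length)
```

For instance, a camera at (0, 0, 0) with a Point on Line of Capture at (0, 2, 0) points along the positive Y axis.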
<t> <t>
Together, the Point of Capture and Point on Line of Capture define Together, the Point of Capture and Point on Line of Capture define
the direction and axis of the capturing device, for example, the the direction and axis of the capturing device, for example, the
skipping to change at line 837 skipping to change at line 839
picked up by the microphone providing this specific audio capture. picked up by the microphone providing this specific audio capture.
If the Consumer wants to associate an Audio Capture with a Video If the Consumer wants to associate an Audio Capture with a Video
Capture, it can compare this volume with the area of capture for Capture, it can compare this volume with the area of capture for
video media to provide a check on whether the audio capture is video media to provide a check on whether the audio capture is
indeed spatially associated with the video capture. For example, a indeed spatially associated with the video capture. For example, a
video area of capture that fails to intersect at all with the audio video area of capture that fails to intersect at all with the audio
volume of capture, or is at such a long radial distance from the volume of capture, or is at such a long radial distance from the
microphone point of capture that the audio level would be very low, microphone point of capture that the audio level would be very low,
would be inappropriate.</t> would be inappropriate.</t>
</section> </section>
<section anchor="section-7.1.1.3" numbered="true" toc="default"> <section anchor="s-7.1.1.3" numbered="true" toc="default">
<name>Area of Capture</name> <name>Area of Capture</name>
<t> <t>
The Area of Capture is a field with a set of four (X, Y, Z) points The Area of Capture is a field with a set of four (X, Y, Z) points
as a value that describes the spatial location of what is being as a value that describes the spatial location of what is being
"captured". This attribute applies only to video captures, not "captured". This attribute applies only to video captures, not
other types of media. By comparing the Area of Capture for other types of media. By comparing the Area of Capture for
different Video Captures within the same Capture Scene, a Consumer different Video Captures within the same Capture Scene, a Consumer
can determine the spatial relationships between them and render can determine the spatial relationships between them and render
them correctly.</t> them correctly.</t>
<t> <t>
The four points MUST be co-planar, forming a quadrilateral, which The four points MUST be co-planar, forming a quadrilateral, which
defines the Plane of Interest for the particular Media Capture.</t> defines the Plane of Interest for the particular Media Capture.</t>
<t> <t>
If the Area of Capture is not specified, it means the Video Capture If the Area of Capture is not specified, it means the Video Capture
might be spatially related to other Captures in the same Scene, but might be spatially related to other Captures in the same Scene, but
there is no detailed information on the relationship. For a switched there is no detailed information on the relationship. For a switched
Capture that switches between different sections within a larger Capture that switches between different sections within a larger
area, the area of capture MUST use coordinates for the larger area, the area of capture MUST use coordinates for the larger
potential area.</t> potential area.</t>
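The co-planarity requirement on the four Area of Capture points can be checked with a scalar triple product. The sketch below is one possible check, not a normative procedure; the function name and the absolute tolerance `tol` are assumptions chosen for illustration.

```python
def is_coplanar(p0, p1, p2, p3, tol=1e-9):
    """Return True if four (X, Y, Z) points lie in a single plane.

    Uses the scalar triple product of the three edge vectors that
    start at p0; it is zero exactly when the points are co-planar.
    The tolerance is an assumed absolute bound for float inputs.
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    u, v, w = sub(p1, p0), sub(p2, p0), sub(p3, p0)
    return abs(dot(cross(u, v), w)) <= tol
```

A Provider could run such a check before advertising an Area of Capture, and a Consumer could use it to detect a malformed value.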
</section> </section>
<section anchor="section-7.1.1.4" numbered="true" toc="default"> <section anchor="s-7.1.1.4" numbered="true" toc="default">
<name>Mobility of Capture</name> <name>Mobility of Capture</name>
<t> <t>
The Mobility of Capture attribute indicates whether or not the The Mobility of Capture attribute indicates whether or not the
point of capture, point on line of capture, and area of capture point of capture, point on line of capture, and area of capture
values stay the same over time, or are expected to change values stay the same over time, or are expected to change
(potentially frequently). Possible values are static, dynamic, and (potentially frequently). Possible values are static, dynamic, and
highly dynamic.</t> highly dynamic.</t>
<t> <t>
An example for "dynamic" is a camera mounted on a stand that is An example for "dynamic" is a camera mounted on a stand that is
occasionally hand-carried and placed at different positions in occasionally hand-carried and placed at different positions in
skipping to change at line 884 skipping to change at line 886
The capture point of a static Capture MUST NOT move for the life of The capture point of a static Capture MUST NOT move for the life of
the CLUE session. The capture point of dynamic Captures is the CLUE session. The capture point of dynamic Captures is
characterized by a change in position followed by a reasonable period characterized by a change in position followed by a reasonable period
of stability, on the order of minutes. Highly of stability, on the order of minutes. Highly
dynamic captures are characterized by a capture point that is dynamic captures are characterized by a capture point that is
constantly moving. If the "area of capture", "capture point", and constantly moving. If the "area of capture", "capture point", and
"line of capture" attributes are included with dynamic or highly "line of capture" attributes are included with dynamic or highly
dynamic Captures, they indicate spatial information at the time of dynamic Captures, they indicate spatial information at the time of
the Advertisement.</t> the Advertisement.</t>
</section> </section>
<section anchor="section-7.1.1.5" numbered="true" toc="default"> <section anchor="s-7.1.1.5" numbered="true" toc="default">
<name>Audio Capture Sensitivity Pattern</name> <name>Audio Capture Sensitivity Pattern</name>
<t> <t>
The Audio Capture Sensitivity Pattern attribute applies only to The Audio Capture Sensitivity Pattern attribute applies only to
audio captures. This attribute gives information about the nominal audio captures. This attribute gives information about the nominal
sensitivity pattern of the microphone that is the source of the sensitivity pattern of the microphone that is the source of the
Capture. Possible values include patterns such as omni, shotgun, Capture. Possible values include patterns such as omni, shotgun,
cardioid, and hyper-cardioid.</t> cardioid, and hyper-cardioid.</t>
</section> </section>
<section anchor="section-7.1.1.6" numbered="true" toc="default"> <section anchor="s-7.1.1.6" numbered="true" toc="default">
<name>Description</name> <name>Description</name>
<t> <t>
The Description attribute is a human-readable description (which The Description attribute is a human-readable description (which
could be in multiple languages) of the Capture.</t> could be in multiple languages) of the Capture.</t>
</section> </section>
<section anchor="section-7.1.1.7" numbered="true" toc="default"> <section anchor="s-7.1.1.7" numbered="true" toc="default">
<name>Presentation</name> <name>Presentation</name>
<t> <t>
The Presentation attribute indicates that the capture originates The Presentation attribute indicates that the capture originates
from a presentation device, that is, one that provides supplementary from a presentation device, that is, one that provides supplementary
information to a conference through slides, video, still images, information to a conference through slides, video, still images,
data, etc. Where more information is known about the capture, it MAY data, etc. Where more information is known about the capture, it MAY
be expanded hierarchically to indicate the different types of be expanded hierarchically to indicate the different types of
presentation media, e.g., presentation.slides, presentation.image, presentation media, e.g., presentation.slides, presentation.image,
etc.</t> etc.</t>
<t> <t>
Note: It is expected that a number of keywords will be defined that Note: It is expected that a number of keywords will be defined that
provide more detail on the type of presentation. Refer to <xref target="RFCYYY1" format="default"/> for how to extend the model.</t> provide more detail on the type of presentation. Refer to <xref target="RFCYYY1" format="default"/> for how to extend the model.</t>
</section> </section>
<section anchor="section-7.1.1.8" numbered="true" toc="default"> <section anchor="s-7.1.1.8" numbered="true" toc="default">
<name>View</name> <name>View</name>
<t> <t>
The View attribute is a field with enumerated values, indicating The View attribute is a field with enumerated values, indicating
what type of view the Capture relates to. The Consumer can use what type of view the Capture relates to. The Consumer can use
this information to help choose which Media Captures it wishes to this information to help choose which Media Captures it wishes to
receive. Possible values are as follows:</t> receive. Possible values are as follows:</t>
<dl newline="false" spacing="normal" indent="12"> <dl newline="false" spacing="normal" indent="12">
<dt>Room:</dt> <dt>Room:</dt>
<dd>Captures the entire scene <dd>Captures the entire scene
</dd> </dd>
skipping to change at line 938 skipping to change at line 940
<dd>Captures an individual person</dd> <dd>Captures an individual person</dd>
<dt>Lectern:</dt> <dt>Lectern:</dt>
<dd>Captures the region of the lectern including the <dd>Captures the region of the lectern including the
presenter, for example, in a classroom-style conference room presenter, for example, in a classroom-style conference room
</dd> </dd>
<dt>Audience:</dt> <dt>Audience:</dt>
<dd>Captures a region showing the audience in a classroom-style conference room <dd>Captures a region showing the audience in a classroom-style conference room
</dd> </dd>
</dl> </dl>
</section> </section>
<section anchor="section-7.1.1.9" numbered="true" toc="default"> <section anchor="s-7.1.1.9" numbered="true" toc="default">
<name>Language</name> <name>Language</name>
<t> <t>
The Language attribute indicates one or more languages used in the The Language attribute indicates one or more languages used in the
content of the Media Capture. Captures MAY be offered in different content of the Media Capture. Captures MAY be offered in different
languages in case of multilingual and/or accessible conferences. A languages in case of multilingual and/or accessible conferences. A
Consumer can use this attribute to differentiate between them and Consumer can use this attribute to differentiate between them and
pick the appropriate one.</t> pick the appropriate one.</t>
<t> <t>
Note that the Language attribute is defined and meaningful both for Note that the Language attribute is defined and meaningful both for
audio and video captures. In the case of audio captures, the meaning audio and video captures. In the case of audio captures, the meaning
is obvious. For a video capture, "Language" could, for example, be is obvious. For a video capture, "Language" could, for example, be
sign interpretation or text.</t> sign interpretation or text.</t>
<t> <t>
The Language attribute is coded per <xref target="RFC5646" format="default"/>.</t> The Language attribute is coded per <xref target="RFC5646" format="default"/>.</t>
</section> </section>
<section anchor="section-7.1.1.10" numbered="true" toc="default"> <section anchor="s-7.1.1.10" numbered="true" toc="default">
<name>Person Information</name> <name>Person Information</name>
<t> <t>
The Person Information attribute allows a Provider to provide The Person Information attribute allows a Provider to provide
specific information regarding the people in a Capture (regardless specific information regarding the people in a Capture (regardless
of whether or not the capture has a Presentation attribute). The of whether or not the capture has a Presentation attribute). The
Provider may gather the information automatically or manually from Provider may gather the information automatically or manually from
a variety of sources; however, the xCard <xref target="RFC6351" format="default"/> format is used to a variety of sources; however, the xCard <xref target="RFC6351" format="default"/> format is used to
convey the information. This allows various information, such as convey the information. This allows various information, such as
Identification information (Section 6.2 of <xref target="RFC6350" for Identification information (<xref section="6.2" sectionFormat="of" ta
mat="default"/>), Communication rget="RFC6350" format="default"/>), Communication
Information (Section 6.4 of <xref target="RFC6350" format="default"/> Information (<xref section="6.4" sectionFormat="of" target="RFC6350"
), and Organizational information format="default"/>), and Organizational information
(Section 6.6 of <xref target="RFC6350" format="default"/>), to be com (<xref section="6.6" sectionFormat="of" target="RFC6350" format="defa
municated. A Consumer may then ult"/>), to be communicated. A Consumer may then
automatically (i.e., via a policy) or manually select Captures automatically (i.e., via a policy) or manually select Captures
based on information about who is in a Capture. It also allows a based on information about who is in a Capture. It also allows a
Consumer to render information regarding the people participating Consumer to render information regarding the people participating
in the conference or to use it for further processing.</t> in the conference or to use it for further processing.</t>
<t> <t>
The Provider may supply a minimal set of information or a larger The Provider may supply a minimal set of information or a larger
set of information. However, it MUST be compliant with <xref target="RFC6350" format="default"/> and set of information. However, it MUST be compliant with <xref target="RFC6350" format="default"/> and
supply a "VERSION" and "FN" property. A Provider may supply supply a "VERSION" and "FN" property. A Provider may supply
multiple xCards per Capture of any KIND (Section 6.1.4 of <xref target="RFC6350" format="default"/>).</t> multiple xCards per Capture of any KIND (<xref section="6.1.4" sectionFormat="of" target="RFC6350" format="default"/>).</t>
<t> <t>
In order to keep CLUE messages compact, the Provider SHOULD use a In order to keep CLUE messages compact, the Provider SHOULD use a
URI to point to any LOGO, PHOTO, or SOUND contained in the xCard URI to point to any LOGO, PHOTO, or SOUND contained in the xCard
rather than transmitting the LOGO, PHOTO, or SOUND data in a CLUE rather than transmitting the LOGO, PHOTO, or SOUND data in a CLUE
message.</t> message.</t>
</section> </section>
<section anchor="section-7.1.1.11" numbered="true" toc="default"> <section anchor="s-7.1.1.11" numbered="true" toc="default">
<name>Person Type</name> <name>Person Type</name>
<t> <t>
The Person Type attribute indicates the type of people contained in The Person Type attribute indicates the type of people contained in
the capture with respect to the meeting agenda (regardless of the capture with respect to the meeting agenda (regardless of
whether or not the capture has a Presentation attribute). As a whether or not the capture has a Presentation attribute). As a
capture may include multiple people, the attribute may contain capture may include multiple people, the attribute may contain
multiple values. However, values MUST NOT be repeated within the multiple values. However, values MUST NOT be repeated within the
attribute.</t> attribute.</t>
<t> <t>
An Advertiser associates the person type with an individual capture An Advertiser associates the person type with an individual capture
skipping to change at line 1033 skipping to change at line 1035
or commentary in the meeting.</dd> or commentary in the meeting.</dd>
<dt>Timekeeper:</dt> <dt>Timekeeper:</dt>
<dd>the person responsible for maintaining the <dd>the person responsible for maintaining the
meeting schedule.</dd> meeting schedule.</dd>
</dl> </dl>
<t> <t>
Furthermore, the person type attribute may contain one or more Furthermore, the person type attribute may contain one or more
strings allowing the Provider to indicate custom meeting-specific strings allowing the Provider to indicate custom meeting-specific
types.</t> types.</t>
</section> </section>
<section anchor="section-7.1.1.12" numbered="true" toc="default"> <section anchor="s-7.1.1.12" numbered="true" toc="default">
<name>Priority</name> <name>Priority</name>
<t> <t>
The Priority attribute indicates a relative priority between The Priority attribute indicates a relative priority between
different Media Captures. The Provider sets this priority, and the different Media Captures. The Provider sets this priority, and the
Consumer MAY use the priority to help decide which Captures it Consumer MAY use the priority to help decide which Captures it
wishes to receive.</t> wishes to receive.</t>
<t> <t>
The "priority" attribute is an integer that indicates a relative The "priority" attribute is an integer that indicates a relative
priority between Captures. For example, it is possible to assign a priority between Captures. For example, it is possible to assign a
priority between two presentation Captures that would allow a priority between two presentation Captures that would allow a
remote Endpoint to determine which presentation is more important. remote Endpoint to determine which presentation is more important.
Priority is assigned at the individual Capture level. It represents Priority is assigned at the individual Capture level. It represents
the Provider's view of the relative priority between Captures with the Provider's view of the relative priority between Captures with
a priority. The same priority number MAY be used across multiple a priority. The same priority number MAY be used across multiple
Captures. It indicates that they are equally important. If no priority Captures. It indicates that they are equally important. If no priority
is assigned, no assumptions regarding relative importance of the is assigned, no assumptions regarding relative importance of the
Capture can be made.</t> Capture can be made.</t>
</section> </section>
<section anchor="section-7.1.1.13" numbered="true" toc="default"> <section anchor="s-7.1.1.13" numbered="true" toc="default">
<name>Embedded Text</name> <name>Embedded Text</name>
<t> <t>
The Embedded Text attribute indicates that a Capture provides The Embedded Text attribute indicates that a Capture provides
embedded textual information. For example, the video Capture may embedded textual information. For example, the video Capture may
contain speech-to-text information composed with the video image.</t> contain speech-to-text information composed with the video image.</t>
</section> </section>
<section anchor="section-7.1.1.14" numbered="true" toc="default"> <section anchor="s-7.1.1.14" numbered="true" toc="default">
<name>Related To</name> <name>Related To</name>
<t> <t>
The Related To attribute indicates the Capture contains additional The Related To attribute indicates the Capture contains additional
complementary information related to another Capture. The value complementary information related to another Capture. The value
indicates the identity of the other Capture to which this Capture indicates the identity of the other Capture to which this Capture
is providing additional information.</t> is providing additional information.</t>
<t> <t>
For example, a conference can utilize translators or facilitators For example, a conference can utilize translators or facilitators
that provide an additional audio stream (i.e., a translation or that provide an additional audio stream (i.e., a translation or
description or commentary of the conference). Where multiple description or commentary of the conference). Where multiple
captures are available, it may be advantageous for a Consumer to captures are available, it may be advantageous for a Consumer to
select a complementary Capture instead of or in addition to a select a complementary Capture instead of or in addition to a
Capture it relates to.</t> Capture it relates to.</t>
</section> </section>
</section> </section>
</section> </section>
<section anchor="section-7.2" numbered="true" toc="default"> <section anchor="s-7.2" numbered="true" toc="default">
<name>Multiple Content Capture</name> <name>Multiple Content Capture</name>
<t> <t>
The MCC indicates that one or more Single Media Captures are The MCC indicates that one or more Single Media Captures are
multiplexed (temporally and/or spatially) or mixed in one Media multiplexed (temporally and/or spatially) or mixed in one Media
Capture. Only one Capture type (e.g., audio or video) is Capture. Only one Capture type (e.g., audio or video) is
allowed in each MCC instance. The MCC may contain a reference to allowed in each MCC instance. The MCC may contain a reference to
the Single Media Captures (which may have their own attributes) as the Single Media Captures (which may have their own attributes) as
well as attributes associated with the MCC itself. An MCC may also well as attributes associated with the MCC itself. An MCC may also
contain other MCCs. The MCC MAY reference Captures from within the contain other MCCs. The MCC MAY reference Captures from within the
Capture Scene that defines it or from other Capture Scenes. No Capture Scene that defines it or from other Capture Scenes. No
skipping to change at line 1099 skipping to change at line 1101
the MCC contains content from multiple sources, but no information the MCC contains content from multiple sources, but no information
regarding those sources is given. MCCs either contain the regarding those sources is given. MCCs either contain the
referenced Captures and no others or have no referenced captures referenced Captures and no others or have no referenced captures
and, therefore, may contain any Capture.</t> and, therefore, may contain any Capture.</t>
<t> <t>
One or more MCCs may also be specified in a CSV. This allows an One or more MCCs may also be specified in a CSV. This allows an
Advertiser to indicate that several MCC captures are used to Advertiser to indicate that several MCC captures are used to
represent a capture scene. <xref target="ref-advertisement-sent-to-endpoint-f-two-encodings" format="default"/> provides an example of this represent a capture scene. <xref target="ref-advertisement-sent-to-endpoint-f-two-encodings" format="default"/> provides an example of this
case.</t> case.</t>
<t> <t>
As outlined in <xref target="section-7.1" format="default"/>, each instance of the MCC has its own As outlined in <xref target="s-7.1" format="default"/>, each instance of the MCC has its own
Capture identity, e.g., MCC1. It allows all the individual captures Capture identity, e.g., MCC1. It allows all the individual captures
contained in the MCC to be referenced by a single MCC identity.</t> contained in the MCC to be referenced by a single MCC identity.</t>
<t>The example below shows the use of a Multiple Content Capture:</t> <t>The example below shows the use of a Multiple Content Capture:</t>
<table anchor="ref-multiple-content-capture-concept" align="center"> <table anchor="ref-multiple-content-capture-concept" align="center">
<name>Multiple Content Capture Concept</name> <name>Multiple Content Capture Concept</name>
<thead> <thead>
<tr> <tr>
<th align="left"> Capture Scene #1</th> <th align="left"> Capture Scene #1</th>
<th align="left"> </th> <th align="left"> </th>
</tr> </tr>
skipping to change at line 1137 skipping to change at line 1139
</tr> </tr>
<tr> <tr>
<td align="left">CSV(MCC1)</td> <td align="left">CSV(MCC1)</td>
<td align="left"/> <td align="left"/>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<t> <t>
This indicates that MCC1 is a single capture that contains the This indicates that MCC1 is a single capture that contains the
Captures VC1, VC2, and VC3, according to any MCC1 attributes.</t> Captures VC1, VC2, and VC3, according to any MCC1 attributes.</t>
<section anchor="section-7.2.1" numbered="true" toc="default"> <section anchor="s-7.2.1" numbered="true" toc="default">
<name>MCC Attributes</name> <name>MCC Attributes</name>
<t> <t>
Media Capture Attributes may be associated with the MCC instance Media Capture Attributes may be associated with the MCC instance
and the Single Media Captures that the MCC references. A Provider and the Single Media Captures that the MCC references. A Provider
should avoid providing conflicting attribute values between the MCC should avoid providing conflicting attribute values between the MCC
and Single Media Captures. Where there is a conflict, the attributes and Single Media Captures. Where there is a conflict, the attributes
of the MCC should override any that may be present in the individual of the MCC should override any that may be present in the individual
Captures.</t> Captures.</t>
<t> <t>
A Provider MAY include as much or as little of the original source A Provider MAY include as much or as little of the original source
Capture information as it requires.</t> Capture information as it requires.</t>
<t> <t>
There are MCC-specific attributes that MUST only be used with There are MCC-specific attributes that MUST only be used with
Multiple Content Captures. These are described in the sections Multiple Content Captures. These are described in the sections
below. The attributes described in <xref target="section-7.1.1" format="default"/> MAY also be used below. The attributes described in <xref target="s-7.1.1" format="default"/> MAY also be used
with MCCs.</t> with MCCs.</t>
<t> <t>
The spatial-related attributes of an MCC indicate its area of The spatial-related attributes of an MCC indicate its area of
capture and point of capture within the scene, just like any other capture and point of capture within the scene, just like any other
media capture. The spatial information does not imply anything media capture. The spatial information does not imply anything
about how other captures are composed within an MCC.</t> about how other captures are composed within an MCC.</t>
<t>For example, a virtual scene could be constructed for the MCC <t>For example, a virtual scene could be constructed for the MCC
capture with two Video Captures with a "MaxCaptures" attribute set capture with two Video Captures with a "MaxCaptures" attribute set
to 2 and an "Area of Capture" attribute provided with an overall to 2 and an "Area of Capture" attribute provided with an overall
area. Each of the individual Captures could then also include an area. Each of the individual Captures could then also include an
"Area of Capture" attribute with a subset of the overall area. "Area of Capture" attribute with a subset of the overall area.
The Consumer would then know how each capture is related to others The Consumer would then know how each capture is related to others
within the scene, but not the relative position of the individual within the scene, but not the relative position of the individual
captures within the composed capture. captures within the composed capture.
<!--[rfced] Please note that some "Tables" have been updated to
"Figures" as they contain text or alignment that cannot yet be
handled using a <texttable> in XML. Please let us know any
objections.
</t> </t>
<figure anchor="table_2">
<name>Example of MCC and Single Media Capture Attributes</na <table anchor="table_2">
me> <name>Example of MCC and Single Media Capture Attributes</name>
<artwork name="" type="" align="left" alt=""><![CDATA[ <thead>
+-----------------------+---------------------------------+ <tr><th align="left">Capture Scene #1</th><th/></tr>
| Capture Scene #1 | | </thead>
+-----------------------|---------------------------------+ <tbody>
| VC1 | AreaofCapture=(0,0,0)(9,0,0) | <tr>
| | (0,0,9)(9,0,9) | <td>VC1</td>
| VC2 | AreaofCapture=(10,0,0)(19,0,0) | <td align="left">
| | (10,0,9)(19,0,9) | <artwork align="left">
| MCC1(VC1,VC2) | MaxCaptures=2 | AreaofCapture=(0,0,0)(9,0,0)
| | AreaofCapture=(0,0,0)(19,0,0) | (0,0,9)(9,0,9)
| | (0,0,9)(19,0,9) | </artwork>
| CSV(MCC1) | | </td>
+---------------------------------------------------------+ </tr>
]]></artwork> <tr>
</figure> <td>VC2</td>
<td align="left">
<artwork align="left">
AreaofCapture=(10,0,0)(19,0,0)
(10,0,9)(19,0,9)
</artwork>
</td>
</tr>
<tr>
<td>MCC1(VC1,VC2)</td>
<td align="left"><artwork align="left">
MaxCaptures=2
AreaofCapture=(0,0,0)(19,0,0)
(0,0,9)(19,0,9)
</artwork>
</td>
</tr>
<tr>
<td>CSV(MCC1)</td>
<td/>
</tr>
</tbody>
</table>
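The relationship between the Area of Capture values in the example above can be sketched in Python. This is only an illustration: it assumes axis-aligned rectangular areas, so that a bounding-box comparison is enough, and the helper names are hypothetical. The coordinates are the ones shown for VC1, VC2, and MCC1.

```python
def bounding_box(points):
    """Return the (min, max) corners of the axis-aligned box around points."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def within(inner, outer):
    """Return True if every point of `inner` lies inside the bounding box
    of `outer`. Assumption: areas are axis-aligned rectangles; a general
    quadrilateral would need a point-in-polygon test instead."""
    lo, hi = bounding_box(outer)
    return all(lo[i] <= p[i] <= hi[i] for p in inner for i in range(3))

# Area of Capture values from the example above.
VC1 = [(0, 0, 0), (9, 0, 0), (0, 0, 9), (9, 0, 9)]
VC2 = [(10, 0, 0), (19, 0, 0), (10, 0, 9), (19, 0, 9)]
MCC1 = [(0, 0, 0), (19, 0, 0), (0, 0, 9), (19, 0, 9)]
```

Both `within(VC1, MCC1)` and `within(VC2, MCC1)` hold, which matches the statement that each individual Capture covers a subset of the overall area.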
<t> <t>
The subsections below describe the MCC-only attributes.</t> The subsections below describe the MCC-only attributes.</t>
<section anchor="section-7.2.1.1" numbered="true" toc="default"> <section anchor="s-7.2.1.1" numbered="true" toc="default">
<name>Maximum Number of Captures within an MCC</name> <name>Maximum Number of Captures within an MCC</name>
<t> <t>
The Maximum Number of Captures MCC attribute indicates the maximum The Maximum Number of Captures MCC attribute indicates the maximum
number of individual Captures that may appear in a Capture Encoding number of individual Captures that may appear in a Capture Encoding
at a time. The actual number at any given time can be less than or at a time. The actual number at any given time can be less than or
equal to this maximum. It may be used to derive how the Single equal to this maximum. It may be used to derive how the Single
Media Captures within the MCC are composed/switched with regard Media Captures within the MCC are composed/switched with regard
to space and time.</t> to space and time.</t>
<!--[rfced] The relationship between "MaxCaptures" and "Maximum Number <!--[rfced] The relationship between "MaxCaptures" and "Maximum Number
of Captures MCC attribute" is not made clear. Will the reader of Captures MCC attribute" is not made clear. Will the reader
skipping to change at line 1272 skipping to change at line 1292
<t> <t>
If this attribute is not set, then as a default, it is assumed that all If this attribute is not set, then as a default, it is assumed that all
source media capture content can appear concurrently in the Capture source media capture content can appear concurrently in the Capture
Encoding associated with the MCC.</t> Encoding associated with the MCC.</t>
<t> <t>
For example, the use of MaxCaptures equal to 1 on an MCC with three For example, the use of MaxCaptures equal to 1 on an MCC with three
Video Captures, VC1, VC2, and VC3, would indicate that the Advertiser Video Captures, VC1, VC2, and VC3, would indicate that the Advertiser
would switch between VC1, VC2, and VC3 in the Capture Encoding, as would switch between VC1, VC2, and VC3 in the Capture Encoding, as
there may be at most one Capture at a time.</t> there may be at most one Capture at a time.</t>
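A Consumer might interpret the attribute along the following lines. This Python sketch is an assumption-laden illustration, not a normative algorithm: the function names are hypothetical, and it assumes the MCC lists its source Captures explicitly.

```python
def effective_max_captures(source_captures, max_captures=None):
    """Return how many source Captures may appear at once.

    When MaxCaptures is not set (None here), the default described
    above applies: all source content may appear concurrently.
    """
    if max_captures is None:
        return len(source_captures)
    # Assumption: a value larger than the source list adds nothing.
    return min(max_captures, len(source_captures))

def cannot_show_all_at_once(source_captures, max_captures=None):
    """Return True if the sources cannot all appear at the same time,
    so the Advertiser has to switch between them over time."""
    return effective_max_captures(source_captures, max_captures) < len(source_captures)
```

With three Video Captures and MaxCaptures equal to 1, `cannot_show_all_at_once` is true, which corresponds to the switched case in the example above.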
</section> </section>
<section anchor="section-7.2.1.2" numbered="true" toc="default"> <section anchor="s-7.2.1.2" numbered="true" toc="default">
<name>Policy</name> <name>Policy</name>
<t> <t>
The Policy MCC Attribute indicates the criteria that the Provider The Policy MCC Attribute indicates the criteria that the Provider
uses to determine when and/or where media content appears in the uses to determine when and/or where media content appears in the
Capture Encoding related to the MCC.</t> Capture Encoding related to the MCC.</t>
<t> <t>
The attribute is in the form of a token that indicates the policy The attribute is in the form of a token that indicates the policy
and an index representing an instance of the policy. The same and an index representing an instance of the policy. The same
index value can be used for multiple MCCs.</t> index value can be used for multiple MCCs.</t>
<t> <t>
skipping to change at line 1325 skipping to change at line 1345
<tr> <tr>
<td align="left">VC1</td> <td align="left">VC1</td>
<td align="left"/> <td align="left"/>
</tr> </tr>
<tr> <tr>
<td align="left">VC2</td> <td align="left">VC2</td>
<td align="left"/> <td align="left"/>
</tr> </tr>
<tr> <tr>
<td align="left">MCC1(VC1,VC2)</td> <td align="left">MCC1(VC1,VC2)</td>
<td align="left">Policy=SoundLevel:0</td> <td align="left">Policy=SoundLevel:0<br/>
</tr> MaxCaptures=1</td>
<tr>
<td align="left"/>
<td align="left">MaxCaptures=1</td>
</tr> </tr>
<tr> <tr>
<td align="left">MCC2(VC1,VC2)</td> <td align="left">MCC2(VC1,VC2)</td>
<td align="left">Policy=SoundLevel:1</td> <td align="left">Policy=SoundLevel:1<br/>
</tr> MaxCaptures=1</td>
<tr>
<td align="left"/>
<td align="left">MaxCaptures=1</td>
</tr> </tr>
<tr> <tr>
<td align="left">CSV(MCC1,MCC2)</td> <td align="left">CSV(MCC1,MCC2)</td>
<td align="left"/> <td align="left"/>
</tr> </tr>
</tbody> </tbody>
</table> </table>
</section> </section>
<section anchor="section-7.2.1.3" numbered="true" toc="default"> <section anchor="s-7.2.1.3" numbered="true" toc="default">
<name>Synchronization Identity</name> <name>Synchronization Identity</name>
<t> <t>
The Synchronization Identity MCC attribute indicates how the The Synchronization Identity MCC attribute indicates how the
individual Captures in multiple MCC Captures are synchronized. To individual Captures in multiple MCC Captures are synchronized. To
indicate that the Capture Encodings associated with MCCs contain indicate that the Capture Encodings associated with MCCs contain
Captures from the same source at the same time, a Provider should Captures from the same source at the same time, a Provider should
set the same Synchronization Identity on each of the concerned set the same Synchronization Identity on each of the concerned
MCCs. It is the Provider that determines what the source for the MCCs. It is the Provider that determines what the source for the
Captures is, so a Provider can choose how to group together Single Captures is, so a Provider can choose how to group together Single
Media Captures into a combined "source" for the purpose of Media Captures into a combined "source" for the purpose of
skipping to change at line 1368 skipping to change at line 1383
SynchronizationID attribute. For example, when the Provider is in SynchronizationID attribute. For example, when the Provider is in
an MCU, it may determine that each separate CLUE Endpoint is a an MCU, it may determine that each separate CLUE Endpoint is a
remote source of media. The Synchronization Identity may be used remote source of media. The Synchronization Identity may be used
across media types, i.e., to synchronize audio- and video-related across media types, i.e., to synchronize audio- and video-related
MCCs.</t> MCCs.</t>
<t> <t>
Without this attribute, it is assumed that multiple MCCs may provide Without this attribute, it is assumed that multiple MCCs may provide
content from different sources at any particular point in time.</t> content from different sources at any particular point in time.</t>
<t>For example: <t>For example:
</t> </t>
<figure anchor="table_4">
<table anchor="table_4">
<name>Example Synchronization Identity MCC Attribute Usage </name> <name>Example Synchronization Identity MCC Attribute Usage </name>
<artwork name="" type="" align="left" alt=""><![CDATA[
+=======================+=================================+ <tbody>
| Capture Scene #1 | | <tr><th>Capture Scene #1</th> <th/></tr>
+-----------------------|---------------------------------+ <tr><td>VC1</td> <td>Description=Left</
| VC1 | Description=Left | td></tr>
| VC2 | Description=Center | <tr><td>VC2</td> <td>Description=Center
| VC3 | Description=Right | </td></tr>
| AC1 | Description=Room | <tr><td>VC3</td> <td>Description=Right<
| CSV(VC1,VC2,VC3) | | /td></tr>
| CSV(AC1) | | <tr><td>AC1</td> <td>Description=Room</
+=======================+=================================+ td></tr>
| Capture Scene #2 | | <tr><td>CSV(VC1,VC2,VC3)</td> <td/></tr>
+-----------------------|---------------------------------+ <tr><td>CSV(AC1)</td> <td/></tr>
| VC4 | Description=Left | </tbody>
| VC5 | Description=Center |
| VC6 | Description=Right | <tbody>
| AC2 | Description=Room | <tr><th>Capture Scene #2</th> <th/></tr>
| CSV(VC4,VC5,VC6) | |
| CSV(AC2) | | <tr><td>VC4</td> <td>Description=Left</
+=======================+=================================+ td></tr>
| Capture Scene #3 | | <tr><td>VC5</td> <td>Description=Center
+-----------------------|---------------------------------+ </td></tr>
| VC7 | | <tr><td>VC6</td> <td>Description=Right<
| AC3 | | /td></tr>
+=======================+=================================+ <tr><td>AC2</td> <td>Description=Room</
| Capture Scene #4 | | td></tr>
+-----------------------|---------------------------------+ <tr><td>CSV(VC4,VC5,VC6)</td> <td/></tr>
| VC8 | | <tr><td>CSV(AC2)</td> <td/></tr>
| AC4 | | </tbody>
+=======================+=================================+
| Capture Scene #5 | | <tbody>
+-----------------------|---------------------------------+ <tr><th>Capture Scene #3</th> <th/></tr>
| MCC1(VC1,VC4,VC7) | SynchronizationID=1 |
| | MaxCaptures=1 | <tr><td>VC7</td> <td/></tr>
| MCC2(VC2,VC5,VC8) | SynchronizationID=1 | <tr><td>AC3</td> <td/></tr>
| | MaxCaptures=1 |
| MCC3(VC3,VC6) | MaxCaptures=1 | </tbody>
| MCC4(AC1,AC2,AC3,AC4) | SynchronizationID=1 |
| | MaxCaptures=1 | <tbody>
| CSV(MCC1,MCC2,MCC3) | | <tr><th>Capture Scene #4</th> <th/></tr>
| CSV(MCC4) | |
+=======================+=================================+ <tr><td>VC8</td> <td/></tr>
]]></artwork> <tr><td>AC4</td> <td/></tr>
</figure> </tbody>
<tbody>
<tr><th>Capture Scene #5</th> <th/></tr>
<tr><td>MCC1(VC1,VC4,VC7)</td> <td>SynchronizationID
=1<br/>MaxCaptures=1</td></tr>
<tr><td>MCC2(VC2,VC5,VC8)</td> <td>SynchronizationID
=1<br/>MaxCaptures=1</td></tr>
<tr><td>MCC3(VC3,VC6)</td> <td>MaxCaptures=1</td
></tr>
<tr><td>MCC4(AC1,AC2,AC3,AC4)</td> <td>SynchronizationID
=1<br/>MaxCaptures=1</td></tr>
<tr><td>CSV(MCC1,MCC2,MCC3)</td> <td/></tr>
<tr><td>CSV(MCC4)</td> <td/></tr>
</tbody>
</table>
<t> <t>
The above Advertisement would indicate that MCC1, MCC2, MCC3, and The above Advertisement would indicate that MCC1, MCC2, MCC3, and
MCC4 make up a Capture Scene. There would be four Capture MCC4 make up a Capture Scene. There would be four Capture
Encodings (one for each MCC). Because MCC1 and MCC2 have the same Encodings (one for each MCC). Because MCC1 and MCC2 have the same
SynchronizationID, each Encoding from MCC1 and MCC2, respectively, SynchronizationID, each Encoding from MCC1 and MCC2, respectively,
would together have content from only Capture Scene 1 or only would together have content from only Capture Scene 1 or only
Capture Scene 2 or the combination of VC7 and VC8 at a particular Capture Scene 2 or the combination of VC7 and VC8 at a particular
point in time. In this case, the Provider has decided the sources point in time. In this case, the Provider has decided the sources
to be synchronized are Scene #1, Scene #2, and Scene #3 and #4 to be synchronized are Scene #1, Scene #2, and Scene #3 and #4
together. The Encoding from MCC3 would not be synchronized with together. The Encoding from MCC3 would not be synchronized with
MCC1 or MCC2. As MCC4 also has the same Synchronization Identity MCC1 or MCC2. As MCC4 also has the same Synchronization Identity
as MCC1 and MCC2, the content of the audio Encoding will be as MCC1 and MCC2, the content of the audio Encoding will be
synchronized with the video content.</t> synchronized with the video content.</t>
</section> </section>
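The synchronization behavior in this example can be modeled roughly as below. The source labels ("scene1", "scene2", "scene3+4"), the dictionary layout, and the rule of taking the first matching Capture when MaxCaptures=1 are assumptions of this sketch. The document only states that the Provider decides what constitutes a source.

```python
# Provider-chosen sources from the example: Scene #1, Scene #2,
# and Scenes #3 and #4 together. The labels are hypothetical.
SOURCES = {
    "scene1": {"VC1", "VC2", "VC3", "AC1"},
    "scene2": {"VC4", "VC5", "VC6", "AC2"},
    "scene3+4": {"VC7", "VC8", "AC3", "AC4"},
}

# MCC name -> (referenced Captures, SynchronizationID or None if unset).
MCCS = {
    "MCC1": (["VC1", "VC4", "VC7"], 1),
    "MCC2": (["VC2", "VC5", "VC8"], 1),
    "MCC3": (["VC3", "VC6"], None),
    "MCC4": (["AC1", "AC2", "AC3", "AC4"], 1),
}


def synchronized_content(sync_id, source_name):
    """For each MCC sharing `sync_id`, pick its Capture from one source.

    All MCCs in the example have MaxCaptures=1, so only the first matching
    Capture is kept (an arbitrary tie-break chosen for this sketch).
    """
    members = SOURCES[source_name]
    result = {}
    for name, (captures, mcc_sync_id) in MCCS.items():
        if mcc_sync_id != sync_id:
            continue  # e.g. MCC3 is not synchronized with the others
        chosen = [c for c in captures if c in members]
        result[name] = chosen[:1]
    return result


print(synchronized_content(1, "scene1"))
print(synchronized_content(1, "scene3+4"))
```

MCC3 never appears in the result because it carries no SynchronizationID, so its content may come from a different source than MCC1, MCC2, and MCC4 at any given time.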
<section anchor="section-7.2.1.4" numbered="true" toc="default"> <section anchor="s-7.2.1.4" numbered="true" toc="default">
<name>Allow Subset Choice</name> <name>Allow Subset Choice</name>
<t> <t>
The Allow Subset Choice MCC attribute is a boolean value, The Allow Subset Choice MCC attribute is a boolean value,
indicating whether or not the Provider allows the Consumer to indicating whether or not the Provider allows the Consumer to
choose a specific subset of the Captures referenced by the MCC. choose a specific subset of the Captures referenced by the MCC.
If this attribute is true, and the MCC references other Captures, If this attribute is true, and the MCC references other Captures,
then the Consumer MAY select (in a Configure message) a specific then the Consumer MAY select (in a Configure message) a specific
subset of those Captures to be included in the MCC, and the subset of those Captures to be included in the MCC, and the
Provider MUST then include only that subset. If this attribute is Provider MUST then include only that subset. If this attribute is
false, or the MCC does not reference other Captures, then the false, or the MCC does not reference other Captures, then the
Consumer MUST NOT select a subset.</t> Consumer MUST NOT select a subset.</t>
</section> </section>
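The Allow Subset Choice rules can be expressed as a small validation step. This is a minimal sketch: the function name, the use of `None` for "no subset requested", and the exception type are illustrative assumptions and do not come from the CLUE protocol.

```python
def select_subset(mcc_captures, allow_subset_choice, requested=None):
    """Return the Captures the Provider includes in the MCC.

    `requested` is the subset a Consumer asks for in a Configure message,
    or None when the Consumer does not select a subset.
    """
    if requested is None:
        return list(mcc_captures)
    # A subset may be chosen only if the attribute is true and the MCC
    # references other Captures.
    if not allow_subset_choice or not mcc_captures:
        raise ValueError("Consumer MUST NOT select a subset for this MCC")
    unknown = [c for c in requested if c not in mcc_captures]
    if unknown:
        raise ValueError(f"not referenced by the MCC: {unknown}")
    # The Provider includes only the requested subset.
    return [c for c in mcc_captures if c in requested]


print(select_subset(["VC1", "VC2", "VC3"], True, ["VC3", "VC1"]))
```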
</section> </section>
</section> </section>
<section anchor="section-7.3" numbered="true" toc="default"> <section anchor="s-7.3" numbered="true" toc="default">
<name>Capture Scene</name> <name>Capture Scene</name>
<t> <t>
In order for a Provider's individual Captures to be used In order for a Provider's individual Captures to be used
effectively by a Consumer, the Provider organizes the Captures into effectively by a Consumer, the Provider organizes the Captures into
one or more Capture Scenes, with the structure and contents of one or more Capture Scenes, with the structure and contents of
these Capture Scenes being sent from the Provider to the Consumer these Capture Scenes being sent from the Provider to the Consumer
in the Advertisement.</t> in the Advertisement.</t>
<t> <t>
A Capture Scene is a structure representing a spatial region A Capture Scene is a structure representing a spatial region
containing one or more Capture Devices, each capturing media containing one or more Capture Devices, each capturing media
skipping to change at line 1484 skipping to change at line 1505
A Capture Scene MAY (and typically will) include more than one type A Capture Scene MAY (and typically will) include more than one type
of media. For example, a Capture Scene can include several Capture of media. For example, a Capture Scene can include several Capture
Scene Views for Video Captures and several Capture Scene Views for Scene Views for Video Captures and several Capture Scene Views for
Audio Captures. A particular Capture MAY be included in more than Audio Captures. A particular Capture MAY be included in more than
one Capture Scene View.</t> one Capture Scene View.</t>
<t> <t>
A Provider MAY express spatial relationships between Captures that A Provider MAY express spatial relationships between Captures that
are included in the same Capture Scene. However, there is no are included in the same Capture Scene. However, there is no
spatial relationship between Media Captures from different Capture spatial relationship between Media Captures from different Capture
Scenes. In other words, Capture Scenes each use their own spatial Scenes. In other words, Capture Scenes each use their own spatial
measurement system as outlined in <xref target="section-6" format="default"/>.</t> measurement system as outlined in <xref target="s-6" format="default"/>.</t>
<t> <t>
A Provider arranges Captures in a Capture Scene to help the A Provider arranges Captures in a Capture Scene to help the
Consumer choose which captures it wants to render. The Capture Consumer choose which captures it wants to render. The Capture
Scene Views in a Capture Scene are different alternatives the Scene Views in a Capture Scene are different alternatives the
Provider is suggesting for representing the Capture Scene. Each Provider is suggesting for representing the Capture Scene. Each
Capture Scene View is given an advertisement-unique identity. The Capture Scene View is given an advertisement-unique identity. The
order of Capture Scene Views within a Capture Scene has no order of Capture Scene Views within a Capture Scene has no
significance. The Media Consumer can choose to receive all Media significance. The Media Consumer can choose to receive all Media
Captures from one Capture Scene View for each media type (e.g., Captures from one Capture Scene View for each media type (e.g.,
audio and video), or it can pick and choose Media Captures audio and video), or it can pick and choose Media Captures
regardless of how the Provider arranges them in Capture Scene regardless of how the Provider arranges them in Capture Scene
Views. Different Capture Scene Views of the same media type are Views. Different Capture Scene Views of the same media type are
not necessarily mutually exclusive alternatives. Also note that not necessarily mutually exclusive alternatives. Also note that
the presence of multiple Capture Scene Views (with potentially the presence of multiple Capture Scene Views (with potentially
multiple encoding options in each view) in a given Capture Scene multiple encoding options in each view) in a given Capture Scene
does not necessarily imply that a Provider is able to serve all the does not necessarily imply that a Provider is able to serve all the
associated media simultaneously (although the construction of such associated media simultaneously (although the construction of such
an over-rich Capture Scene is probably not sensible in many cases). an over-rich Capture Scene is probably not sensible in many cases).
What a Provider can send simultaneously is determined through the What a Provider can send simultaneously is determined through the
Simultaneous Transmission Set mechanism, described in <xref target="section-8" format="default"/>.</t> Simultaneous Transmission Set mechanism, described in <xref target="s-8" format="default"/>.</t>
<t> <t>
Captures within the same Capture Scene View MUST be of the same Captures within the same Capture Scene View MUST be of the same
media type -- it is not possible to mix audio and video captures in media type -- it is not possible to mix audio and video captures in
the same Capture Scene View, for instance. The Provider MUST be the same Capture Scene View, for instance. The Provider MUST be
capable of encoding and sending all Captures (that have an encoding capable of encoding and sending all Captures (that have an encoding
group) in a single Capture Scene View simultaneously. The order of group) in a single Capture Scene View simultaneously. The order of
Captures within a Capture Scene View has no significance. A Captures within a Capture Scene View has no significance. A
Consumer can decide to receive all the Captures in a single Capture Consumer can decide to receive all the Captures in a single Capture
Scene View, but a Consumer could also decide to receive just a Scene View, but a Consumer could also decide to receive just a
subset of those captures. A Consumer can also decide to receive subset of those captures. A Consumer can also decide to receive
Captures from different Capture Scene Views, all subject to the Captures from different Capture Scene Views, all subject to the
constraints set by Simultaneous Transmission Sets, as discussed in constraints set by Simultaneous Transmission Sets, as discussed in
<xref target="section-8" format="default"/>.</t> <xref target="s-8" format="default"/>.</t>
<t> <t>
When a Provider advertises a Capture Scene with multiple CSVs, it When a Provider advertises a Capture Scene with multiple CSVs, it
is essentially signaling that there are multiple representations of is essentially signaling that there are multiple representations of
the same Capture Scene available. In some cases, these multiple the same Capture Scene available. In some cases, these multiple
views would be used simultaneously (for instance, a "video view" and views would be used simultaneously (for instance, a "video view" and
an "audio view"). In some cases, the views would conceptually be an "audio view"). In some cases, the views would conceptually be
alternatives (for instance, a view consisting of three Video alternatives (for instance, a view consisting of three Video
Captures covering the whole room versus a view consisting of just a Captures covering the whole room versus a view consisting of just a
single Video Capture covering only the center of a room). In this single Video Capture covering only the center of a room). In this
latter example, one sensible choice for a Consumer would be to latter example, one sensible choice for a Consumer would be to
skipping to change at line 1555 skipping to change at line 1576
rendering purposes is accomplished through use of their Area of rendering purposes is accomplished through use of their Area of
Capture attributes. The second view (MCC3) and the third view Capture attributes. The second view (MCC3) and the third view
(VC4) are alternative representations of the same room's video, (VC4) are alternative representations of the same room's video,
which might be better suited to some Consumers' rendering which might be better suited to some Consumers' rendering
capabilities. The inclusion of the Audio Capture in the same capabilities. The inclusion of the Audio Capture in the same
Capture Scene indicates that AC0 is associated with all of those Capture Scene indicates that AC0 is associated with all of those
Video Captures, meaning it comes from the same spatial region. Video Captures, meaning it comes from the same spatial region.
Therefore, if audio were to be rendered at all, this audio would be Therefore, if audio were to be rendered at all, this audio would be
the correct choice, irrespective of which Video Captures were the correct choice, irrespective of which Video Captures were
chosen.</t> chosen.</t>
<section anchor="section-7.3.1" numbered="true" toc="default"> <section anchor="s-7.3.1" numbered="true" toc="default">
<name>Capture Scene Attributes</name> <name>Capture Scene Attributes</name>
<t> <t>
Capture Scene Attributes can be applied to Capture Scenes as well Capture Scene Attributes can be applied to Capture Scenes as well
as to individual media captures. Attributes specified at this as to individual media captures. Attributes specified at this
level apply to all constituent Captures. Capture Scene attributes level apply to all constituent Captures. Capture Scene attributes
include the following:</t> include the following:</t>
<ul spacing="normal"> <ul spacing="normal">
<li>Human-readable description of the Capture Scene, which could <li>Human-readable description of the Capture Scene, which could
be in multiple languages;</li> be in multiple languages;</li>
<li>xCard scene information</li> <li>xCard scene information</li>
<li>Scale information ("Millimeters", "Unknown Scale", "No Scale"), as <li>Scale information ("Millimeters", "Unknown Scale", "No Scale"), as
described in <xref target="section-6" format="default"/>.</li> described in <xref target="s-6" format="default"/>.</li>
</ul> </ul>
<section anchor="section-7.3.1.1" numbered="true" toc="default"> <section anchor="s-7.3.1.1" numbered="true" toc="default">
<name>Scene Information</name> <name>Scene Information</name>
<t> <t>
The Scene information attribute provides information regarding the The Scene information attribute provides information regarding the
Capture Scene rather than individual participants. The Provider Capture Scene rather than individual participants. The Provider
may gather the information automatically or manually from a may gather the information automatically or manually from a
variety of sources. The scene information attribute allows a variety of sources. The scene information attribute allows a
Provider to indicate information such as organizational or Provider to indicate information such as organizational or
geographic information allowing a Consumer to determine which geographic information allowing a Consumer to determine which
Capture Scenes are of interest in order to then perform Capture Capture Scenes are of interest in order to then perform Capture
selection. It also allows a Consumer to render information selection. It also allows a Consumer to render information
regarding the Scene or to use it for further processing.</t> regarding the Scene or to use it for further processing.</t>
<t> <t>
As per <xref target="section-7.1.1.10" format="default"/>, the xCard format is used to convey this As per <xref target="s-7.1.1.10" format="default"/>, the xCard format is used to convey this
information and the Provider may supply a minimal set of information and the Provider may supply a minimal set of
information or a larger set of information.</t> information or a larger set of information.</t>
<t> <t>
In order to keep CLUE messages compact, the Provider SHOULD use a In order to keep CLUE messages compact, the Provider SHOULD use a
URI to point to any LOGO, PHOTO, or SOUND contained in the xCARD URI to point to any LOGO, PHOTO, or SOUND contained in the xCARD
rather than transmitting the LOGO, PHOTO, or SOUND data in a CLUE rather than transmitting the LOGO, PHOTO, or SOUND data in a CLUE
message.</t> message.</t>
</section> </section>
</section> </section>
<section anchor="section-7.3.2" numbered="true" toc="default"> <section anchor="s-7.3.2" numbered="true" toc="default">
<name>Capture Scene View Attributes</name> <name>Capture Scene View Attributes</name>
<t> <t>
A Capture Scene can include one or more Capture Scene Views in A Capture Scene can include one or more Capture Scene Views in
addition to the Capture-Scene-wide attributes described above. addition to the Capture-Scene-wide attributes described above.
Capture Scene View attributes apply to the Capture Scene View as a Capture Scene View attributes apply to the Capture Scene View as a
whole, i.e., to all Captures that are part of the Capture Scene whole, i.e., to all Captures that are part of the Capture Scene
View.</t> View.</t>
<t>Capture Scene View attributes include the following: <t>Capture Scene View attributes include the following:
</t> </t>
<ul spacing="normal"> <ul spacing="normal">
<li>A human-readable description (which could be in multiple <li>A human-readable description (which could be in multiple
languages) of the Capture Scene View.</li> languages) of the Capture Scene View.</li>
</ul> </ul>
</section> </section>
</section> </section>
<section anchor="section-7.4" numbered="true" toc="default"> <section anchor="s-7.4" numbered="true" toc="default">
<name>Global View List</name> <name>Global View List</name>
<t> <t>
An Advertisement can include an optional Global View list. Each An Advertisement can include an optional Global View list. Each
item in this list is a Global View. The Provider can include item in this list is a Global View. The Provider can include
multiple Global Views, to allow a Consumer to choose sets of multiple Global Views, to allow a Consumer to choose sets of
captures appropriate to its capabilities or application. The captures appropriate to its capabilities or application. The
choice of how to make these suggestions in the Global View list choice of how to make these suggestions in the Global View list
for what represents all the scenes for which the Provider can send for what represents all the scenes for which the Provider can send
media is up to the Provider. This is very similar to how each CSV media is up to the Provider. This is very similar to how each CSV
represents a particular scene.</t> represents a particular scene.</t>
skipping to change at line 1681 skipping to change at line 1702
. | <---------' | . . | <---------' | .
. | | | (v) = video . . | | | (v) = video .
. | CSV6 (a)<-----------' (a) = audio . . | CSV6 (a)<-----------' (a) = audio .
. | | . . | | .
. +--------------+ . . +--------------+ .
`......................................................' `......................................................'
]]></artwork> ]]></artwork>
</figure> </figure>
</section> </section>
</section> </section>
<section anchor="section-8" numbered="true" toc="default"> <section anchor="s-8" numbered="true" toc="default">
<name>Simultaneous Transmission Set Constraints</name> <name>Simultaneous Transmission Set Constraints</name>
<t> <t>
In many practical cases, a Provider has constraints or limitations In many practical cases, a Provider has constraints or limitations
on its ability to send Captures simultaneously. One type of on its ability to send Captures simultaneously. One type of
limitation is caused by the physical limitations of capture limitation is caused by the physical limitations of capture
mechanisms; these constraints are represented by a Simultaneous mechanisms; these constraints are represented by a Simultaneous
Transmission Set. The second type of limitation reflects the Transmission Set. The second type of limitation reflects the
encoding resources available, such as bandwidth or video encoding encoding resources available, such as bandwidth or video encoding
throughput (macroblocks/second). This type of constraint is throughput (macroblocks/second). This type of constraint is
captured by Individual Encodings and Encoding Groups, discussed captured by Individual Encodings and Encoding Groups, discussed
skipping to change at line 1734 skipping to change at line 1755
<t> <t>
In this example, the two Simultaneous Transmission Sets are shown in In this example, the two Simultaneous Transmission Sets are shown in
<xref target="ref-two-simultaneous-transmission-sets" format="default"/>. If a Provider advertises one or more mutually exclusive <xref target="ref-two-simultaneous-transmission-sets" format="default"/>. If a Provider advertises one or more mutually exclusive
Simultaneous Transmission Sets, then, for each media type, the Simultaneous Transmission Sets, then, for each media type, the
Consumer MUST ensure that it chooses Media Captures that lie wholly Consumer MUST ensure that it chooses Media Captures that lie wholly
within one of those Simultaneous Transmission Sets.</t> within one of those Simultaneous Transmission Sets.</t>
<table anchor="ref-two-simultaneous-transmission-sets" align="center"> <table anchor="ref-two-simultaneous-transmission-sets" align="center">
<name>Two Simultaneous Transmission Sets</name> <name>Two Simultaneous Transmission Sets</name>
<thead> <thead>
<tr> <tr>
<th align="left"> Simultaneous Sets</th> <th align="left">Simultaneous Sets</th>
</tr> </tr>
</thead> </thead>
<tbody> <tbody>
<tr> <tr>
<td align="left">{VC0, VC1, VC2}</td> <td align="left">{VC0, VC1, VC2}</td>
</tr> </tr>
<tr> <tr>
<td align="left">{VC0, VC3, VC2}</td> <td align="left">{VC0, VC3, VC2}</td>
</tr> </tr>
</tbody> </tbody>
skipping to change at line 1778 skipping to change at line 1799
Scene. Likewise, if there are no Simultaneous Transmission Sets Scene. Likewise, if there are no Simultaneous Transmission Sets
and there is a Global View list, then the Provider MUST be able to and there is a Global View list, then the Provider MUST be able to
simultaneously provide all the Captures from any particular Global simultaneously provide all the Captures from any particular Global
View (of each media type) from the Global View list.</t> View (of each media type) from the Global View list.</t>
<t> <t>
If an Advertisement includes multiple Capture Scene Views in a If an Advertisement includes multiple Capture Scene Views in a
Capture Scene, then the Consumer MAY choose one Capture Scene View Capture Scene, then the Consumer MAY choose one Capture Scene View
for each media type, or it MAY choose individual Captures based on the for each media type, or it MAY choose individual Captures based on the
Simultaneous Transmission Sets.</t> Simultaneous Transmission Sets.</t>
</section> </section>
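The Consumer-side rule that chosen Media Captures must lie wholly within one Simultaneous Transmission Set can be checked with a subset test. The sketch below handles a single media type and uses the two sets from the example. The function name is hypothetical.

```python
def allowed_by_sets(chosen, simultaneous_sets):
    """Return True if `chosen` lies wholly within at least one set."""
    chosen = set(chosen)
    return any(chosen.issubset(s) for s in simultaneous_sets)


# The two Simultaneous Transmission Sets from the example.
SETS = [{"VC0", "VC1", "VC2"}, {"VC0", "VC3", "VC2"}]

print(allowed_by_sets(["VC0", "VC1"], SETS))
print(allowed_by_sets(["VC1", "VC3"], SETS))
```

VC1 and VC3 never appear in the same set, so a choice containing both is rejected, while any choice drawn from a single set is accepted.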
<section anchor="section-9" numbered="true" toc="default"> <section anchor="s-9" numbered="true" toc="default">
<name>Encodings</name> <name>Encodings</name>
<t> <t>
Individual encodings and encoding groups are CLUE's mechanisms Individual encodings and encoding groups are CLUE's mechanisms
allowing a Provider to signal its limitations for sending Captures, allowing a Provider to signal its limitations for sending Captures,
or combinations of Captures, to a Consumer. Consumers can map the or combinations of Captures, to a Consumer. Consumers can map the
Captures they want to receive onto the Encodings, with the encoding Captures they want to receive onto the Encodings, with the encoding
parameters they want. As for the relationship between the CLUE-specified mechanisms based on Encodings and the SIP offer/answer parameters they want. As for the relationship between the CLUE-specified mechanisms based on Encodings and the SIP offer/answer
exchange, please refer to <xref target="section-5" format="default"/> exchange, please refer to <xref target="s-5" format="default"/>.</t>
.</t> <section anchor="s-9.1" numbered="true" toc="default">
<section anchor="section-9.1" numbered="true" toc="default">
<name>Individual Encodings</name> <name>Individual Encodings</name>
<t> <t>
An Individual Encoding represents a way to encode a Media Capture An Individual Encoding represents a way to encode a Media Capture
as a Capture Encoding, to be sent as an encoded media stream from as a Capture Encoding, to be sent as an encoded media stream from
the Provider to the Consumer. An Individual Encoding has a set of the Provider to the Consumer. An Individual Encoding has a set of
parameters characterizing how the media is encoded.</t> parameters characterizing how the media is encoded.</t>
<t> <t>
Different media types have different parameters, and different Different media types have different parameters, and different
encoding algorithms may have different parameters. An Individual encoding algorithms may have different parameters. An Individual
Encoding can be assigned to at most one Capture Encoding at any Encoding can be assigned to at most one Capture Encoding at any
skipping to change at line 1816 skipping to change at line 1837
<ul spacing="compact"> <ul spacing="compact">
<li>Maximum bandwidth;</li> <li>Maximum bandwidth;</li>
<li>Maximum picture size in pixels;</li> <li>Maximum picture size in pixels;</li>
<li>Maximum number of pixels to be processed per second;</li> <li>Maximum number of pixels to be processed per second;</li>
</ul> </ul>
<t> <t>
The bandwidth parameter is the only one that specifically relates The bandwidth parameter is the only one that specifically relates
to a CLUE Advertisement, as it can be further constrained by the to a CLUE Advertisement, as it can be further constrained by the
maximum group bandwidth in an Encoding Group.</t> maximum group bandwidth in an Encoding Group.</t>
</section> </section>
<section anchor="section-9.2" numbered="true" toc="default"> <section anchor="s-9.2" numbered="true" toc="default">
<name>Encoding Group</name> <name>Encoding Group</name>
<t> <t>
An Encoding Group includes a set of one or more Individual An Encoding Group includes a set of one or more Individual
Encodings, and parameters that apply to the group as a whole. By Encodings, and parameters that apply to the group as a whole. By
grouping multiple individual Encodings together, an Encoding Group grouping multiple individual Encodings together, an Encoding Group
describes additional constraints on bandwidth for the group. A describes additional constraints on bandwidth for the group. A
single Encoding Group MAY refer to Encodings for different media single Encoding Group MAY refer to Encodings for different media
types.</t> types.</t>
<t>The Encoding Group data structure contains: <t>The Encoding Group data structure contains:
skipping to change at line 1886 skipping to change at line 1907
--> -->
<t>While a typical three-codec/display system might have one Encoding <t>While a typical three-codec/display system might have one Encoding
Group per "codec box" (physical codec, connected to one camera and Group per "codec box" (physical codec, connected to one camera and
one screen), there are many possibilities for the number of one screen), there are many possibilities for the number of
Encoding Groups a Provider may be able to offer and for the Encoding Groups a Provider may be able to offer and for the
encoding values in each Encoding Group.</t> encoding values in each Encoding Group.</t>
<t> <t>
There is no requirement for all Encodings within an Encoding Group There is no requirement for all Encodings within an Encoding Group
to be instantiated at the same time.</t> to be instantiated at the same time.</t>
</section> </section>
<section anchor="section-9.3" numbered="true" toc="default"> <section anchor="s-9.3" numbered="true" toc="default">
<name>Associating Captures with Encoding Groups</name> <name>Associating Captures with Encoding Groups</name>
<t> <t>
Each Media Capture, including MCCs, MAY be associated with one Each Media Capture, including MCCs, MAY be associated with one
Encoding Group. To be eligible for configuration, a Media Capture Encoding Group. To be eligible for configuration, a Media Capture
MUST be associated with one Encoding Group, which is used to MUST be associated with one Encoding Group, which is used to
instantiate that Capture into a Capture Encoding. When an MCC is instantiate that Capture into a Capture Encoding. When an MCC is
configured, all the Media Captures referenced by the MCC will appear configured, all the Media Captures referenced by the MCC will appear
in the Capture Encoding according to the attributes of the chosen in the Capture Encoding according to the attributes of the chosen
encoding of the MCC. This allows an Advertiser to specify encoding encoding of the MCC. This allows an Advertiser to specify encoding
attributes associated with the Media Captures without the need to attributes associated with the Media Captures without the need to
skipping to change at line 1908 skipping to change at line 1929
<t> <t>
If an Encoding Group is assigned to a Media Capture referenced by If an Encoding Group is assigned to a Media Capture referenced by
the MCC, it indicates that this Capture may also have an individual the MCC, it indicates that this Capture may also have an individual
Capture Encoding.</t> Capture Encoding.</t>
<t>For example: <t>For example:
</t> </t>
<table anchor="ref-example-usage-of-encoding-with-mcc-and-source-captures" align="center"> <table anchor="ref-example-usage-of-encoding-with-mcc-and-source-captures" align="center">
<name>Example Usage of Encoding with MCC and Source Captures</name> <name>Example Usage of Encoding with MCC and Source Captures</name>
<thead> <thead>
<tr> <tr>
<th align="left"> Capture Scene #1</th> <th align="left">Capture Scene #1</th>
<th align="left"> </th> <th align="left"> </th>
</tr> </tr>
</thead> </thead>
<tbody> <tbody>
<tr> <tr>
<td align="left">VC1</td> <td align="left">VC1</td>
<td align="left">EncodeGroupID=1</td> <td align="left">EncodeGroupID=1</td>
</tr> </tr>
<tr> <tr>
<td align="left">VC2</td> <td align="left">VC2</td>
skipping to change at line 1954 skipping to change at line 1975
individual Encodings in the group. The actual number of Capture individual Encodings in the group. The actual number of Capture
Encodings used at any time MAY be less than this maximum. Any of Encodings used at any time MAY be less than this maximum. Any of
the Captures that use a particular Encoding Group can be encoded the Captures that use a particular Encoding Group can be encoded
according to any of the Individual Encodings in the group.</t> according to any of the Individual Encodings in the group.</t>
<t> <t>
It is a protocol conformance requirement that the Encoding Groups It is a protocol conformance requirement that the Encoding Groups
MUST allow all the Captures in a particular Capture Scene View to MUST allow all the Captures in a particular Capture Scene View to
be used simultaneously.</t> be used simultaneously.</t>
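<t>
The simultaneity requirement above can be sketched as a simple counting
check. This is an illustrative sketch only, not part of the framework:
it assumes that each Capture in a view needs its own Individual Encoding
from its associated Encoding Group, and it does not check bandwidth. The
function name and the dictionary layout are hypothetical.</t>

```python
from collections import Counter


def view_is_encodable(view, capture_group, group_encodings):
    """Return True if every Capture in the view can get its own Encoding.

    view            -- list of Capture IDs in one Capture Scene View
    capture_group   -- hypothetical map: Capture ID -> Encoding Group ID
    group_encodings -- hypothetical map: Encoding Group ID -> list of encodeIDs
    """
    needed = Counter()
    for capture in view:
        group = capture_group.get(capture)
        if group is None:
            # A Capture with no Encoding Group cannot be configured.
            return False
        needed[group] += 1
    # A group can yield at most as many Capture Encodings as it has Encodings.
    return all(
        count <= len(group_encodings.get(group, ()))
        for group, count in needed.items()
    )


# Illustrative data only; the IDs are made up for this sketch.
capture_group = {"VC0": "EG0", "VC1": "EG1", "VC2": "EG2"}
group_encodings = {"EG0": ["VIDENC0"], "EG1": ["VIDENC1"], "EG2": ["VIDENC2"]}
ok = view_is_encodable(["VC0", "VC1", "VC2"], capture_group, group_encodings)
```

<t>
A Provider could run a check of this kind over each Capture Scene View
before sending an Advertisement.</t>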
</section> </section>
</section> </section>
<section anchor="section-10" numbered="true" toc="default"> <section anchor="s-10" numbered="true" toc="default">
<name>Consumer's Choice of Streams to Receive from the Provider</name> <name>Consumer's Choice of Streams to Receive from the Provider</name>
<t> <t>
After receiving the Provider's Advertisement message (which includes After receiving the Provider's Advertisement message (which includes
media captures and associated constraints), the Consumer composes media captures and associated constraints), the Consumer composes
its reply to the Provider in the form of a Configure message. The its reply to the Provider in the form of a Configure message. The
Consumer is free to use the information in the Advertisement as it Consumer is free to use the information in the Advertisement as it
chooses, but there are a few obviously sensible design choices, chooses, but there are a few obviously sensible design choices,
which are outlined below.</t> which are outlined below.</t>
<t> <t>
If multiple Providers connect to the same Consumer (i.e., in an If multiple Providers connect to the same Consumer (i.e., in an
skipping to change at line 2054 skipping to change at line 2075
recently received Advertisement. The Consumer can send a Configure recently received Advertisement. The Consumer can send a Configure
either in response to a new Advertisement from the Provider or on either in response to a new Advertisement from the Provider or on
its own, for example, because of a local change in conditions its own, for example, because of a local change in conditions
(people leaving the room, connectivity changes, multipoint related (people leaving the room, connectivity changes, multipoint related
considerations).</t> considerations).</t>
<t> <t>
When choosing which Media Streams to receive from the Provider, and When choosing which Media Streams to receive from the Provider, and
the encoding characteristics of those Media Streams, the Consumer the encoding characteristics of those Media Streams, the Consumer
advantageously takes several things into account: its local advantageously takes several things into account: its local
preference, simultaneity restrictions, and encoding limits.</t> preference, simultaneity restrictions, and encoding limits.</t>
<section anchor="section-10.1" numbered="true" toc="default"> <section anchor="s-10.1" numbered="true" toc="default">
<name>Local Preference</name> <name>Local Preference</name>
<t> <t>
A variety of local factors influence the Consumer's choice of A variety of local factors influence the Consumer's choice of
Media Streams to be received from the Provider:</t> Media Streams to be received from the Provider:</t>
<ul spacing="normal"> <ul spacing="normal">
<li>If the Consumer is an Endpoint, it is likely that it would <li>If the Consumer is an Endpoint, it is likely that it would
choose, where possible, to receive video and audio Captures that choose, where possible, to receive video and audio Captures that
match the number of display devices and audio system it has.</li> match the number of display devices and audio system it has.</li>
<li>If the Consumer is an MCU, it may choose to receive loudest <li>If the Consumer is an MCU, it may choose to receive loudest
speaker streams (in order to perform its own media composition) speaker streams (in order to perform its own media composition)
and avoid pre-composed video Captures.</li> and avoid pre-composed video Captures.</li>
<li>User choice (for instance, selection of a new layout) may result <li>User choice (for instance, selection of a new layout) may result
in a different set of Captures, or different encoding in a different set of Captures, or different encoding
characteristics, being required by the Consumer.</li> characteristics, being required by the Consumer.</li>
</ul> </ul>
</section> </section>
<section anchor="section-10.2" numbered="true" toc="default"> <section anchor="s-10.2" numbered="true" toc="default">
<name>Physical Simultaneity Restrictions</name> <name>Physical Simultaneity Restrictions</name>
<t> <t>
Often there are physical simultaneity constraints of the Provider Often there are physical simultaneity constraints of the Provider
that affect the Provider's ability to simultaneously send all of that affect the Provider's ability to simultaneously send all of
the captures the Consumer would wish to receive. For instance, an the captures the Consumer would wish to receive. For instance, an
MCU, when connected to a multi-camera room system, might prefer to MCU, when connected to a multi-camera room system, might prefer to
receive both individual video streams of the people present in the receive both individual video streams of the people present in the
room and an overall view of the room from a single camera. Some room and an overall view of the room from a single camera. Some
Endpoint systems might be able to provide both of these sets of Endpoint systems might be able to provide both of these sets of
streams simultaneously, whereas others might not (if the overall streams simultaneously, whereas others might not (if the overall
room view were produced by changing the optical zoom level on the room view were produced by changing the optical zoom level on the
center camera, for instance).</t> center camera, for instance).</t>
</section> </section>
<section anchor="section-10.3" numbered="true" toc="default"> <section anchor="s-10.3" numbered="true" toc="default">
<name>Encoding and Encoding Group Limits</name> <name>Encoding and Encoding Group Limits</name>
<t> <t>
Each of the Provider's encoding groups has limits on bandwidth, Each of the Provider's encoding groups has limits on bandwidth,
and the constituent potential encodings have limits on the and the constituent potential encodings have limits on the
bandwidth, computational complexity, video frame rate, and bandwidth, computational complexity, video frame rate, and
resolution that can be provided. When choosing the Captures to be resolution that can be provided. When choosing the Captures to be
received from a Provider, a Consumer device MUST ensure that the received from a Provider, a Consumer device MUST ensure that the
encoding characteristics requested for each individual Capture encoding characteristics requested for each individual Capture
fit within the capability of the encoding it is being configured fit within the capability of the encoding it is being configured
to use, as well as ensuring that the combined encoding to use, as well as ensuring that the combined encoding
characteristics for Captures fit within the capabilities of their characteristics for Captures fit within the capabilities of their
associated encoding groups. In some cases, this could cause an associated encoding groups. In some cases, this could cause an
otherwise "preferred" choice of capture encodings to be passed otherwise "preferred" choice of capture encodings to be passed
over in favor of different Capture Encodings -- for instance, if a over in favor of different Capture Encodings -- for instance, if a
set of three Captures could only be provided at a low resolution, set of three Captures could only be provided at a low resolution,
then a three-screen device could switch to favoring a single, then a three-screen device could switch to favoring a single,
higher-quality Capture Encoding.</t> higher-quality Capture Encoding.</t>
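<t>
The two checks described above (per-Encoding limits and per-group
limits) can be sketched as follows. This is a minimal sketch under
stated assumptions: the class and function names are hypothetical, the
pixel rate is taken to be width x height x frame rate, and only the
group bandwidth limit is modeled at the group level.</t>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    """Limits of one Individual Encoding (field names are illustrative)."""
    max_width: int
    max_height: int
    max_frame_rate: int
    max_pps: int        # maximum pixels processed per second
    max_bandwidth: int  # bits per second


@dataclass(frozen=True)
class Request:
    """What a Consumer would like to configure for one Capture Encoding."""
    width: int
    height: int
    frame_rate: int
    bandwidth: int


def fits_encoding(req, enc):
    """Check one request against the limits of its Individual Encoding."""
    return (
        req.width <= enc.max_width
        and req.height <= enc.max_height
        and req.frame_rate <= enc.max_frame_rate
        and req.width * req.height * req.frame_rate <= enc.max_pps
        and req.bandwidth <= enc.max_bandwidth
    )


def fits_group(requests, encodings, max_group_bandwidth):
    """Check all requests (encodeID -> Request) against one Encoding Group."""
    each_ok = all(fits_encoding(r, encodings[eid]) for eid, r in requests.items())
    total = sum(r.bandwidth for r in requests.values())
    return each_ok and total <= max_group_bandwidth
```

<t>
A Consumer that finds its preferred choice fails such a check could
fall back to fewer or lower-rate Capture Encodings before composing its
Configure message.</t>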
</section> </section>
</section> </section>
<section anchor="section-11" numbered="true" toc="default"> <section anchor="s-11" numbered="true" toc="default">
<name>Extensibility</name> <name>Extensibility</name>
<t> <t>
One important characteristic of the Framework is its One important characteristic of the Framework is its
extensibility. The standard for interoperability and handling extensibility. The standard for interoperability and handling
multiple streams must be future-proof. The framework itself is multiple streams must be future-proof. The framework itself is
inherently extensible through expanding the data model types. For inherently extensible through expanding the data model types. For
example:</t> example:</t>
<ul spacing="normal"> <ul spacing="normal">
<li>Adding more types of media, such as telemetry, can be done by <li>Adding more types of media, such as telemetry, can be done by
defining additional types of Captures in addition to audio and defining additional types of Captures in addition to audio and
video.</li> video.</li>
<li>Adding new functionalities, such as 3-D video Captures, may <li>Adding new functionalities, such as 3-D video Captures, may
require additional attributes describing the Captures.</li> require additional attributes describing the Captures.</li>
</ul> </ul>
<t> <t>
The infrastructure is designed to be extended rather than The infrastructure is designed to be extended rather than
requiring new infrastructure elements. Extension comes through requiring new infrastructure elements. Extension comes through
adding to defined types.</t> adding to defined types.</t>
</section> </section>
<section anchor="section-12" numbered="true" toc="default"> <section anchor="s-12" numbered="true" toc="default">
<name>Examples - Using the Framework (Informative)</name> <name>Examples - Using the Framework (Informative)</name>
<t> <t>
This section gives some examples, first from the point of view of This section gives some examples, first from the point of view of
the Provider, then the Consumer, then some multipoint scenarios.</t> the Provider, then the Consumer, then some multipoint scenarios.</t>
<section anchor="section-12.1" numbered="true" toc="default"> <section anchor="s-12.1" numbered="true" toc="default">
<name>Provider Behavior</name> <name>Provider Behavior</name>
<t> <t>
This section shows some examples in more detail of how a Provider This section shows some examples in more detail of how a Provider
can use the framework to represent a typical case for telepresence can use the framework to represent a typical case for telepresence
rooms. First, an endpoint is illustrated, then an MCU case is rooms. First, an endpoint is illustrated, then an MCU case is
shown.</t> shown.</t>
<section anchor="section-12.1.1" numbered="true" toc="default"> <section anchor="s-12.1.1" numbered="true" toc="default">
<name>Three Screen Endpoint Provider</name> <name>Three Screen Endpoint Provider</name>
<t> <t>
Consider an Endpoint with the following description:</t> Consider an Endpoint with the following description:</t>
<t> <t>
Three cameras, three displays, and a six-person table</t> Three cameras, three displays, and a six-person table</t>
<ul spacing="normal"> <ul spacing="normal">
<li>Each camera can provide one Capture for each 1/3-section of the <li>Each camera can provide one Capture for each 1/3-section of the
table.</li> table.</li>
<li>A single Capture representing the active speaker can be provided <li>A single Capture representing the active speaker can be provided
(voice-activity-based camera selection to a given encoder input (voice-activity-based camera selection to a given encoder input
port implemented locally in the Endpoint).</li> port implemented locally in the Endpoint).</li>
<li>A single Capture representing the active speaker with the other <li>A single Capture representing the active speaker with the other
two Captures shown picture in picture (PiP) within the stream can two Captures shown picture in picture (PiP) within the stream can
be provided (again, implemented inside the endpoint).</li> be provided (again, implemented inside the endpoint).</li>
<li>A Capture showing a zoomed out view of all six seats in the room <li>A Capture showing a zoomed out view of all six seats in the room
can be provided.</li> can be provided.</li>
</ul> </ul>
<t> <t>
The video and audio Captures for this Endpoint can be described as The video and audio Captures for this Endpoint can be described as
follows.</t> follows.</t>
<!--[rfced]
<t> <t>
Video Captures: Video Captures:
</t> </t>
<dl newline="false" spacing="normal" indent="6"> <dl newline="false" spacing="normal" indent="6">
<dt>VC0</dt> <dt>VC0</dt>
<dd>(the left camera stream), encoding group=EG0, view=table</dd> <dd>(the left camera stream), encoding group=EG0, view=table</dd>
<dt>VC1</dt> <dt>VC1</dt>
<dd>(the center camera stream), encoding group=EG1, view=table</dd> <dd>(the center camera stream), encoding group=EG1, view=table</dd>
<dt>VC2</dt> <dt>VC2</dt>
<dd>(the right camera stream), encoding group=EG2, view=table</dd> <dd>(the right camera stream), encoding group=EG2, view=table</dd>
skipping to change at line 2358 skipping to change at line 2377
]]></artwork> ]]></artwork>
</figure> </figure>
<t> <t>
Capture Scenes:</t> Capture Scenes:</t>
<t> <t>
The following table represents the Capture Scenes for this The following table represents the Capture Scenes for this
Provider. Recall that a Capture Scene is composed of alternative Provider. Recall that a Capture Scene is composed of alternative
Capture Scene Views covering the same spatial region. Capture Capture Scene Views covering the same spatial region. Capture
Scene #1 is for the main people captures, and Capture Scene #2 is Scene #1 is for the main people captures, and Capture Scene #2 is
for presentation.</t> for presentation.</t>
<t>Each row in the table is a separate Capture Scene View</t> <t>Each row in the table is a separate Capture Scene View.</t>
<table align="center"> <table align="center">
<thead> <name>Example Capture Scene Views</name>
<tbody>
<tr> <tr>
<th align="left"> Capture Scene #1</th> <th align="left"> Capture Scene #1</th>
</tr> </tr>
</thead>
<tbody>
<tr> <tr>
<td align="left">VC0, VC1, VC2</td> <td align="left">VC0, VC1, VC2</td>
</tr> </tr>
<tr> <tr>
<td align="left">MCC3</td> <td align="left">MCC3</td>
</tr> </tr>
<tr> <tr>
<td align="left">MCC4</td> <td align="left">MCC4</td>
</tr> </tr>
<tr> <tr>
<td align="left">VC5</td> <td align="left">VC5</td>
</tr> </tr>
<tr> <tr>
<td align="left">AC0, AC1, AC2</td> <td align="left">AC0, AC1, AC2</td>
</tr> </tr>
<tr> <tr>
<td align="left">AC3</td> <td align="left">AC3</td>
</tr> </tr>
</tbody> </tbody>
</table>
<table anchor="Table7" align="center"> <tbody>
<name>Example Capture Scene Views</name>
<thead>
<tr> <tr>
<th align="left"> Capture Scene #2</th> <th align="left"> Capture Scene #2</th>
</tr> </tr>
</thead>
<tbody>
<tr> <tr>
<td align="left">VC6</td> <td align="left">VC6</td>
</tr> </tr>
<tr> <tr>
<td align="left">AC4</td> <td align="left">AC4</td>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<t> <t>
Different Capture Scenes are distinct from each other and do not Different Capture Scenes are distinct from each other and do not
skipping to change at line 2432 skipping to change at line 2447
to receive, partially based on how many streams it can simultaneously to receive, partially based on how many streams it can simultaneously
receive. A consumer that can receive three video streams would receive. A consumer that can receive three video streams would
probably prefer to receive the first view of Capture Scene #1 probably prefer to receive the first view of Capture Scene #1
(VC0, VC1, and VC2) and not receive the other views. A consumer that (VC0, VC1, and VC2) and not receive the other views. A consumer that
can receive only one video stream would probably choose one of the can receive only one video stream would probably choose one of the
other views.</t> other views.</t>
<t> <t>
If the consumer can receive a presentation stream too, it would If the consumer can receive a presentation stream too, it would
also choose to receive the only view from Capture Scene #2 (VC6).</t> also choose to receive the only view from Capture Scene #2 (VC6).</t>
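<t>
The selection logic described above can be sketched as follows. This is
a deliberately simple illustration, assuming the consumer prefers the
view with the most Captures that it can still receive; a real consumer
would also weigh user preference and encoding limits. The function name
is hypothetical.</t>

```python
def choose_view(views, max_streams):
    """Pick the view with the most Captures that fits within max_streams.

    views       -- list of Capture Scene Views, each a list of Capture IDs
    max_streams -- number of video streams the consumer can receive
    Returns None if no view fits. Ties keep the earliest view in the list.
    """
    candidates = [view for view in views if len(view) <= max_streams]
    return max(candidates, key=len, default=None)


# Video views of Capture Scene #1 from the table above.
scene1_video_views = [["VC0", "VC1", "VC2"], ["MCC3"], ["MCC4"], ["VC5"]]

three_stream_choice = choose_view(scene1_video_views, 3)
one_stream_choice = choose_view(scene1_video_views, 1)
```

<t>
With these inputs, a three-stream consumer selects the first view, and
a one-stream consumer selects one of the single-Capture views.</t>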
</section> </section>
<section anchor="section-12.1.2" numbered="true" toc="default"> <section anchor="s-12.1.2" numbered="true" toc="default">
<name>Encoding Group Example</name> <name>Encoding Group Example</name>
<t> <t>
This is an example of an Encoding Group to illustrate how it can This is an example of an Encoding Group to illustrate how it can
express dependencies between Encodings. The information below express dependencies between Encodings. The information below
about Encodings is a summary of what would be conveyed in SDP, not about Encodings is a summary of what would be conveyed in SDP, not
directly in the CLUE Advertisement.</t> directly in the CLUE Advertisement.</t>
<artwork name="" type="" align="left" alt=""><![CDATA[ <artwork name="" type="" align="left" alt=""><![CDATA[
encodeGroupID=EG0 maxGroupBandwidth=6000000 encodeGroupID=EG0 maxGroupBandwidth=6000000
encodeID=VIDENC0, maxWidth=1920, maxHeight=1088, encodeID=VIDENC0, maxWidth=1920, maxHeight=1088,
maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000 maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000
skipping to change at line 2476 skipping to change at line 2491
encodeID=VIDENC0, maxWidth=1920, maxHeight=1088, encodeID=VIDENC0, maxWidth=1920, maxHeight=1088,
maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000 maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000
encodeID=VIDENC1, maxWidth=1920, maxHeight=1088, encodeID=VIDENC1, maxWidth=1920, maxHeight=1088,
maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000 maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000
encodeGroupID=EG1 maxGroupBandwidth=500000 encodeGroupID=EG1 maxGroupBandwidth=500000
encodeID=AUDENC0, maxBandwidth=96000 encodeID=AUDENC0, maxBandwidth=96000
encodeID=AUDENC1, maxBandwidth=96000 encodeID=AUDENC1, maxBandwidth=96000
encodeID=AUDENC2, maxBandwidth=96000 encodeID=AUDENC2, maxBandwidth=96000
]]></artwork> ]]></artwork>
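<t>
To illustrate how these limits interact, the following sketch computes
the highest frame rate an Encoding could sustain at a given picture
size. It assumes that maxPps bounds width x height x frame rate; the
function name is hypothetical, and the values are taken from the
listing above.</t>

```python
def max_frame_rate_at(width, height, max_pps, max_frame_rate):
    """Highest whole frame rate allowed by both maxPps and maxFrameRate."""
    return min(max_frame_rate, max_pps // (width * height))


# VIDENC0 limits from the listing: maxWidth=1920, maxHeight=1088,
# maxFrameRate=60, maxPps=62208000.
full_size_rate = max_frame_rate_at(1920, 1088, 62208000, 60)
hd1080_rate = max_frame_rate_at(1920, 1080, 62208000, 60)
hd720_rate = max_frame_rate_at(1280, 720, 62208000, 60)

# Audio group: three Encodings at maxBandwidth=96000 each.
audio_total = 3 * 96000
audio_fits_group = audio_total <= 500000
```

<t>
Under this reading, maxPps rather than maxFrameRate is the binding
limit at the largest picture sizes, while at 1280x720 the frame rate is
capped by maxFrameRate. Similarly, if VIDENC0 and VIDENC1 share a group
with maxGroupBandwidth=6000000, both cannot run at their individual
maxBandwidth of 4000000 at the same time.</t>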
</section> </section>
<section anchor="section-12.1.3" numbered="true" toc="default"> <section anchor="s-12.1.3" numbered="true" toc="default">
<name>The MCU Case</name> <name>The MCU Case</name>
<t> <t>
This section shows how an MCU might express its Capture Scenes, This section shows how an MCU might express its Capture Scenes,
intending to offer different choices for consumers that can handle intending to offer different choices for consumers that can handle
different numbers of streams. Each MCC is for video. A single different numbers of streams. Each MCC is for video. A single
Audio Capture is provided for all single and multi-screen Audio Capture is provided for all single and multi-screen
configurations that can be associated (e.g., lip-synced) with any configurations that can be associated (e.g., lip-synced) with any
combination of Video Captures (the MCCs) at the consumer.</t> combination of Video Captures (the MCCs) at the consumer.</t>
<table anchor="ref-mcu-main-capture-scenes" align="center"> <table anchor="ref-mcu-main-capture-scenes" align="center">
<name>MCU Main Capture Scenes</name> <name>MCU Main Capture Scenes</name>
<thead> <thead>
<tr> <tr>
<th align="left"> Capture Scene #1</th> <th align="left">Capture Scene #1</th>
<th align="left"/> <th align="left"/>
</tr> </tr>
</thead> </thead>
<tbody> <tbody>
<tr> <tr>
<td align="left">MCC</td> <td align="left">MCC</td>
<td align="left">for a single-screen consumer</td> <td align="left">for a single-screen consumer</td>
</tr> </tr>
<tr> <tr>
<td align="left">MCC1, MCC2</td> <td align="left">MCC1, MCC2</td>
skipping to change at line 2543 skipping to change at line 2558
</tr> </tr>
</tbody> </tbody>
</table> </table>
<t> <t>
If/when a presentation stream becomes active within the conference, If/when a presentation stream becomes active within the conference,
the MCU might re-advertise the available media as:</t> the MCU might re-advertise the available media as:</t>
<table anchor="ref-mcu-presentation-capture-scene" align="center"> <table anchor="ref-mcu-presentation-capture-scene" align="center">
<name>MCU Presentation Capture Scene</name> <name>MCU Presentation Capture Scene</name>
<thead> <thead>
<tr> <tr>
<th align="left"> Capture Scene #2</th> <th align="left">Capture Scene #2</th>
<th align="left"> Note</th> <th align="left">Note</th>
</tr> </tr>
</thead> </thead>
<tbody> <tbody>
<tr> <tr>
<td align="left">VC10</td> <td align="left">VC10</td>
<td align="left">Video capture for presentation</td> <td align="left">Video capture for presentation</td>
</tr> </tr>
<tr> <tr>
<td align="left">AC1</td> <td align="left">AC1</td>
<td align="left">Presentation audio to accompany VC10</td> <td align="left">Presentation audio to accompany VC10</td>
skipping to change at line 2568 skipping to change at line 2583
<td align="left"/> <td align="left"/>
</tr> </tr>
<tr> <tr>
<td align="left">CSV(AC1)</td> <td align="left">CSV(AC1)</td>
<td align="left"/> <td align="left"/>
</tr> </tr>
</tbody> </tbody>
</table> </table>
</section> </section>
</section> </section>
<section anchor="section-12.2" numbered="true" toc="default"> <section anchor="s-12.2" numbered="true" toc="default">
<name>Media Consumer Behavior</name> <name>Media Consumer Behavior</name>
<t> <t>
This section gives an example of how a Media Consumer might behave This section gives an example of how a Media Consumer might behave
when deciding how to request streams from the three-screen when deciding how to request streams from the three-screen
endpoint described in the previous section.</t> endpoint described in the previous section.</t>
<t> <t>
The receive side of a call needs to balance its requirements The receive side of a call needs to balance its requirements
(based on number of screens and speakers), its decoding capabilities, (based on number of screens and speakers), its decoding capabilities,
available bandwidth, and the provider's capabilities in order available bandwidth, and the provider's capabilities in order
to optimally configure the provider's streams. Typically, it would to optimally configure the provider's streams. Typically, it would
skipping to change at line 2597 skipping to change at line 2612
alternative views in the video Capture Scenes based either on alternative views in the video Capture Scenes based either on
hard-coded preferences or on user choice. Once this choice has been hard-coded preferences or on user choice. Once this choice has been
made, the consumer would then decide how to configure the made, the consumer would then decide how to configure the
provider's encoding groups in order to make best use of the provider's encoding groups in order to make best use of the
available network bandwidth and its own decoding capabilities.</t> available network bandwidth and its own decoding capabilities.</t>
<!--[rfced] We note the use of both "single-screen" and <!--[rfced] We note the use of both "single-screen" and
"one-screen". May we update to use the latter consistently to "one-screen". May we update to use the latter consistently to
match "two-screen" and the like? match "two-screen" and the like?
--> -->
<section anchor="section-12.2.1" numbered="true" toc="default"> <section anchor="s-12.2.1" numbered="true" toc="default">
<name>One-Screen Media Consumer</name> <name>One-Screen Media Consumer</name>
<t> <t>
MCC3, MCC4, and VC5 are all different views by themselves, not MCC3, MCC4, and VC5 are all different views by themselves, not
grouped together in a single view; so, the receiving device should grouped together in a single view; so, the receiving device should
choose one of those. The choice would come down to choose one of those. The choice would come down to
whether to see the greatest number of participants simultaneously whether to see the greatest number of participants simultaneously
at roughly equal precedence (VC5), a switched view of just the at roughly equal precedence (VC5), a switched view of just the
loudest region (MCC3), or a switched view with PiPs (MCC4). An loudest region (MCC3), or a switched view with PiPs (MCC4). An
endpoint device with a small amount of knowledge of these endpoint device with a small amount of knowledge of these
differences could offer a dynamic choice of these options, in-call, to the user.</t> differences could offer a dynamic choice of these options, in-call, to the user.</t>
</section> </section>
<section anchor="section-12.2.2" numbered="true" toc="default"> <section anchor="s-12.2.2" numbered="true" toc="default">
<name>Two-Screen Media Consumer Configuring the Example</name> <name>Two-Screen Media Consumer Configuring the Example</name>
<t> <t>
Mixing systems with an even number of screens, "2n", and those Mixing systems with an even number of screens, "2n", and those
with "2n+1" cameras (and vice versa) is always likely to be the with "2n+1" cameras (and vice versa) is always likely to be the
problematic case. In this instance, the behavior is likely to be problematic case. In this instance, the behavior is likely to be
determined by whether a "two-screen" system is really a "two-decoder" determined by whether a "two-screen" system is really a "two-decoder"
system, i.e., whether only one received stream can be displayed system, i.e., whether only one received stream can be displayed
per screen or whether more than two streams can be received and per screen or whether more than two streams can be received and
spread across the available screen area. To enumerate three possible spread across the available screen area. To enumerate three possible
behaviors here for the two-screen system when it learns that the far behaviors here for the two-screen system when it learns that the far
skipping to change at line 2644 skipping to change at line 2659
<li>Receive three streams, decode all three, and use control information <li>Receive three streams, decode all three, and use control information
indicating which was the most active to switch between showing indicating which was the most active to switch between showing
the left and center streams (one per screen) and the center and the left and center streams (one per screen) and the center and
right streams.</li> right streams.</li>
</ol> </ol>
<t> <t>
For an endpoint capable of all three methods of working described For an endpoint capable of all three methods of working described
above, again it might be appropriate to offer the user the choice above, again it might be appropriate to offer the user the choice
of display mode.</t> of display mode.</t>
</section> </section>
<section anchor="section-12.2.3" numbered="true" toc="default"> <section anchor="s-12.2.3" numbered="true" toc="default">
<name>Three-Screen Media Consumer Configuring the Example</name> <name>Three-Screen Media Consumer Configuring the Example</name>
<t> <t>
This is the most straightforward case: the Media Consumer would This is the most straightforward case: the Media Consumer would
look to identify a set of streams to receive that best matched its look to identify a set of streams to receive that best matched its
available screens; so, the VC0 plus VC1 plus VC2 should match available screens; so, the VC0 plus VC1 plus VC2 should match
optimally. The spatial ordering would give sufficient information optimally. The spatial ordering would give sufficient information
for the correct Video Capture to be shown on the correct screen. for the correct Video Capture to be shown on the correct screen.
<!--[rfced] The use of "either" in this sentence seems odd. Also, the length of this sentence makes it difficult to follow. If our suggested edits do not convey your intended meaning, please let us know how we may rephrase. <!--[rfced] The use of "either" in this sentence seems odd. Also, the length of this sentence makes it difficult to follow. If our suggested edits do not convey your intended meaning, please let us know how we may rephrase.
skipping to change at line 2687 skipping to change at line 2702
--> -->
The consumer would either need to divide a single encoding The consumer would either need to divide a single encoding
group's capability by 3 to determine what resolution and frame group's capability by 3 to determine what resolution and frame
rate to configure the provider with or to configure the individual rate to configure the provider with or to configure the individual
Video Captures' Encoding Groups with what makes most sense (taking Video Captures' Encoding Groups with what makes most sense (taking
into account the receive side decode capabilities, overall call into account the receive side decode capabilities, overall call
bandwidth, the resolution of the screens plus any user preferences bandwidth, the resolution of the screens plus any user preferences
such as motion vs. sharpness).</t> such as motion vs. sharpness).</t>
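As a rough illustration of the first option above, the following Python sketch divides a single encoding group's limits evenly across three Video Captures. The field names (max_bandwidth_bps, max_macroblocks_per_sec) and the numeric values are assumptions chosen for illustration; they are not taken from the CLUE data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupLimits:
    """Hypothetical limits for one encoding group (illustrative names)."""
    max_bandwidth_bps: int
    max_macroblocks_per_sec: int


def split_evenly(limits: GroupLimits, captures: int) -> GroupLimits:
    """Return the per-capture share when the group's limits are divided equally."""
    # Integer division keeps the sum of the shares within the group limit.
    return GroupLimits(
        max_bandwidth_bps=limits.max_bandwidth_bps // captures,
        max_macroblocks_per_sec=limits.max_macroblocks_per_sec // captures,
    )


# Assumed example values, not figures from the specification.
group = GroupLimits(max_bandwidth_bps=6_000_000, max_macroblocks_per_sec=735_000)
per_capture = split_evenly(group, 3)  # one share each for VC0, VC1, and VC2
```

A real consumer would then pick a resolution and frame rate that fit within each share, or use the second option and weight the captures unequally.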
</section> </section>
</section> </section>
<section anchor="section-12.3" numbered="true" toc="default"> <section anchor="s-12.3" numbered="true" toc="default">
<name>Multipoint Conference Utilizing Multiple Content Captures</name> <name>Multipoint Conference Utilizing Multiple Content Captures</name>
<t> <t>
The use of MCCs allows the MCU to construct outgoing Advertisements The use of MCCs allows the MCU to construct outgoing Advertisements
describing complex media switching and composition scenarios. The describing complex media switching and composition scenarios. The
following sections provide several examples.</t> following sections provide several examples.</t>
<t> <t>
Note: in the examples the identities of the CLUE elements (e.g., Note: in the examples the identities of the CLUE elements (e.g.,
Captures, Capture Scene) in the incoming Advertisements overlap. Captures, Capture Scene) in the incoming Advertisements overlap.
This is because there is no coordination between the endpoints. This is because there is no coordination between the endpoints.
The MCU is responsible for making these unique in the outgoing The MCU is responsible for making these unique in the outgoing
advertisement.</t> advertisement.</t>
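One way an MCU might make overlapping identities unique is sketched below in Python. The function name, the dictionary layout, and the sequential VCn numbering are assumptions for illustration only; the input mirrors the three incoming Advertisements in the example that follows.

```python
def renumber_captures(incoming):
    """Map (endpoint, local capture ID) pairs to IDs that are unique at the MCU.

    `incoming` maps an endpoint name to the list of capture IDs it advertised.
    The sequential VCn numbering is an assumption for illustration only.
    """
    mapping = {}
    next_index = 1
    for endpoint, capture_ids in incoming.items():
        for local_id in capture_ids:
            mapping[(endpoint, local_id)] = f"VC{next_index}"
            next_index += 1
    return mapping


# Local IDs overlap because the endpoints do not coordinate with each other.
incoming = {"A": ["VC1"], "B": ["VC1", "VC2"], "C": ["VC1"]}
outgoing = renumber_captures(incoming)
```

With this input, Endpoint B's VC1 and VC2 become VC2 and VC3, and Endpoint C's VC1 becomes VC4, so no two captures share an identity in the outgoing advertisement.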
<section anchor="section-12.3.1" numbered="true" toc="default"> <section anchor="s-12.3.1" numbered="true" toc="default">
<name>Single Media Captures and MCC in the Same Advertisement</name> <name>Single Media Captures and MCC in the Same Advertisement</name>
<t> <t>
Four endpoints are involved in a Conference where CLUE is used. An Four endpoints are involved in a Conference where CLUE is used. An
MCU acts as a middlebox between the endpoints with a CLUE channel MCU acts as a middlebox between the endpoints with a CLUE channel
between each endpoint and the MCU. The MCU receives the following between each endpoint and the MCU. The MCU receives the following
Advertisements.</t> Advertisements.</t>
<table anchor="ref-advertisement-received-from-endpoint-a" align="center"> <table anchor="ref-advertisement-received-from-endpoint-a" align="center">
<name>Advertisement Received from Endpoint A</name> <name>Advertisement Received from Endpoint A</name>
<thead> <thead>
<tr> <tr>
<th align="left"> Capture Scene #1</th> <th align="left"> Capture Scene #1</th>
<th align="left"> Description=AustralianConfRoom</th> <th align="left"> Description=AustralianConfRoom</th>
</tr> </tr>
</thead> </thead>
<tbody> <tbody>
<tr> <tr>
<td align="left">VC1</td> <td align="left">VC1</td>
<td align="left">Description=Audience</td> <td align="left">Description=Audience<br/>EncodeGroupID=1</td>
</tr>
<tr>
<td align="left"/>
<td align="left">EncodeGroupID=1</td>
</tr> </tr>
<tr> <tr>
<td align="left">CSV(VC1)</td> <td align="left">CSV(VC1)</td>
<td align="left"/> <td align="left"/>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<table anchor="ref-advertisement-received-from-endpoint-b" align="center"> <table anchor="ref-advertisement-received-from-endpoint-b" align="center">
<name>Advertisement Received from Endpoint B</name> <name>Advertisement Received from Endpoint B</name>
<thead> <thead>
<tr> <tr>
<th align="left"> Capture Scene #1</th> <th align="left"> Capture Scene #1</th>
<th align="left"> Description=ChinaConfRoom</th> <th align="left"> Description=ChinaConfRoom</th>
</tr> </tr>
</thead> </thead>
<tbody> <tbody>
<tr> <tr>
<td align="left">VC1</td> <td align="left">VC1</td>
<td align="left">Description=Speaker</td> <td align="left">Description=Speaker<br/>EncodeGroupID=1</td>
</tr>
<tr>
<td align="left"/>
<td align="left">EncodeGroupID=1</td>
</tr> </tr>
<tr> <tr>
<td align="left">VC2</td> <td align="left">VC2</td>
<td align="left">Description=Audience</td> <td align="left">Description=Audience<br/>EncodeGroupID=1</td>
</tr>
<tr>
<td align="left"/>
<td align="left">EncodeGroupID=1</td>
</tr> </tr>
<tr> <tr>
<td align="left">CSV(VC1, VC2)</td> <td align="left">CSV(VC1, VC2)</td>
<td align="left"/> <td align="left"/>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<t keepWithPrevious="true">Note: Endpoint B indicates that it sends two streams.</t> <t keepWithPrevious="true">Note: Endpoint B indicates that it sends two streams.</t>
<table anchor="ref-advertisement-received-from-endpoint-c" align="center"> <table anchor="ref-advertisement-received-from-endpoint-c" align="center">
<name>Advertisement Received from Endpoint C</name> <name>Advertisement Received from Endpoint C</name>
<thead> <thead>
<tr> <tr>
<th align="left"> Capture Scene #1</th> <th align="left"> Capture Scene #1</th>
<th align="left"> Description=USAConfRoom</th> <th align="left"> Description=USAConfRoom</th>
</tr> </tr>
</thead> </thead>
<tbody> <tbody>
<tr> <tr>
<td align="left">VC1</td> <td align="left">VC1</td>
<td align="left">Description=Audience</td> <td align="left">Description=Audience<br/>EncodeGroupID=1</td>
</tr>
<tr>
<td align="left"/>
<td align="left">EncodeGroupID=1</td>
</tr> </tr>
<tr> <tr>
<td align="left">CSV(VC1)</td> <td align="left">CSV(VC1)</td>
<td align="left"/> <td align="left"/>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<t> <t>
If the MCU wanted to provide a Multiple Content Capture containing If the MCU wanted to provide a Multiple Content Capture containing
a round-robin switched view of the audience from the three endpoints a round-robin switched view of the audience from the three endpoints
and the speaker, it could construct the following advertisement:</t> and the speaker, it could construct the following advertisement:</t>
<figure anchor="ref-advertisement-sent-to-endpoint-f-one-encoding"> <table anchor="ref-advertisement-sent-to-endpoint-f-one-encoding">
<name>Advertisement Sent to Endpoint F - One Encoding</name> <name>Advertisement Sent to Endpoint F - One Encoding</name>
<artwork name="" type="" align="left" alt=""><![CDATA[ <tbody>
+=======================+=================================+ <tr>
| Capture Scene #1 | Description=AustralianConfRoom | <th>Capture Scene #1</th> <th>Description=AustralianCo
+-----------------------|---------------------------------+ nfRoom</th>
| VC1 | Description=Audience | </tr>
| CSV(VC1) | | <tr>
+=======================+=================================+ <td>VC1</td> <td>Description=Audience</
| Capture Scene #2 | Description=ChinaConfRoom | td>
+-----------------------|---------------------------------+ </tr>
| VC2 | Description=Speaker | <tr>
| VC3 | Description=Audience | <td>CSV(VC1)</td> <td/>
| CSV(VC2, VC3) | | </tr>
+=======================+=================================+ </tbody>
| Capture Scene #3 | Description=USAConfRoom |
+-----------------------|---------------------------------+ <tbody>
| VC4 | Description=Audience | <tr>
| CSV(VC4) | | <th>Capture Scene #2</th> <th>Description=ChinaConfRoo
+=======================+=================================+ m</th>
| Capture Scene #4 | | </tr>
+-----------------------|---------------------------------+ <tr>
| MCC1(VC1,VC2,VC3,VC4) | Policy=RoundRobin:1 | <td>VC2</td> <td>Description=Speaker</t
| | MaxCaptures=1 | d>
| | EncodingGroup=1 | </tr>
| CSV(MCC1) | | <tr>
+=======================+=================================+ <td>VC3</td> <td>Description=Audience</
]]></artwork> td>
</figure> </tr>
<tr>
<td>CSV(VC2, VC3)</td> <td/>
</tr>
</tbody>
<tbody>
<tr>
<th>Capture Scene #3</th> <th>Description=USAConfRoo
m</th>
</tr>
<tr>
<td>VC4</td> <td>Description=Audience</t
d>
</tr>
<tr>
<td>CSV(VC4)</td> <td/>
</tr>
</tbody>
<tbody>
<tr><th>Capture Scene #4</th> <th/></tr>
<tr>
<td>MCC1(VC1,VC2,VC3,VC4)</td>
<td>Policy=RoundRobin:1<br/>
MaxCaptures=1<br/>
EncodingGroup=1</td>
</tr>
<tr>
<td>CSV(MCC1)</td> <td/>
</tr>
</tbody>
</table>
<t> <t>
Alternatively, if the MCU wanted to provide the speaker as one media Alternatively, if the MCU wanted to provide the speaker as one media
stream and the audiences as another, it could assign an encoding stream and the audiences as another, it could assign an encoding
group to VC2 in Capture Scene 2 and provide a CSV in Capture Scene group to VC2 in Capture Scene 2 and provide a CSV in Capture Scene
#4 as per the example below.</t> #4 as per the example below.</t>
<figure anchor="ref-advertisement-sent-to-endpoint-f-two-encodings"> <table anchor="ref-advertisement-sent-to-endpoint-f-two-encodings">
<name>Advertisement Sent to Endpoint F - Two Encodings</name> <name>Advertisement Sent to Endpoint F - Two Encodings</name>
<artwork name="" type="" align="left" alt=""><![CDATA[ <tbody>
+=======================+=================================+ <tr>
| Capture Scene #1 | Description=AustralianConfRoom | <th align="left"> Capture Scene #1</th>
+-----------------------|---------------------------------+ <th align="left"> Description=AustralianConfRoom</th>
| VC1 | Description=Audience | </tr>
| CSV(VC1) | |
+=======================+=================================+ <tr><td>VC1</td> <td>Description=Audience</td>
| Capture Scene #2 | Description=ChinaConfRoom | </tr>
+-----------------------|---------------------------------+ <tr><td>CSV(VC1)</td> <td/>
| VC2 | Description=Speaker | </tr>
| | EncodingGroup=1 | </tbody>
| VC3 | Description=Audience |
| CSV(VC2, VC3) | | <tbody>
+=======================+=================================+ <tr><th>Capture Scene #2</th> <th>Description=ChinaConfRoom</t
| Capture Scene #3 | Description=USAConfRoom | h>
+-----------------------|---------------------------------+ </tr>
| VC4 | Description=Audience | <tr><td>VC2</td> <td>Description=Speaker
| CSV(VC4) | | <br/>EncodingGroup=1</td>
+=======================+=================================+ </tr>
| Capture Scene #4 | | <tr><td>VC3</td> <td>Description=Audience</td>
+-----------------------|---------------------------------+ </tr>
| MCC1(VC1,VC3,VC4) | Policy=RoundRobin:1 | <tr><td>CSV(VC2, VC3)</td> <td/>
| | MaxCaptures=1 | </tr>
| | EncodingGroup=1 | </tbody>
| | AllowSubset=True |
| MCC2(VC2) | MaxCaptures=1 | <tbody>
| | EncodingGroup=1 | <tr><th>Capture Scene #3</th> <th>Description=USAConfRoom</th>
| CSV2(MCC1,MCC2) | | </tr>
+=======================+=================================+ <tr><td>VC4</td> <td>Description=Audience</td>
]]></artwork> </tr>
</figure> <tr><td>CSV(VC4)</td> <td/>
</tr>
</tbody>
<tbody>
<tr><th>Capture Scene #4</th> <th/>
</tr>
<tr><td>MCC1(VC1,VC3,VC4)</td> <td>Policy=RoundRobin:1
<br/>MaxCaptures=1
<br/>EncodingGroup=1
<br/>AllowSubset=True</td>
</tr>
<tr><td>MCC2(VC2)</td> <td>MaxCaptures=1
<br/>EncodingGroup=1</td>
</tr>
<tr><td>CSV2(MCC1,MCC2)</td> <td/>
</tr>
</tbody>
</table>
<t> <t>
Therefore, a Consumer could choose whether or not to have a separate Therefore, a Consumer could choose whether or not to have a separate
speaker-related stream and could choose which endpoints to see. If speaker-related stream and could choose which endpoints to see. If
it wanted the second stream but not the Australian conference room, it wanted the second stream but not the Australian conference room,
it could indicate the following captures in the Configure message:</t> it could indicate the following captures in the Configure message:</t>
<figure anchor="table_15"> <table anchor="table_15">
<name>MCU Case: Consumer Response</name> <name>MCU Case: Consumer Response</name>
<artwork name="" type="" align="left" alt=""><![CDATA[ <tbody>
+-----------------------+---------------------------------+ <tr><td>MCC1(VC3,VC4)</td> <td>Encoding</td></tr>
| MCC1(VC3,VC4) | Encoding | <tr><td>VC2</td> <td>Encoding</td></tr>
| VC2 | Encoding | </tbody>
+-----------------------|---------------------------------+ </table>
]]></artwork>
</figure>
</section> </section>
<section anchor="section-12.3.2" numbered="true" toc="default"> <section anchor="s-12.3.2" numbered="true" toc="default">
<name>Several MCCs in the Same Advertisement</name> <name>Several MCCs in the Same Advertisement</name>
<t> <t>
<!--[rfced] Is the use of "Multiple MCCs" redundant (as MCC is <!--[rfced] Is the use of "Multiple MCCs" redundant (as MCC is
Multiple Content Capture)? Same with "MCC Capture"? Multiple Content Capture)? Same with "MCC Capture"?
Originals: Originals:
The same index value can be used for multiple MCCs. The same index value can be used for multiple MCCs.
... ...
The Synchronisation Identity MCC attribute indicates how the The Synchronisation Identity MCC attribute indicates how the
skipping to change at line 2979 skipping to change at line 3020
<td align="left">CSV(VC1,VC2,VC3)</td> <td align="left">CSV(VC1,VC2,VC3)</td>
<td align="left"/> <td align="left"/>
</tr> </tr>
</tbody> </tbody>
</table> </table>
<t> <t>
The MCU wants to offer Endpoint F three Capture Encodings. Each The MCU wants to offer Endpoint F three Capture Encodings. Each
Capture Encoding would contain all the Captures from either Capture Encoding would contain all the Captures from either
Endpoint D or Endpoint E, depending on the active speaker. Endpoint D or Endpoint E, depending on the active speaker.
The MCU sends the following Advertisement:</t> The MCU sends the following Advertisement:</t>
<figure anchor="ref-advertisement-sent-to-endpoint-f"> <table anchor="ref-advertisement-sent-to-endpoint-f">
<name>Advertisement Sent to Endpoint F</name> <name>Advertisement Sent to Endpoint F</name>
<artwork name="" type="" align="left" alt=""><![CDATA[ <tbody>
+=======================+=================================+ <tr>
| Capture Scene #1 | Description=AustralianConfRoom | <th>Capture Scene #1</th><th>Description=AustralianConfR
+-----------------------|---------------------------------+ oom</th>
| VC1 | | </tr>
| VC2 | |
| VC3 | | <tr><td>VC1</td> <td/></tr>
| CSV(VC1,VC2,VC3) | | <tr><td>VC2</td> <td/></tr>
+=======================+=================================+ <tr><td>VC3</td> <td/></tr>
| Capture Scene #2 | Description=ChinaConfRoom | <tr><td>CSV(VC1,VC2,VC3)</td> <td/></tr>
+-----------------------|---------------------------------+ </tbody>
| VC4 | |
| VC5 | | <tbody>
| VC6 | | <tr><th>Capture Scene #2</th> <th>Description=ChinaCon
| CSV(VC4,VC5,VC6) | | fRoom</th></tr>
+=======================+=================================+
| Capture Scene #3 | | <tr><td>VC4</td> <td/></tr>
+-----------------------|---------------------------------+ <tr><td>VC5</td> <td/></tr>
| MCC1(VC1,VC4) | CaptureArea=Left | <tr><td>VC6</td> <td/></tr>
| | MaxCaptures=1 | <tr><td>CSV(VC4,VC5,VC6)</td> <td/></tr>
| | SynchronizationID=1 | </tbody>
| | EncodingGroup=1 | <tbody>
| MCC2(VC2,VC5) | CaptureArea=Center |
| | MaxCaptures=1 | <tr><th>Capture Scene #3</th> <th/></tr>
| | SynchronizationID=1 |
| | EncodingGroup=1 | <tr><td>MCC1(VC1,VC4)</td> <td>CaptureArea=Left
| MCC3(VC3,VC6) | CaptureArea=Right | <br/>MaxCaptures=1
| | MaxCaptures=1 | <br/>SynchronizationID=1
| | SynchronizationID=1 | <br/>EncodingGroup=1
| | EncodingGroup=1 | </td>
| CSV(MCC1,MCC2,MCC3) | | </tr>
+=======================+=================================+ <tr><td>MCC2(VC2,VC5)</td> <td>CaptureArea=Center
]]></artwork> <br/>MaxCaptures=1
</figure> <br/>SynchronizationID=1
<br/>EncodingGroup=1
</td>
</tr>
<tr><td>MCC3(VC3,VC6)</td> <td>CaptureArea=Right
<br/>MaxCaptures=1
<br/>SynchronizationID=1
<br/>EncodingGroup=1
</td>
</tr>
<tr><td>CSV(MCC1,MCC2,MCC3)</td> <td/></tr>
</tbody>
</table>
</section> </section>
<section anchor="section-12.3.3" numbered="true" toc="default"> <section anchor="s-12.3.3" numbered="true" toc="default">
<name>Heterogeneous Conference with Switching and Composition</name> <name>Heterogeneous Conference with Switching and Composition</name>
<t> <t>
Consider a conference between endpoints with the following Consider a conference between endpoints with the following
characteristics:</t> characteristics:</t>
<dl newline="false" spacing="normal" indent="3"> <dl newline="false" spacing="normal">
<dt/> <dt>Endpoint A -</dt>
<dd> <dd>4 screens, 3 cameras</dd>
Endpoint A - 4 screens, 3 cameras</dd>
</dl> <dt>Endpoint B -</dt>
<dl newline="false" spacing="normal" indent="3"> <dd>3 screens, 3 cameras</dd>
<dt/>
<dd> <dt>Endpoint C -</dt>
Endpoint B - 3 screens, 3 cameras</dd> <dd>3 screens, 3 cameras</dd>
</dl>
<dl newline="false" spacing="normal" indent="3"> <dt>Endpoint D -</dt>
<dt/> <dd>3 screens, 3 cameras</dd>
<dd>
Endpoint C - 3 screens, 3 cameras</dd> <dt>Endpoint E -</dt>
</dl> <dd>1 screen, 1 camera</dd>
<dl newline="false" spacing="normal" indent="3">
<dt/> <dt>Endpoint F -</dt>
<dd> <dd>2 screens, 1 camera</dd>
Endpoint D - 3 screens, 3 cameras</dd>
</dl> <dt>Endpoint G -</dt>
<dl newline="false" spacing="normal" indent="3"> <dd>1 screen, 1 camera</dd>
<dt/>
<dd>
Endpoint E - 1 screen, 1 camera</dd>
</dl>
<dl newline="false" spacing="normal" indent="3">
<dt/>
<dd>
Endpoint F - 2 screens, 1 camera</dd>
</dl>
<dl newline="false" spacing="normal" indent="3">
<dt/>
<dd>
Endpoint G - 1 screen, 1 camera</dd>
</dl> </dl>
<t> <t>
This example focuses on what the user in one of the three-camera This example focuses on what the user in one of the three-camera
multi-screen endpoints sees. Call this person User A, at Endpoint multi-screen endpoints sees. Call this person User A, at Endpoint
A. There are four large display screens at Endpoint A. Whenever A. There are four large display screens at Endpoint A. Whenever
somebody at another site is speaking, all the video captures from somebody at another site is speaking, all the video captures from
that endpoint are shown on the large screens. If the talker is at that endpoint are shown on the large screens. If the talker is at
a three-camera site, then the video from those three cameras fills three of a three-camera site, then the video from those three cameras fills three of
the screens. If the person speaking is at a single-camera site, then video the screens. If the person speaking is at a single-camera site, then video
from that camera fills one of the screens, while the other screens from that camera fills one of the screens, while the other screens
skipping to change at line 3200 skipping to change at line 3239
<t> <t>
As Endpoints A to D each advertise that three Captures make up a As Endpoints A to D each advertise that three Captures make up a
Capture Scene, the MCU offers these in a "site switching" mode. Capture Scene, the MCU offers these in a "site switching" mode.
That is, there are three Multiple Content Captures (and That is, there are three Multiple Content Captures (and
Capture Encodings) each switching between Endpoints. The MCU Capture Encodings) each switching between Endpoints. The MCU
switches in the applicable media into the stream based on voice switches in the applicable media into the stream based on voice
activity. Endpoint A will not see a capture from itself.</t> activity. Endpoint A will not see a capture from itself.</t>
<t> <t>
Using the MCC concept, the MCU would send the following Using the MCC concept, the MCU would send the following
Advertisement to Endpoint A:</t> Advertisement to Endpoint A:</t>
<figure anchor="ref-advertisement-sent-to-endpoint-a-source-part"> <table anchor="ref-advertisement-sent-to-endpoint-a-source-part">
<name>Advertisement Sent to Endpoint A - Source Part</name> <name>Advertisement Sent to Endpoint A - Source Part</name>
<artwork name="" type="" align="left" alt=""><![CDATA[ <tbody>
+=======================+=================================+ <tr>
| Capture Scene #1 | Description=Endpoint B | <th>Capture Scene #1</th><th>Description=Endpoint B</th>
+-----------------------|---------------------------------+ </tr>
| VC4 | CaptureArea=Left |
| VC5 | CaptureArea=Center | <tr><td>VC4</td> <td>CaptureArea=Left</td></tr>
| VC6 | CaptureArea=Right | <tr><td>VC5</td> <td>CaptureArea=Center</td></tr>
| AC1 | | <tr><td>VC6</td> <td>CaptureArea=Right</td></tr>
| CSV(VC4,VC5,VC6) | | <tr><td>AC1</td> <td/></tr>
| CSV(AC1) | | <tr><td>CSV(VC4,VC5,VC6)</td> <td/></tr>
+=======================+=================================+ <tr><td>CSV(AC1)</td> <td/></tr>
| Capture Scene #2 | Description=Endpoint C | </tbody>
+-----------------------|---------------------------------+ <tbody>
| VC7 | CaptureArea=Left | <tr>
| VC8 | CaptureArea=Center | <th>Capture Scene #2</th><th>Description=Endpoint C</th>
| VC9 | CaptureArea=Right | </tr>
| AC2 | | <tr><td>VC7</td> <td>CaptureArea=Left</td></tr>
| CSV(VC7,VC8,VC9) | | <tr><td>VC8</td> <td>CaptureArea=Center</td></tr>
| CSV(AC2) | | <tr><td>VC9</td> <td>CaptureArea=Right</td></tr>
+=======================+=================================+ <tr><td>AC2</td> <td/></tr>
| Capture Scene #3 | Description=Endpoint D | <tr><td>CSV(VC7,VC8,VC9)</td> <td/></tr>
+-----------------------|---------------------------------+ <tr><td>CSV(AC2)</td> <td/></tr>
| VC10 | CaptureArea=Left | </tbody>
| VC11 | CaptureArea=Center | <tbody>
| VC12 | CaptureArea=Right | <tr>
| AC3 | | <th>Capture Scene #3</th><th>Description=Endpoint D</th>
| CSV(VC10,VC11,VC12) | | </tr>
| CSV(AC3) | |
+=======================+=================================+ <tr><td>VC10</td> <td>CaptureArea=Left</td></tr>
| Capture Scene #4 | Description=Endpoint E | <tr><td>VC11</td> <td>CaptureArea=Center</td></tr>
+-----------------------|---------------------------------+ <tr><td>VC12</td> <td>CaptureArea=Right</td></tr>
| VC13 | | <tr><td>AC3</td> <td/></tr>
| AC4 | | <tr><td>CSV(VC10,VC11,VC12)</td> <td/></tr>
| CSV(VC13) | | <tr><td>CSV(AC3)</td> <td/></tr>
| CSV(AC4) | | </tbody>
+=======================+=================================+ <tbody>
| Capture Scene #5 | Description=Endpoint F | <tr>
+-----------------------|---------------------------------+ <th>Capture Scene #4</th><th>Description=Endpoint E</th>
| VC14 | | </tr>
| AC5 | |
| CSV(VC14) | | <tr><td>VC13</td> <td/></tr>
| CSV(AC5) | | <tr><td>AC4</td> <td/></tr>
+=======================+=================================+ <tr><td>CSV(VC13)</td> <td/></tr>
| Capture Scene #6 | Description=Endpoint G | <tr><td>CSV(AC4)</td> <td/></tr>
+-----------------------|---------------------------------+ </tbody>
| VC15 | | <tbody>
| AC6 | | <tr>
| CSV(VC15) | | <th>Capture Scene #5</th><th>Description=Endpoint F</th>
| CSV(AC6) | | </tr>
+=======================+=================================+
]]></artwork> <tr><td>VC14</td> <td/></tr>
</figure> <tr><td>AC5</td> <td/></tr>
<tr><td>CSV(VC14)</td> <td/></tr>
<tr><td>CSV(AC5)</td> <td/></tr>
</tbody>
<tbody>
<tr>
<th>Capture Scene #6</th><th>Description=Endpoint G</th>
</tr>
<tr><td>VC15</td> <td/></tr>
<tr><td>AC6</td> <td/></tr>
<tr><td>CSV(VC15)</td> <td/></tr>
<tr><td>CSV(AC6)</td> <td/></tr>
</tbody>
</table>
<t> <t>
The above part of the Advertisement presents information about the The above part of the Advertisement presents information about the
sources to the MCC. The information is effectively the same as the sources to the MCC. The information is effectively the same as the
received Advertisements, except that there are no Capture Encodings received Advertisements, except that there are no Capture Encodings
associated with them and the identities have been renumbered.</t> associated with them and the identities have been renumbered.</t>
<t> <t>
In addition to the source Capture information, the MCU advertises In addition to the source Capture information, the MCU advertises
site switching of Endpoints B to G in three streams.</t> site switching of Endpoints B to G in three streams.</t>
<figure anchor="table_22"> <table anchor="table_22">
<name>Advertisement Sent to Endpoint A - Switching Part</name> <name>Advertisement Sent to Endpoint A - Switching Part</name>
<artwork name="" type="" align="left" alt=""><![CDATA[ <thead>
+=======================+=================================+ <tr>
| Capture Scene #7 | Description=Output3streammix | <th>Capture Scene #7</th><th>Description=Output3streammix
+-----------------------|---------------------------------+ </th>
| MCC1(VC4,VC7,VC10, | CaptureArea=Left | </tr>
| VC13) | MaxCaptures=1 | </thead>
| | SynchronizationID=1 | <tbody>
| | Policy=SoundLevel:0 |
| | EncodingGroup=1 | <tr>
| | | <td>MCC1(VC4,VC7,VC10,&zwsp;VC13)</td> <td>CaptureArea=Left
| MCC2(VC5,VC8,VC11, | CaptureArea=Center | <br/>MaxCaptures=1
| VC14) | MaxCaptures=1 | <br/>SynchronizationID=1
| | SynchronizationID=1 | <br/>Policy=SoundLevel:0
| | Policy=SoundLevel:0 | <br/>EncodingGroup=1</td>
| | EncodingGroup=1 | </tr>
| | |
| MCC3(VC6,VC9,VC12, | CaptureArea=Right | <tr>
| VC15) | MaxCaptures=1 | <td>MCC2(VC5,VC8,VC11,&zwsp;VC14)</td> <td>CaptureArea=Center
| | SynchronizationID=1 | <br/>MaxCaptures=1
| | Policy=SoundLevel:0 | <br/>SynchronizationID=1
| | EncodingGroup=1 | <br/>Policy=SoundLevel:0
| | | <br/>EncodingGroup=1</td>
| MCC4() (for audio) | CaptureArea=whole scene | </tr>
| | MaxCaptures=1 |
| | Policy=SoundLevel:0 | <tr>
| | EncodingGroup=2 | <td>MCC3(VC6,VC9,VC12,&zwsp;VC15)</td> <td>CaptureArea=Right
| | | <br/>MaxCaptures=1
| MCC5() (for audio) | CaptureArea=whole scene | <br/>SynchronizationID=1
| | MaxCaptures=1 | <br/>Policy=SoundLevel:0
| | Policy=SoundLevel:1 | <br/>EncodingGroup=1</td>
| | EncodingGroup=2 | </tr>
| | | <tr>
| MCC6() (for audio) | CaptureArea=whole scene | <td>MCC4() (for audio)</td> <td>CaptureArea=whole scene
| | MaxCaptures=1 | <br/>MaxCaptures=1
| | Policy=SoundLevel:2 | <br/>Policy=SoundLevel:0
| | EncodingGroup=2 | <br/>EncodingGroup=2</td>
| | | </tr>
| MCC7() (for audio) | CaptureArea=whole scene | <tr>
| | MaxCaptures=1 | <td>MCC5() (for audio)</td> <td>CaptureArea=whole scene
| | Policy=SoundLevel:3 | <br/>MaxCaptures=1
| | EncodingGroup=2 | <br/>Policy=SoundLevel:1
| | | <br/>EncodingGroup=2</td>
| CSV(MCC1,MCC2,MCC3) | | </tr>
| CSV(MCC4,MCC5,MCC6, | | <tr>
| MCC7) | | <td>MCC6() (for audio)</td> <td>CaptureArea=whole scene
+=======================+=================================+ <br/>MaxCaptures=1
]]></artwork> <br/>Policy=SoundLevel:2
</figure> <br/>EncodingGroup=2</td>
</tr>
<tr>
<td>MCC7() (for audio)</td> <td>CaptureArea=whole scene
<br/>MaxCaptures=1
<br/>Policy=SoundLevel:3
<br/>EncodingGroup=2</td>
</tr>
<tr>
<td>CSV(MCC1,MCC2,MCC3)</td> <td/></tr>
<tr>
<td>CSV(MCC4,MCC5,MCC6,&zwsp;MCC7)</td> <td/></tr>
</tbody></table>
<t> <t>
The above part describes the three main switched streams that relate to The above part describes the three main switched streams that relate to
site switching. MaxCaptures=1 indicates that only one Capture from site switching. MaxCaptures=1 indicates that only one Capture from
the MCC is sent at a particular time. SynchronizationID=1 indicates the MCC is sent at a particular time. SynchronizationID=1 indicates
that the source sending is synchronized. The provider can choose to that the source sending is synchronized. The provider can choose to
group together VC13, VC14, and VC15 for the purpose of switching group together VC13, VC14, and VC15 for the purpose of switching
according to the SynchronizationID. Therefore, when the provider according to the SynchronizationID. Therefore, when the provider
switches one of them into an MCC, it can also switch the others switches one of them into an MCC, it can also switch the others
even though they are not part of the same Capture Scene.</t> even though they are not part of the same Capture Scene.</t>
<t> <t>
All the audio for the conference is included in Scene #7. All the audio for the conference is included in Scene #7.
There isn't necessarily a one-to-one relation between any audio There isn't necessarily a one-to-one relation between any audio
capture and video capture in this scene. Typically, a change in capture and video capture in this scene. Typically, a change in
the loudest talker will cause the MCU to switch the audio streams more the loudest talker will cause the MCU to switch the audio streams more
quickly than switching video streams.</t> quickly than switching video streams.</t>
<t> <t>
The MCU can also supply nine media streams showing the active and The MCU can also supply nine media streams showing the active and
previous eight speakers. It includes the following in the previous eight speakers. It includes the following in the
Advertisement:</t> Advertisement:</t>
<figure anchor="table_23">
<table anchor="table_23">
<name>Advertisement Sent to Endpoint A - 9 Switched Part</name> <name>Advertisement Sent to Endpoint A - 9 Switched Part</name>
<artwork name="" type="" align="left" alt=""><![CDATA[ <thead>
+=======================+=================================+ <tr>
| Capture Scene #8 | Description=Output9stream | <th>Capture Scene #8</th><th>Description=Output9stream</t
+-----------------------|---------------------------------+ h>
| MCC8(VC4,VC5,VC6,VC7, | MaxCaptures=1 | </tr>
| VC8,VC9,VC10,VC11, | Policy=SoundLevel:0 | </thead>
| VC12,VC13,VC14,VC15)| EncodingGroup=1 | <tbody>
| | | <tr>
| MCC9(VC4,VC5,VC6,VC7, | MaxCaptures=1 | <td align="right">MCC8(VC4,VC5,VC6,VC7,
| VC8,VC9,VC10,VC11, | Policy=SoundLevel:1 | <br/>VC8,VC9,VC10,VC11,
| VC12,VC13,VC14,VC15)| EncodingGroup=1 | <br/>VC12,VC13,VC14,VC15)</td>
| | |
to to | <td>MaxCaptures=1
| | | <br/>Policy=SoundLevel:0
| MCC16(VC4,VC5,VC6,VC7,| MaxCaptures=1 | <br/>EncodingGroup=1</td>
| VC8,VC9,VC10,VC11, | Policy=SoundLevel:8 | </tr><tr>
| VC12,VC13,VC14,VC15)| EncodingGroup=1 |
| | | <td align="right">MCC9(VC4,VC5,VC6,VC7,
| CSV(MCC8,MCC9,MCC10, | | <br/>VC8,VC9,VC10,VC11,
| MCC11,MCC12,MCC13,| | <br/>VC12,VC13,VC14,VC15)
| MCC14,MCC15,MCC16)| | </td>
+=======================+=================================+
]]></artwork> <td>MaxCaptures=1
</figure> <br/>Policy=SoundLevel:1
<br/>EncodingGroup=1</td>
</tr><tr>
<th align="center">to</th><th align="center">to</th>
</tr><tr>
<td align="right">MCC16(VC4,VC5,VC6,VC7,
<br/>VC8,VC9,VC10,VC11,
<br/>VC12,VC13,VC14,VC15)</td>
<td>MaxCaptures=1
<br/>Policy=SoundLevel:8
<br/>EncodingGroup=1</td>
</tr><tr>
<td align="right">CSV(MCC8,MCC9,MCC10,
<br/>MCC11,MCC12,MCC13,
<br/>MCC14,MCC15,MCC16)</td>
<td/>
</tr>
</tbody>
</table>
<t> <t>
The above part indicates that there are nine capture encodings. Each The above part indicates that there are nine capture encodings. Each
of the Capture Encodings may contain any captures from any source of the Capture Encodings may contain any captures from any source
site with a maximum of one Capture at a time. Which Capture is site with a maximum of one Capture at a time. Which Capture is
present is determined by the policy. The MCCs in this scene do not present is determined by the policy. The MCCs in this scene do not
have any spatial attributes.</t> have any spatial attributes.</t>
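The selection described above can be sketched in Python as follows. This is a minimal sketch assuming that the MCU keeps a most-recent-first history of speaking captures and that Policy=SoundLevel:k selects the k-th entry of that history; the function and variable names are hypothetical, and the history contents are invented for illustration.

```python
def select_for_mccs(speaker_history, indices):
    """Pick one capture per MCC from a most-recent-first speaker history.

    An MCC advertised with Policy=SoundLevel:k is given the k-th entry, so
    index 0 is the active speaker and index 1 the previous one. MCCs whose
    index exceeds the history get None (nothing to send yet).
    """
    selection = {}
    for mcc, k in indices.items():
        selection[mcc] = speaker_history[k] if len(speaker_history) > k else None
    return selection


# Hypothetical state: VC13 is speaking now; VC5 and then VC8 spoke before.
history = ["VC13", "VC5", "VC8"]
policy_index = {f"MCC{8 + k}": k for k in range(9)}  # MCC8..MCC16 map to 0..8
current = select_for_mccs(history, policy_index)
```

Each MCC receives at most one capture at a time, which matches MaxCaptures=1; how an MCU actually ranks speakers is outside the scope of this sketch.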
<t> <t>
Note: The Provider alternatively could provide each of the MCCs Note: The Provider alternatively could provide each of the MCCs
above in its own Capture Scene.</t> above in its own Capture Scene.</t>
<t> <t>
If the MCU wanted to provide a composed Capture Encoding containing If the MCU wanted to provide a composed Capture Encoding containing
all of the nine captures, it could advertise in addition:</t> all of the nine captures, it could advertise in addition:</t>
<figure anchor="ref-advertisement-sent-to-endpoint-a-9-composed-part"> <table anchor="ref-advertisement-sent-to-endpoint-a-9-composed-part">
<name>Advertisement Sent to Endpoint A - 9 Composed Part</name> <name>Advertisement Sent to Endpoint A - 9 Composed Part</name>
<artwork name="" type="" align="left" alt=""><![CDATA[ <thead>
+=======================+=================================+ <tr>
| Capture Scene #9 | Description=NineTiles | <th>Capture Scene #9</th><th>Description=NineTiles</th>
+-----------------------|---------------------------------+ </tr>
| MCC13(MCC8,MCC9,MCC10,| MaxCaptures=9 | </thead>
| MCC11,MCC12,MCC13,| EncodingGroup=1 | <tbody>
| MCC14,MCC15,MCC16)| | <tr>
| | | <td align="right">MCC13(MCC8,MCC9,MCC10,<br/>
| CSV(MCC13) | | MCC11,MCC12,MCC13,<br/>
+=======================+=================================+ MCC14,MCC15,MCC16)</td>
]]></artwork>
</figure> <td>MaxCaptures=9<br/>
EncodingGroup=1</td>
</tr>
<tr>
<td>CSV(MCC13)</td><td/>
</tr>
</tbody>
</table>
<t> <t>
As MaxCaptures is 9, it indicates that the capture encoding contains As MaxCaptures is 9, it indicates that the capture encoding contains
information from nine sources at a time.</t> information from nine sources at a time.</t>
<t> <t>
The Advertisement to Endpoint B is identical to the above, other The Advertisement to Endpoint B is identical to the above, other
than the fact that captures from Endpoint A would be added and the captures than the fact that captures from Endpoint A would be added and the captures
from Endpoint B would be removed. Whether the Captures are rendered from Endpoint B would be removed. Whether the Captures are rendered
on a four-screen display or a three-screen display is up to the on a four-screen display or a three-screen display is up to the
Consumer to determine. The Consumer wants to place video captures Consumer to determine. The Consumer wants to place video captures
from the same original source endpoint together, in the correct from the same original source endpoint together, in the correct
spatial order, but the MCCs do not have spatial attributes. So, the spatial order, but the MCCs do not have spatial attributes. So, the
Consumer needs to associate incoming media packets with the Consumer needs to associate incoming media packets with the
original individual captures in the advertisement (such as VC4, original individual captures in the advertisement (such as VC4,
VC5, and VC6) in order to know the spatial information it needs for VC5, and VC6) in order to know the spatial information it needs for
correct placement on the screens. The Provider can use the RTP Control Protocol (RTCP) correct placement on the screens. The Provider can use the RTP Control Protocol (RTCP)
CaptureId source description (SDES) item and associated RTP header extension, as CaptureId source description (SDES) item and associated RTP header extension, as
described in <xref target="RFCYYY4" format="default"/>, to convey this described in <xref target="RFCYYY4" format="default"/>, to convey this
information to the Consumer.</t> information to the Consumer.</t>
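A minimal sketch of the Consumer-side association described above, under stated assumptions: the capture IDs have already been read from the CaptureId SDES item or RTP header extension, and the advertisement gave each original capture a Left, Center, or Right area. The function name, the dictionary layout, and the area labels for VC4 through VC6 are hypothetical.

```python
# Hypothetical left-to-right ordering of the advertised capture areas.
AREA_ORDER = {"Left": 0, "Center": 1, "Right": 2}


def place_on_screens(advertised, received_ids):
    """Return the received capture IDs sorted left to right.

    advertised:   dict of capture ID to area label, taken from the
                  original individual captures in the advertisement.
    received_ids: capture IDs recovered from incoming media packets.
    """
    # Ignore IDs that the advertisement never described.
    known = [cid for cid in received_ids if cid in advertised]
    # Order by the spatial position of the original capture.
    return sorted(known, key=lambda cid: AREA_ORDER[advertised[cid]])


advertised = {"VC4": "Left", "VC5": "Center", "VC6": "Right"}
layout = place_on_screens(advertised, ["VC6", "VC4", "VC5"])
```

The MCCs themselves carry no spatial attributes, so the ordering comes entirely from the original captures that the received IDs point back to.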
</section> </section>
<section anchor="section-12.3.4" numbered="true" toc="default"> <section anchor="s-12.3.4" numbered="true" toc="default">
<name>Heterogeneous Conference with Voice-Activated Switching</name> <name>Heterogeneous Conference with Voice-Activated Switching</name>
<t> <t>
This example illustrates how multipoint "voice-activated switching" This example illustrates how multipoint "voice-activated switching"
behavior can be realized, with an endpoint making its own decision behavior can be realized, with an endpoint making its own decision
about which of its outgoing video streams is considered the "active talker" from that endpoint. Then, an MCU can decide which is the about which of its outgoing video streams is considered the "active talker" from that endpoint. Then, an MCU can decide which is the
active talker among the whole conference.</t> active talker among the whole conference.</t>
<t> <t>
Consider a conference between endpoints with the following Consider a conference between endpoints with the following
characteristics:</t> characteristics:</t>
<dl newline="false" spacing="normal" indent="3"> <dl newline="false" spacing="normal">
<dt/> <dt>Endpoint A -</dt>
<dd> <dd>3 screens, 3 cameras</dd>
Endpoint A - 3 screens, 3 cameras</dd>
</dl> <dt>Endpoint B -</dt>
<dl newline="false" spacing="normal" indent="3"> <dd>3 screens, 3 cameras</dd>
<dt/>
<dd> <dt>Endpoint C -</dt>
Endpoint B - 3 screens, 3 cameras</dd> <dd>1 screen, 1 camera</dd>
</dl>
<dl newline="false" spacing="normal" indent="3">
<dt/>
<dd>
Endpoint C - 1 screen, 1 camera</dd>
</dl> </dl>
<t> <t>
This example focuses on what the user at Endpoint C sees. The This example focuses on what the user at Endpoint C sees. The
user would like to see the video capture of the current talker, user would like to see the video capture of the current talker,
without composing it with any other video capture. In this without composing it with any other video capture. In this
example, Endpoint C is capable of receiving only a single video example, Endpoint C is capable of receiving only a single video
stream. The following tables describe advertisements from Endpoints A and B stream. The following tables describe advertisements from Endpoints A and B
to the MCU, and from the MCU to Endpoint C, that can be used to accomplish to the MCU, and from the MCU to Endpoint C, that can be used to accomplish
this.</t> this.</t>
<figure anchor="ref-advertisement-received-at-the-mcu-from-endpoints-a-and-b"> <table anchor="ref-advertisement-received-at-the-mcu-from-endpoints-a-and-b">
<name>Advertisement Received at the MCU from Endpoints A and B</name> <name>Advertisement Received at the MCU from Endpoints A and B</name>
<artwork name="" type="" align="left" alt=""><![CDATA[ <thead>
+-----------------------+---------------------------------+ <tr>
| Capture Scene #1 | Description=Endpoint x | <th>Capture Scene #1</th><th>Description=Endpoint x</th>
+-----------------------|---------------------------------+ </tr>
| VC1 | CaptureArea=Left | </thead>
| | EncodingGroup=1 | <tbody>
| VC2 | CaptureArea=Center | <tr>
| | EncodingGroup=1 | <td>VC1</td> <td>CaptureArea=Left
| VC3 | CaptureArea=Right | <br/>EncodingGroup=1</td>
| | EncodingGroup=1 | </tr>
| MCC1(VC1,VC2,VC3) | MaxCaptures=1 | <tr>
| | CaptureArea=whole scene | <td>VC2</td> <td>CaptureArea=Center
| | Policy=SoundLevel:0 | <br/>EncodingGroup=1</td>
| | EncodingGroup=1 | </tr>
| AC1 | CaptureArea=whole scene | <tr>
| | EncodingGroup=2 | <td>VC3</td> <td>CaptureArea=Right
| CSV1(VC1, VC2, VC3) | | <br/>EncodingGroup=1</td>
| CSV2(MCC1) | | </tr>
| CSV3(AC1) | | <tr>
+---------------------------------------------------------+ <td>MCC1(VC1,VC2,VC3)</td> <td>MaxCaptures=1
]]></artwork> <br/>CaptureArea=whole scene
</figure> <br/>Policy=SoundLevel:0
<br/>EncodingGroup=1</td>
</tr>
<tr>
<td>AC1</td> <td>CaptureArea=whole scene
<br/>EncodingGroup=2</td>
</tr>
<tr>
<td>CSV1(VC1, VC2, VC3)</td><td/>
</tr>
<tr>
<td>CSV2(MCC1)</td><td/>
</tr>
<tr>
<td>CSV3(AC1)</td><td/>
</tr></tbody>
</table>
<t> <t>
Endpoints A and B are advertising each individual video capture, Endpoints A and B are advertising each individual video capture,
and also a switched capture MCC1 that switches between the other and also a switched capture MCC1 that switches between the other
three based on who is the active talker. These endpoints do not three based on who is the active talker. These endpoints do not
advertise distinct audio captures associated with each individual advertise distinct audio captures associated with each individual
video capture, so it would be impossible for the MCU (as a media video capture, so it would be impossible for the MCU (as a media
consumer) to make its own determination of which video capture is consumer) to make its own determination of which video capture is
the active talker based just on information in the audio streams.</t> the active talker based just on information in the audio streams.</t>
<figure anchor="ref-advertisement-sent-from-the-mcu-to-c"> <table anchor="ref-advertisement-sent-from-the-mcu-to-c">
<name>Advertisement Sent from the MCU to Endpoint C</name> <name>Advertisement Sent from the MCU to Endpoint&nbsp;C</na
<artwork name="" type="" align="left" alt=""><![CDATA[ me>
+-----------------------+---------------------------------+
| Capture Scene #1 | Description=conference | <thead>
+-----------------------|---------------------------------+ <tr><th>Capture Scene #1</th><th>Description=conference</th
| MCC1() | CaptureArea=Left | >
| | MaxCaptures=1 | </tr>
| | SynchronizationID=1 | </thead>
| | Policy=SoundLevel:0 | <tbody>
| | EncodingGroup=1 | <tr>
| | | <td>MCC1()</td>
| MCC2() | CaptureArea=Center | <td>CaptureArea=Left
| | MaxCaptures=1 | <br/>MaxCaptures=1
| | SynchronizationID=1 | <br/>SynchronizationID=1
| | Policy=SoundLevel:0 | <br/>Policy=SoundLevel:0
| | EncodingGroup=1 | <br/>EncodingGroup=1
| | | </td>
| MCC3() | CaptureArea=Right | </tr>
| | MaxCaptures=1 | <tr>
| | SynchronizationID=1 | <td>MCC2()</td><td>CaptureArea=Center
| | Policy=SoundLevel:0 | <br/>MaxCaptures=1
| | EncodingGroup=1 | <br/>SynchronizationID=1
| | | <br/>Policy=SoundLevel:0
| MCC4() | CaptureArea=whole scene | <br/>EncodingGroup=1
| | MaxCaptures=1 | </td>
| | Policy=SoundLevel:0 | </tr>
| | EncodingGroup=1 | <tr>
| | | <td>MCC3()</td><td>CaptureArea=Right
| MCC5() (for audio) | CaptureArea=whole scene | <br/>MaxCaptures=1
| | MaxCaptures=1 | <br/>SynchronizationID=1
| | Policy=SoundLevel:0 | <br/>Policy=SoundLevel:0
| | EncodingGroup=2 | <br/>EncodingGroup=1
| | | </td>
| MCC6() (for audio) | CaptureArea=whole scene | </tr>
| | MaxCaptures=1 | <tr>
| | Policy=SoundLevel:1 | <td>MCC4()</td><td>CaptureArea=whole scene
| | EncodingGroup=2 | <br/>MaxCaptures=1
| CSV1(MCC1,MCC2,MCC3 | | <br/>Policy=SoundLevel:0
| CSV2(MCC4) | | <br/>EncodingGroup=1
| CSV3(MCC5,MCC6) | | </td>
+---------------------------------------------------------+ </tr>
]]></artwork> <tr>
</figure> <td>MCC5() (for audio)</td><td>CaptureArea=whole scene
<br/>MaxCaptures=1
<br/>Policy=SoundLevel:0
<br/>EncodingGroup=2
</td>
</tr>
<tr>
<td>MCC6() (for audio)</td><td>CaptureArea=whole scene
<br/>MaxCaptures=1
<br/>Policy=SoundLevel:1
<br/>EncodingGroup=2
</td>
</tr>
<tr><td>CSV1(MCC1,MCC2,MCC3)</td><td/></tr>
<tr><td>CSV2(MCC4)</td><td/></tr>
<tr><td>CSV3(MCC5,MCC6)</td><td/></tr>
</tbody>
</table>
<!-- [rfced] FYI, in Table 26, we add a closing parenthesis here.
Please let us know if this is not correct.
Original: CSV1(MCC1,MCC2,MCC3
Current: CSV1(MCC1,MCC2,MCC3)
-->
<t> <t>
The MCU advertises one scene, with four video MCCs. Three of them The MCU advertises one scene, with four video MCCs. Three of them
in CSV1 give a left, center, and right view of the conference, with in CSV1 give a left, center, and right view of the conference, with
site switching. MCC4 provides a single video capture site switching. MCC4 provides a single video capture
representing a view of the whole conference. The MCU intends for representing a view of the whole conference. The MCU intends for
MCC4 to be switched between all the other original source MCC4 to be switched between all the other original source
captures. In this example, the MCU's advertisement does not give all captures. In this example, the MCU's advertisement does not give all
the information about all the other endpoints' scenes and which of the information about all the other endpoints' scenes and which of
those captures are included in the MCCs. The MCU could include all those captures are included in the MCCs. The MCU could include all
that if it wants to give the consumers more that if it wants to give the consumers more
skipping to change at line 3541 skipping to change at line 3673
the MCU to get the information it needs to construct MCC4, it has the MCU to get the information it needs to construct MCC4, it has
to send configure messages to Endpoints A and B asking to receive MCC1 from to send configure messages to Endpoints A and B asking to receive MCC1 from
each of them, along with their AC1 audio. Now the MCU can use each of them, along with their AC1 audio. Now the MCU can use
audio energy information from the two incoming audio streams from audio energy information from the two incoming audio streams from
Endpoints A and B to determine which of those alternatives is the current Endpoints A and B to determine which of those alternatives is the current
talker. Based on that, the MCU uses either MCC1 from A or MCC1 talker. Based on that, the MCU uses either MCC1 from A or MCC1
from B as the source of MCC4 to send to Endpoint C.</t> from B as the source of MCC4 to send to Endpoint C.</t>
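The MCU's choice between the two switched captures can be sketched as below. This is only an illustration: the function name and the energy values are hypothetical, and a deployed MCU would probably add smoothing or hysteresis so that the selection does not flap between sites.

```python
def pick_mcc4_source(audio_energy):
    """Choose which endpoint's MCC1 feeds MCC4.

    audio_energy: dict of endpoint name to the measured energy of that
    endpoint's AC1 audio stream (hypothetical units).
    """
    # The endpoint with the highest audio energy is treated as the
    # current talker site.
    loudest = max(audio_energy, key=audio_energy.get)
    return f"MCC1 from {loudest}"


source = pick_mcc4_source({"A": 0.12, "B": 0.47})
```

Each endpoint has already picked its own active talker inside MCC1, so the MCU only compares sites, not individual cameras.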
</section> </section>
</section> </section>
</section> </section>
<section anchor="section-14" numbered="true" toc="default"> <section anchor="s-14" numbered="true" toc="default">
<name>IANA Considerations</name> <name>IANA Considerations</name>
<t> <t>
This document does not require any IANA actions. This document does not require any IANA actions.
</t> </t>
</section> </section>
<section anchor="section-15" numbered="true" toc="default"> <section anchor="s-15" numbered="true" toc="default">
<name>Security Considerations</name> <name>Security Considerations</name>
<t> <t>
There are several potential attacks related to telepresence, There are several potential attacks related to telepresence,
specifically the protocols used by CLUE. This is the case due to specifically the protocols used by CLUE. This is the case due to
conferencing sessions, the natural involvement of multiple conferencing sessions, the natural involvement of multiple
endpoints, and the many, often user-invoked, capabilities provided endpoints, and the many, often user-invoked, capabilities provided
by the systems.</t> by the systems.</t>
<t> <t>
An MCU involved in a CLUE session can experience many of the same An MCU involved in a CLUE session can experience many of the same
attacks as a conferencing system such as the one enabled by attacks as a conferencing system such as the one enabled by
skipping to change at line 3583 skipping to change at line 3715
implementing the protocols necessary to support CLUE, follow the implementing the protocols necessary to support CLUE, follow the
security recommendations specified in the conference control security recommendations specified in the conference control
protocol documents. protocol documents.
--> -->
In the case of CLUE, SIP is the conferencing In the case of CLUE, SIP is the conferencing
protocol; thus, the security considerations in <xref target="RFC4579" format="default"/> MUST be protocol; thus, the security considerations in <xref target="RFC4579" format="default"/> MUST be
followed. Other security issues related to MCUs are discussed in followed. Other security issues related to MCUs are discussed in
the XCON framework <xref target="RFC5239" format="default"/>. The use of xCard with potentially the XCON framework <xref target="RFC5239" format="default"/>. The use of xCard with potentially
sensitive information provides another reason to implement sensitive information provides another reason to implement
recommendations of Section 11 in <xref target="RFC5239" format="default"/>.</t> recommendations in <xref section="11" sectionFormat="of" target="RFC5239" format="default"/>.</t>
<t> <t>
One primary security concern, surrounding the CLUE framework One primary security concern, surrounding the CLUE framework
introduced in this document, involves securing the actual introduced in this document, involves securing the actual
protocols and the associated authorization mechanisms. These protocols and the associated authorization mechanisms. These
concerns apply to endpoint-to-endpoint sessions as well as concerns apply to endpoint-to-endpoint sessions as well as
sessions involving multiple endpoints and MCUs. <xref target="ref-basic-information-flow" format="default"/> in sessions involving multiple endpoints and MCUs. <xref target="ref-basic-information-flow" format="default"/> in
<xref target="section-5" format="default"/> provides a basic flow of information exchange for CLUE <xref target="s-5" format="default"/> provides a basic flow of information exchange for CLUE
and the protocols involved.</t> and the protocols involved.</t>
<t> <t>
As described in <xref target="section-5" format="default"/>, CLUE uses SIP/SDP to As described in <xref target="s-5" format="default"/>, CLUE uses SIP/SDP to
establish the session prior to exchanging any CLUE-specific establish the session prior to exchanging any CLUE-specific
information. Thus, the security mechanisms recommended for SIP information. Thus, the security mechanisms recommended for SIP
<xref target="RFC3261" format="default"/>, including user authentication and <xref target="RFC3261" format="default"/>, including user authentication and
authorization, MUST be supported. In addition, the media MUST be authorization, MUST be supported. In addition, the media MUST be
secured. Datagram Transport Layer Security / Secure Real-time secured. Datagram Transport Layer Security / Secure Real-time
Transport Protocol MUST be supported and SHOULD be used unless the Transport Protocol MUST be supported and SHOULD be used unless the
media, which is based on RTP, is secured by other means (see <xref target="RFC7201" format="default"/> <xref target="RFC7202" format="default"/>). Media security is media, which is based on RTP, is secured by other means (see <xref target="RFC7201" format="default"/> <xref target="RFC7202" format="default"/>). Media security is
also discussed in <xref target="RFCYYY3" format="default"/> and <xref target="RFCYYY4" format="default"/>. Note that SIP call setup is done before any also discussed in <xref target="RFCYYY3" format="default"/> and <xref target="RFCYYY4" format="default"/>. Note that SIP call setup is done before any
CLUE-specific information is available, so the authentication and CLUE-specific information is available, so the authentication and
authorization are based on the SIP mechanisms. The entity that will authorization are based on the SIP mechanisms. The entity that will
be authenticated may use the Endpoint identity or the endpoint user be authenticated may use the Endpoint identity or the endpoint user
identity; this is an application issue and not a CLUE-specific identity; this is an application issue and not a CLUE-specific
issue.</t> issue.</t>
<t> <t>
A separate data channel is established to transport the CLUE A separate data channel is established to transport the CLUE
protocol messages. The contents of the CLUE protocol messages are protocol messages. The contents of the CLUE protocol messages are
based on information introduced in this document. The CLUE data based on information introduced in this document. The CLUE data
model <xref target="RFCYYY1" format="default"/> defines, through an XML model <xref target="RFCYYY1" format="default"/> defines, through an XML
schema, the syntax to be used. One type of information that could schema, the syntax to be used. One type of information that could
possibly introduce privacy concerns is the xCard information, as possibly introduce privacy concerns is the xCard information, as
described in <xref target="section-7.1.1.10" format="default"/>. The decision about which xCard described in <xref target="s-7.1.1.10" format="default"/>. The decision about which xCard
information to send in the CLUE channel is an application policy information to send in the CLUE channel is an application policy
for point-to-point and multipoint calls based on the authenticated for point-to-point and multipoint calls based on the authenticated
identity that can be the endpoint identity or the user of the identity that can be the endpoint identity or the user of the
endpoint. For example, the telepresence multipoint application can endpoint. For example, the telepresence multipoint application can
authenticate a user before starting a CLUE exchange with the authenticate a user before starting a CLUE exchange with the
telepresence system and have a policy per user.</t> telepresence system and have a policy per user.</t>
<t> <t>
In addition, the (text) description field in the Media Capture In addition, the (text) description field in the Media Capture
attribute (<xref target="section-7.1.1.6" format="default"/>) could possibly reveal sensitive attribute (<xref target="s-7.1.1.6" format="default"/>) could possibly reveal sensitive
information or specific identities. The same would be true for the information or specific identities. The same would be true for the
descriptions in the Capture Scene (<xref target="section-7.3.1" format="default"/>) and Capture descriptions in the Capture Scene (<xref target="s-7.3.1" format="default"/>) and Capture
Scene View (<xref target="section-7.3.2" format="default"/>) attributes. An implementation SHOULD give users Scene View (<xref target="s-7.3.2" format="default"/>) attributes. An implementation SHOULD give users
control over what sensitive information is sent in an control over what sensitive information is sent in an
Advertisement. One other important consideration for the Advertisement. One other important consideration for the
information in the xCard as well as the description field in the information in the xCard as well as the description field in the
Media Capture and Capture Scene View attributes is that while the Media Capture and Capture Scene View attributes is that while the
endpoints involved in the session have been authenticated, there endpoints involved in the session have been authenticated, there
is no assurance that the information in the xCard or description is no assurance that the information in the xCard or description
fields is authentic. Thus, this information MUST NOT be used to fields is authentic. Thus, this information MUST NOT be used to
make any authorization decisions.</t> make any authorization decisions.</t>
<t> <t>
While other information in the CLUE protocol messages does not While other information in the CLUE protocol messages does not
skipping to change at line 3661 skipping to change at line 3793
However, the policies and security associated with these actions However, the policies and security associated with these actions
are outside the scope of this document and the overall CLUE are outside the scope of this document and the overall CLUE
solution.</t> solution.</t>
</section> </section>
</middle> </middle>
<back> <back>
<references> <references>
<name>References</name> <name>References</name>
<references> <references>
<name>Normative References</name> <name>Normative References</name>
<!--[rfced] PQ: Please review companion document references as many
of these were not in queue when this document was edited (i.e., check
for title changes etc.).
<!-- &I-D.ietf-clue-datachannel; Will be a companion doc - but as of 11/16/17 Waiting for AD Go-Ahead;--> <!-- &I-D.ietf-clue-datachannel; Will be a companion doc - but as of 11/16/17 Waiting for AD Go-Ahead;-->
<reference anchor="RFCYYYY" target="http://www.rfc-editor.org/info/rfcYYYY"> <reference anchor="RFCYYYY" target="http://www.rfc-editor.org/info/rfcYYYY">
<front> <front>
<title>CLUE Protocol Data Channel</title> <title>CLUE Protocol Data Channel</title>
<seriesInfo name="DOI" value="10.17487/RFCYYYY"/> <seriesInfo name="DOI" value="10.17487/RFCYYYY"/>
<seriesInfo name="RFC" value="YYYY"/> <seriesInfo name="RFC" value="YYYY"/>
<author initials="C" surname="Holmberg" fullname="Christer Holmberg"> <author initials="C" surname="Holmberg" fullname="Christer Holmberg">
<organization/> <organization/>
skipping to change at line 3744 skipping to change at line 3873
</author> </author>
<author initials="R" surname="Hansen" fullname="Robert Hansen"> <author initials="R" surname="Hansen" fullname="Robert Hansen">
<organization/> <organization/>
</author> </author>
<date month="August" day="20" year="2017"/> <date month="August" day="20" year="2017"/>
<abstract> <abstract>
<t>This document specifies how CLUE-specific signaling such as the CLUE protocol and the CLUE data channel are used in conjunction with each other and with existing signaling mechanisms such as SIP and SDP to produce a telepresence call.</t> <t>This document specifies how CLUE-specific signaling such as the CLUE protocol and the CLUE data channel are used in conjunction with each other and with existing signaling mechanisms such as SIP and SDP to produce a telepresence call.</t>
</abstract> </abstract>
</front> </front>
</reference> </reference>
<reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
<front> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3261.xml"/>
<title>Key words for use in RFCs to Indicate Requirement Levels</title> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3264.xml"/>
<seriesInfo name="DOI" value="10.17487/RFC2119"/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3550.xml"/>
<seriesInfo name="RFC" value="2119"/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4566.xml"/>
<seriesInfo name="BCP" value="14"/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4579.xml"/>
<author initials="S." surname="Bradner" fullname="S. Bradner"> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.5239.xml"/>
<organization/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.5646.xml"/>
</author> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.6350.xml"/>
<date year="1997" month="March"/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.6351.xml"/>
<abstract> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
<t>In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
</abstract>
</front>
</reference>
<reference anchor="RFC3261" target="https://www.rfc-editor.org/info/rfc3261" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3261.xml">
<front>
<title>SIP: Session Initiation Protocol</title>
<seriesInfo name="DOI" value="10.17487/RFC3261"/>
<seriesInfo name="RFC" value="3261"/>
<author initials="J." surname="Rosenberg" fullname="J. Rosenberg">
<organization/>
</author>
<author initials="H." surname="Schulzrinne" fullname="H. Schulzrinne">
<organization/>
</author>
<author initials="G." surname="Camarillo" fullname="G. Camarillo">
<organization/>
</author>
<author initials="A." surname="Johnston" fullname="A. Johnston">
<organization/>
</author>
<author initials="J." surname="Peterson" fullname="J. Peterson">
<organization/>
</author>
<author initials="R." surname="Sparks" fullname="R. Sparks">
<organization/>
</author>
<author initials="M." surname="Handley" fullname="M. Handley">
<organization/>
</author>
<author initials="E." surname="Schooler" fullname="E. Schooler">
<organization/>
</author>
<date year="2002" month="June"/>
<abstract>
<t>This document describes Session Initiation Protocol (SIP), an application-layer control (signaling) protocol for creating, modifying, and terminating sessions with one or more participants. These sessions include Internet telephone calls, multimedia distribution, and multimedia conferences. [STANDARDS-TRACK]</t>
</abstract>
</front>
</reference>
<reference anchor="RFC3264" target="https://www.rfc-editor.org/info/rfc3264" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3264.xml">
<front>
<title>An Offer/Answer Model with Session Description Protocol (SDP)</title>
<seriesInfo name="DOI" value="10.17487/RFC3264"/>
<seriesInfo name="RFC" value="3264"/>
<author initials="J." surname="Rosenberg" fullname="J. Rosenberg">
<organization/>
</author>
<author initials="H." surname="Schulzrinne" fullname="H. Schulzrinne">
<organization/>
</author>
<date year="2002" month="June"/>
<abstract>
<t>This document defines a mechanism by which two entities can make use of the Session Description Protocol (SDP) to arrive at a common view of a multimedia session between them. In the model, one participant offers the other a description of the desired session from their perspective, and the other participant answers with the desired session from their perspective. This offer/answer model is most useful in unicast sessions where information from both participants is needed for the complete view of the session. The offer/answer model is used by protocols like the Session Initiation Protocol (SIP). [STANDARDS-TRACK]</t>
</abstract>
</front>
</reference>
<reference anchor="RFC3550" target="https://www.rfc-editor.org/info/rfc3550" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3550.xml">
<front>
<title>RTP: A Transport Protocol for Real-Time Applications</title>
<seriesInfo name="DOI" value="10.17487/RFC3550"/>
<seriesInfo name="RFC" value="3550"/>
<seriesInfo name="STD" value="64"/>
<author initials="H." surname="Schulzrinne" fullname="H. Schulzrinne">
<organization/>
</author>
<author initials="S." surname="Casner" fullname="S. Casner">
<organization/>
</author>
<author initials="R." surname="Frederick" fullname="R. Frederick">
<organization/>
</author>
<author initials="V." surname="Jacobson" fullname="V. Jacobson">
<organization/>
</author>
<date year="2003" month="July"/>
<abstract>
<t>This memorandum describes RTP, the real-time transport protocol. RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services. RTP does not address resource reservation and does not guarantee quality-of-service for real-time services. The data transport is augmented by a control protocol (RTCP) to allow monitoring of the data delivery in a manner scalable to large multicast networks, and to provide minimal control and identification functionality. RTP and RTCP are designed to be independent of the underlying transport and network layers. The protocol supports the use of RTP-level translators and mixers. Most of the text in this memorandum is identical to RFC 1889 which it obsoletes. There are no changes in the packet formats on the wire, only changes to the rules and algorithms governing how the protocol is used. The biggest change is an enhancement to the scalable timer algorithm for calculating when to send RTCP packets in order to minimize transmission in excess of the intended rate when many participants join a session simultaneously. [STANDARDS-TRACK]</t>
</abstract>
</front>
</reference>
<reference anchor="RFC4566" target="https://www.rfc-editor.org/info/rfc4566" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4566.xml">
<front>
<title>SDP: Session Description Protocol</title>
<seriesInfo name="DOI" value="10.17487/RFC4566"/>
<seriesInfo name="RFC" value="4566"/>
<author initials="M." surname="Handley" fullname="M. Handley">
<organization/>
</author>
<author initials="V." surname="Jacobson" fullname="V. Jacobson">
<organization/>
</author>
<author initials="C." surname="Perkins" fullname="C. Perkins">
<organization/>
</author>
<date year="2006" month="July"/>
<abstract>
<t>This memo defines the Session Description Protocol (SDP). SDP is intended for describing multimedia sessions for the purposes of session announcement, session invitation, and other forms of multimedia session initiation. [STANDARDS-TRACK]</t>
</abstract>
</front>
</reference>
<reference anchor="RFC4579" target="https://www.rfc-editor.org/i
nfo/rfc4579" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/
reference.RFC.4579.xml">
<front>
<title>Session Initiation Protocol (SIP) Call Control - Conf
erencing for User Agents</title>
<seriesInfo name="DOI" value="10.17487/RFC4579"/>
<seriesInfo name="RFC" value="4579"/>
<seriesInfo name="BCP" value="119"/>
<author initials="A." surname="Johnston" fullname="A. Johnst
on">
<organization/>
</author>
<author initials="O." surname="Levin" fullname="O. Levin">
<organization/>
</author>
<date year="2006" month="August"/>
<abstract>
<t>This specification defines conferencing call control fe
atures for the Session Initiation Protocol (SIP). This document builds
on the Conferencing Requirements and Framework documents to define how a
tightly coupled SIP conference works. The approach is explored from th
e perspective of different user agent (UA) types: conference-unaware, co
nference-aware, and focus UAs. The use of Uniform Resource Identifiers
(URIs) in conferencing, OPTIONS for capabilities discovery, and call con
trol using REFER are covered in detail with example call flow diagrams.
The usage of the isfocus feature tag is defined. This document specifi
es an Internet Best Current Practices for the Internet Community, and re
quests discussion and suggestions for improvements.</t>
</abstract>
</front>
</reference>
<reference anchor="RFC5239" target="https://www.rfc-editor.org/i
nfo/rfc5239" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/
reference.RFC.5239.xml">
<front>
<title>A Framework for Centralized Conferencing</title>
<seriesInfo name="DOI" value="10.17487/RFC5239"/>
<seriesInfo name="RFC" value="5239"/>
<author initials="M." surname="Barnes" fullname="M. Barnes">
<organization/>
</author>
<author initials="C." surname="Boulton" fullname="C. Boulton
">
<organization/>
</author>
<author initials="O." surname="Levin" fullname="O. Levin">
<organization/>
</author>
<date year="2008" month="June"/>
<abstract>
<t>This document defines the framework for Centralized Con
ferencing. The framework allows participants using various call signalin
g protocols, such as SIP, H.323, Jabber, Q.931 or ISDN User Part (ISUP),
to exchange media in a centralized unicast conference. The Centralized
Conferencing Framework defines logical entities and naming conventions.
The framework also outlines a set of conferencing protocols, which are
complementary to the call signaling protocols, for building advanced co
nferencing applications. The framework binds all the defined components
together for the benefit of builders of conferencing systems. [STANDAR
DS-TRACK]</t>
</abstract>
</front>
</reference>
<reference anchor="RFC5646" target="https://www.rfc-editor.org/i
nfo/rfc5646" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/
reference.RFC.5646.xml">
<front>
<title>Tags for Identifying Languages</title>
<seriesInfo name="DOI" value="10.17487/RFC5646"/>
<seriesInfo name="RFC" value="5646"/>
<seriesInfo name="BCP" value="47"/>
<author initials="A." surname="Phillips" fullname="A. Philli
ps" role="editor">
<organization/>
</author>
<author initials="M." surname="Davis" fullname="M. Davis" ro
le="editor">
<organization/>
</author>
<date year="2009" month="September"/>
<abstract>
<t>This document describes the structure, content, constru
ction, and semantics of language tags for use in cases where it is desir
able to indicate the language used in an information object. It also de
scribes how to register values for use in language tags and the creation
of user-defined extensions for private interchange. This document spe
cifies an Internet Best Current Practices for the Internet Community, an
d requests discussion and suggestions for improvements.</t>
</abstract>
</front>
</reference>
<reference anchor="RFC6350" target="https://www.rfc-editor.org/i
nfo/rfc6350" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/
reference.RFC.6350.xml">
<front>
<title>vCard Format Specification</title>
<seriesInfo name="DOI" value="10.17487/RFC6350"/>
<seriesInfo name="RFC" value="6350"/>
<author initials="S." surname="Perreault" fullname="S. Perre
ault">
<organization/>
</author>
<date year="2011" month="August"/>
<abstract>
<t>This document defines the vCard data format for represe
nting and exchanging a variety of information about individuals and othe
r entities (e.g., formatted and structured name and delivery addresses,
email address, multiple telephone numbers, photograph, logo, audio clips
, etc.). This document obsoletes RFCs 2425, 2426, and 4770, and updates
RFC 2739. [STANDARDS-TRACK]</t>
</abstract>
</front>
</reference>
<reference anchor="RFC6351" target="https://www.rfc-editor.org/i
nfo/rfc6351" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/
reference.RFC.6351.xml">
<front>
<title>xCard: vCard XML Representation</title>
<seriesInfo name="DOI" value="10.17487/RFC6351"/>
<seriesInfo name="RFC" value="6351"/>
<author initials="S." surname="Perreault" fullname="S. Perre
ault">
<organization/>
</author>
<date year="2011" month="August"/>
<abstract>
<t>This document defines the XML schema of the vCard data
format. [STANDARDS-TRACK]</t>
</abstract>
</front>
</reference>
<reference anchor="RFC8174" target="https://www.rfc-editor.org/i
nfo/rfc8174">
<front>
<title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key W
ords</title>
<seriesInfo name="DOI" value="10.17487/RFC8174"/>
<seriesInfo name="RFC" value="8174"/>
<seriesInfo name="BCP" value="14"/>
<author initials="B." surname="Leiba" fullname="B. Leiba">
<organization/>
</author>
<date year="2017" month="May"/>
<abstract>
<t>RFC 2119 specifies common key words that may be used in
protocol specifications. This document aims to reduce the ambiguity b
y clarifying that only UPPERCASE usage of the key words have the define
d special meanings.</t>
</abstract>
</front>
</reference>
</references> </references>
<references> <references>
<name>Informative References</name> <name>Informative References</name>
<!-- &I-D.ietf-clue-rtp-mapping; MISSREF--> <!-- &I-D.ietf-clue-rtp-mapping; MISSREF-->
<reference anchor="RFCYYY4" target="http://www.rfc-editor.org/info/rfcYYY4"> <reference anchor="RFCYYY4" target="http://www.rfc-editor.org/info/rfcYYY4">
<front> <front>
<title>Mapping RTP Streams to Controlling Multiple Streams for Telepresence (CLUE) Media Captures</title> <title>Mapping RTP Streams to Controlling Multiple Streams for Telepresence (CLUE) Media Captures</title>
<seriesInfo name="DOI" value="10.17487/RFCYYY4"/> <seriesInfo name="DOI" value="10.17487/RFCYYY4"/>
<seriesInfo name="RFC" value="YYY4"/> <seriesInfo name="RFC" value="YYY4"/>
skipping to change at line 3976 skipping to change at line 3918
</author> </author>
<author initials="J" surname="Lennox" fullname="Jonathan Lennox"> <author initials="J" surname="Lennox" fullname="Jonathan Lennox">
<organization/> <organization/>
</author> </author>
<date month="February" day="27" year="2017"/> <date month="February" day="27" year="2017"/>
<abstract> <abstract>
<t>This document describes how the Real Time transport Protocol (RTP) is used in the context of the CLUE protocol (ControLling mUltiple streams for tElepresence).  It also describes the mechanisms and recommended practice for mapping RTP media streams defined in Session Description Protocol (SDP) to CLUE Media Captures and defines a new RTP header extension (CaptureId).</t> <t>This document describes how the Real Time transport Protocol (RTP) is used in the context of the CLUE protocol (ControLling mUltiple streams for tElepresence).  It also describes the mechanisms and recommended practice for mapping RTP media streams defined in Session Description Protocol (SDP) to CLUE Media Captures and defines a new RTP header extension (CaptureId).</t>
</abstract> </abstract>
</front> </front>
</reference> </reference>
<reference anchor="RFC4353" target="https://www.rfc-editor.org/i
nfo/rfc4353" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/
reference.RFC.4353.xml">
<front>
<title>A Framework for Conferencing with the Session Initiat
ion Protocol (SIP)</title>
<seriesInfo name="DOI" value="10.17487/RFC4353"/>
<seriesInfo name="RFC" value="4353"/>
<author initials="J." surname="Rosenberg" fullname="J. Rosen
berg">
<organization/>
</author>
<date year="2006" month="February"/>
<abstract>
<t>The Session Initiation Protocol (SIP) supports the init
iation, modification, and termination of media sessions between user age
nts. These sessions are managed by SIP dialogs, which represent a SIP re
lationship between a pair of user agents. Because dialogs are between p
airs of user agents, SIP's usage for two-party communications (such as a
phone call), is obvious. Communications sessions with multiple partici
pants, generally known as conferencing, are more complicated. This docu
ment defines a framework for how such conferencing can occur. This fram
ework describes the overall architecture, terminology, and protocol comp
onents needed for multi-party conferencing. This memo provides informat
ion for the Internet community.</t>
</abstract>
</front>
</reference>
<reference anchor="RFC7667" target="https://www.rfc-editor.org/i
nfo/rfc7667" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/
reference.RFC.7667.xml">
<front>
<title>RTP Topologies</title>
<seriesInfo name="DOI" value="10.17487/RFC7667"/>
<seriesInfo name="RFC" value="7667"/>
<author initials="M." surname="Westerlund" fullname="M. West
erlund">
<organization/>
</author>
<author initials="S." surname="Wenger" fullname="S. Wenger">
<organization/>
</author>
<date year="2015" month="November"/>
<abstract>
<t>This document discusses point-to-point and multi-endpoi
nt topologies used in environments based on the Real-time Transport Prot
ocol (RTP). In particular, centralized topologies commonly employed in t
he video conferencing industry are mapped to the RTP terminology.</t>
</abstract>
</front>
</reference>
<!-- [rfced] The following RFC has been obsoleted. We have upda
ted this reference as follows. Please let us know any objections.
RFC 5117 has become RFC 7667 <xi:include
href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/refere
nce.RFC.4353.xml"/>
<xi:include
href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/refere
nce.RFC.7667.xml"/>
<!-- [rfced] The following RFC has been obsoleted. We have updated
this reference as follows. Please let us know any objections.
RFC 5117 has been obsoleted by RFC 7667.
--> -->
<reference anchor="RFC7201" target="https://www.rfc-editor.org/in <xi:include
fo/rfc7201" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/r href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/refere
eference.RFC.7201.xml"> nce.RFC.7201.xml"/>
<front> <xi:include
<title>Options for Securing RTP Sessions</title> href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/refere
<seriesInfo name="DOI" value="10.17487/RFC7201"/> nce.RFC.7202.xml"/>
<seriesInfo name="RFC" value="7201"/> <xi:include
<author initials="M." surname="Westerlund" fullname="M. West href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/refere
erlund"> nce.RFC.7205.xml"/>
<organization/> <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibx
</author> ml/reference.RFC.7262.xml"/>
<author initials="C." surname="Perkins" fullname="C. Perkins
">
<organization/>
</author>
<date year="2014" month="April"/>
<abstract>
<t>The Real-time Transport Protocol (RTP) is used in a lar
ge number of different application domains and environments. This heter
ogeneity implies that different security mechanisms are needed to provid
e services such as confidentiality, integrity, and source authentication
of RTP and RTP Control Protocol (RTCP) packets suitable for the various
environments. The range of solutions makes it difficult for RTP-based
application developers to pick the most suitable mechanism. This docume
nt provides an overview of a number of security solutions for RTP and gi
ves guidance for developers on how to choose the appropriate security me
chanism.</t>
</abstract>
</front>
</reference>
<reference anchor="RFC7202" target="https://www.rfc-editor.org/i
nfo/rfc7202" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/
reference.RFC.7202.xml">
<front>
<title>Securing the RTP Framework: Why RTP Does Not Mandate
a Single Media Security Solution</title>
<seriesInfo name="DOI" value="10.17487/RFC7202"/>
<seriesInfo name="RFC" value="7202"/>
<author initials="C." surname="Perkins" fullname="C. Perkins
">
<organization/>
</author>
<author initials="M." surname="Westerlund" fullname="M. West
erlund">
<organization/>
</author>
<date year="2014" month="April"/>
<abstract>
<t>This memo discusses the problem of securing real-time m
ultimedia sessions. It also explains why the Real-time Transport Protoc
ol (RTP) and the associated RTP Control Protocol (RTCP) do not mandate a
single media security mechanism. This is relevant for designers and re
viewers of future RTP extensions to ensure that appropriate security mec
hanisms are mandated and that any such mechanisms are specified in a man
ner that conforms with the RTP architecture.</t>
</abstract>
</front>
</reference>
<reference anchor="RFC7205" target="https://www.rfc-editor.org/i
nfo/rfc7205" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/
reference.RFC.7205.xml">
<front>
<title>Use Cases for Telepresence Multistreams</title>
<seriesInfo name="DOI" value="10.17487/RFC7205"/>
<seriesInfo name="RFC" value="7205"/>
<author initials="A." surname="Romanow" fullname="A. Romanow
">
<organization/>
</author>
<author initials="S." surname="Botzko" fullname="S. Botzko">
<organization/>
</author>
<author initials="M." surname="Duckworth" fullname="M. Duckw
orth">
<organization/>
</author>
<author initials="R." surname="Even" fullname="R. Even" role
="editor">
<organization/>
</author>
<date year="2014" month="April"/>
<abstract>
<t>Telepresence conferencing systems seek to create an env
ironment that gives users (or user groups) that are not co-located a fee
ling of co-located presence through multimedia communication that includ
es at least audio and video signals of high fidelity. A number of techn
iques for handling audio and video streams are used to create this exper
ience. When these techniques are not similar, interoperability between
different systems is difficult at best, and often not possible. Conveyi
ng information about the relationships between multiple streams of media
would enable senders and receivers to make choices to allow telepresenc
e systems to interwork. This memo describes the most typical and import
ant use cases for sending multiple streams in a telepresence conference.
</t>
</abstract>
</front>
</reference>
<reference anchor="RFC7262" target="https://www.rfc-editor.org/i
nfo/rfc7262" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/
reference.RFC.7262.xml">
<front>
<title>Requirements for Telepresence Multistreams</title>
<seriesInfo name="DOI" value="10.17487/RFC7262"/>
<seriesInfo name="RFC" value="7262"/>
<author initials="A." surname="Romanow" fullname="A. Romanow
">
<organization/>
</author>
<author initials="S." surname="Botzko" fullname="S. Botzko">
<organization/>
</author>
<author initials="M." surname="Barnes" fullname="M. Barnes">
<organization/>
</author>
<date year="2014" month="June"/>
<abstract>
<t>This memo discusses the requirements for specifications
that enable telepresence interoperability by describing behaviors and p
rotocols for Controlling Multiple Streams for Telepresence (CLUE). In a
ddition, the problem statement and related definitions are also covered
herein.</t>
</abstract>
</front>
</reference>
</references> </references>
</references> </references>
<section anchor="acks" numbered="false" toc="default"> <section anchor="acks" numbered="false" toc="default">
<name>Acknowledgements</name> <name>Acknowledgements</name>
<t> <t>
Allyn Romanow and Brian Baldino were authors of early versions. Allyn Romanow and Brian Baldino were authors of early versions.
Mark Gorzynski also contributed much to the initial approach. Mark Gorzynski also contributed much to the initial approach.
Many others also contributed, including Christian Groves, Jonathan Many others also contributed, including Christian Groves, Jonathan
Lennox, Paul Kyzivat, Rob Hansen, Roni Even, Christer Holmberg, Lennox, Paul Kyzivat, Rob Hansen, Roni Even, Christer Holmberg,
Stephen Botzko, Mary Barnes, John Leslie, Paul Coverdale.</t> Stephen Botzko, Mary Barnes, John Leslie, Paul Coverdale.</t>
 End of changes. 142 change blocks. 
1113 lines changed or deleted 821 lines changed or added

This html diff was produced by rfcdiff 1.45. The latest version is available from http://tools.ietf.org/tools/rfcdiff/