<pre class='metadata'>
Title: MediaStreamTrack Insertable Media Processing using Streams
Shortname: mediacapture-insertable-streams
Level: None
Status: UD
Group: webrtc
Repository: w3c/mediacapture-insertable-streams
URL: https://w3c.github.io/mediacapture-insertable-streams/
Editor: Harald Alvestrand, Google https://google.com, [email protected]
Editor: Guido Urdaneta, Google https://google.com, [email protected]
Abstract: This document defines an API surface for manipulating the bits on
Abstract: {{MediaStreamTrack}}s carrying raw data.
Abstract: NOT AN ADOPTED WORKING GROUP DOCUMENT.
Markup Shorthands: css no, markdown yes
</pre>
<pre class=anchors>
url: https://wicg.github.io/web-codecs/#videoframe; text: VideoFrame; type: interface; spec: WEBCODECS
url: https://wicg.github.io/web-codecs/#videoencoder; text: VideoEncoder; type: interface; spec: WEBCODECS
url: https://wicg.github.io/web-codecs/#audiodata; text: AudioData; type: interface; spec: WEBCODECS
url: https://www.w3.org/TR/mediacapture-streams/#mediastreamtrack; text: MediaStreamTrack; type: interface; spec: MEDIACAPTURE-STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#dom-constrainulong; text: ConstrainULong; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#dom-constraindouble; text: ConstrainDouble; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#dom-constraindomstring; text: ConstrainDOMString; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#dom-videoresizemodeenum; text: VideoResizeModeEnum; type: enum; spec: MEDIACAPTURE-STREAMS
url: https://w3c.github.io/mediacapture-main/#idl-def-VideoResizeModeEnum.user; text: none; for: VideoResizeModeEnum; type: enum; spec: MEDIACAPTURE-STREAMS
url: https://w3c.github.io/mediacapture-main/#idl-def-VideoResizeModeEnum.right; text: crop-and-scale; for: VideoResizeModeEnum; type: enum; spec: MEDIACAPTURE-STREAMS
url: https://infra.spec.whatwg.org/#queues; text: Queue; type: typedef; spec: INFRA
url: https://infra.spec.whatwg.org/#queue-enqueue; text: enqueue; for: Queue; type: typedef; spec: INFRA
url: https://infra.spec.whatwg.org/#queue-dequeue; text: dequeue; for: Queue; type: typedef; spec: INFRA
url: https://infra.spec.whatwg.org/#list-is-empty; text: empty; for: Queue; type: typedef; spec: INFRA
url: https://infra.spec.whatwg.org/#booleans; text: Boolean; type: typedef; spec: INFRA
url: https://www.w3.org/TR/mediacapture-streams/#source-stopped; text: StopSource; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#track-ended; text: TrackEnded; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://streams.spec.whatwg.org/#readable-stream-default-controller-close; text: ReadableStreamDefaultControllerClose; type: typedef; spec: STREAMS
url: https://streams.spec.whatwg.org/#readablestream-controller; text: ReadableStreamControllerSlot; type: typedef; spec: STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#ends-nostop; text: EndTrack; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#dom-overconstrainederror; text: OverconstrainedError; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://infra.spec.whatwg.org/#list-empty; text: Empty; for: List; type: typedef; spec: INFRA
url: https://infra.spec.whatwg.org/#list-remove; text: remove; for: List; type: typedef; spec: INFRA
</pre>
<pre class=biblio>
{
"WEBCODECS": {
"href":
"https://wicg.github.io/web-codecs/",
"title": "WebCodecs"
},
"MEDIACAPTURE-SCREEN-SHARE": {
"href": "https://w3c.github.io/mediacapture-screen-share/",
"title": "Screen Capture"
},
"MEDIACAPTURE-STREAMS": {
"href": "https://www.w3.org/TR/mediacapture-streams/",
"title": "Media Capture and Streams"
},
"INFRA": {
"href": "https://https://infra.spec.whatwg.org",
"title": "Infra"
},
"STREAMS": {
"href": "https://streams.spec.whatwg.org",
"title": "Streams"
},
"WEBAUDIO": {
"href": "https://www.w3.org/TR/webaudio/",
"title": "Web Audio API"
},
"WEBTRANSPORT": {
"href": "https://www.w3.org/TR/webtransport/",
"title": "WebTransport"
}
}
</pre>
<pre class=link-defaults>
spec:streams; type:interface; text:WritableStream
</pre>
# Introduction # {#introduction}
The [[WEBRTC-NV-USE-CASES]] document describes several functions that
can only be achieved by access to media (requirements N20-N22),
including, but not limited to:
* Funny Hats
* Machine Learning
* Virtual Reality Gaming
These use cases further require that processing can be done in worker
threads (requirements N23-N24).
This specification gives an interface based on [[WEBCODECS]] and [[STREAMS]] to
provide access to such functionality.
This specification provides access to raw media,
which is the output of a media source such as a camera, microphone, screen capture,
or the decoder part of a codec, and the input to the
encoder part of a codec. The processed media can be consumed by any destination
that can take a MediaStreamTrack, including HTML <video> and <audio> tags,
RTCPeerConnection, canvas or MediaRecorder.
This specification explicitly aims to support the following use cases:
- *Video processing*: This is the "Funny Hats" use case, where the input is a single video track and the output is a transformed video track.
- *Audio processing*: This is the equivalent of the video processing use case, but for audio tracks. This use case overlaps partially with the {{AudioWorklet}} interface, but the model provided by this specification differs in significant ways:
- Pull-based programming model, as opposed to {{AudioWorklet}}'s clock-based model. This means that processing of each single block of audio data does not have a set time budget.
- Offers direct access to the data and metadata from the original {{MediaStreamTrack}}. In particular, timestamps come directly from the track as opposed to an {{AudioContext}}.
- Easier integration with video processing by providing the same API and programming model and allowing both to run on the same scope.
- Does not run on a real-time thread. This means that the model is not suitable for applications with strong low-latency requirements.
These differences make the model provided by this specification more
suitable than {{AudioWorklet}} for processing that requires greater tolerance
to transient CPU spikes, better integration with video
{{MediaStreamTrack}}s, or access to track metadata (e.g., timestamps), but
that does not have strong low-latency requirements such as local audio rendering.
An example of this would be <a href="https://arxiv.org/abs/1804.03619">
audio-visual speech separation</a>, which can be used to combine the video
and audio tracks from a speaker on the sender side of a video call and
remove noise not coming from the speaker (i.e., the "Noisy cafeteria" case).
Other examples that do not require integration with video but can benefit
from the model include echo detection and other forms of ML-based noise
cancellation.
- *Multi-source processing*: In this use case, two or more tracks are combined into one. For example, a presentation containing a live weather map and a camera track with the speaker can be combined to produce a weather report application. Audio-visual speech separation, referenced above, is another case of multi-source processing.
- *Custom audio or video sink*: In this use case, the purpose is not producing a processed {{MediaStreamTrack}}, but to consume the media in a different way. For example, an application could use [[WEBCODECS]] and [[WEBTRANSPORT]] to create an {{RTCPeerConnection}}-like sink, but using different codec configuration and networking protocols.
# Specification # {#specification}
This specification shows the IDL extensions for [[MEDIACAPTURE-STREAMS]].
It defines some new objects that inherit the {{MediaStreamTrack}} interface, and
can be constructed from a {{MediaStreamTrack}}.
The API consists of two elements. One is a track sink that is
capable of exposing the unencoded media frames from the track to a ReadableStream.
The other one is the inverse of that: it provides a track source that takes
media frames as input.
<!-- ## Extension operation ## {#operation} -->
## MediaStreamTrackProcessor ## {#track-processor}
A {{MediaStreamTrackProcessor}} allows the creation of a
{{ReadableStream}} that can expose the media flowing through
a given {{MediaStreamTrack}}. If the {{MediaStreamTrack}} is a video track,
the chunks exposed by the stream will be {{VideoFrame}} objects;
if the track is an audio track, the chunks will be {{AudioData}} objects.
This makes {{MediaStreamTrackProcessor}} effectively a sink in the
<a href="https://www.w3.org/TR/mediacapture-streams/#the-model-sources-sinks-constraints-and-settings">
MediaStream model</a>.
A {{MediaStreamTrackProcessor}} internally contains a circular queue
that allows buffering incoming media frames delivered by the track it
is connected to. This buffering allows the {{MediaStreamTrackProcessor}}
to temporarily hold frames waiting to be read from its associated {{ReadableStream}}.
The application can influence the maximum size of the queue via a parameter
provided in the {{MediaStreamTrackProcessor}} constructor. The actual
maximum size of the queue is decided by the UA and can change dynamically,
but it will not exceed the size requested by the application.
If the application does not provide a maximum size parameter, the UA is free
to decide the maximum size of the queue.
When a new frame arrives at the
{{MediaStreamTrackProcessor}}, if the queue has reached its maximum size,
the oldest frame is removed from the queue, and the new frame is
added to the queue. This means that for the particular case of a queue
with a maximum size of 1, if there is a queued frame, it will always be
the most recent one.
The UA is also free to remove any frames from the queue at any time. The UA
may remove frames in order to save resources or to improve performance in
specific situations. In all cases, frames that are not dropped
must be made available to the {{ReadableStream}} in the order in which
they arrive at the {{MediaStreamTrackProcessor}}.
A {{MediaStreamTrackProcessor}} makes frames available to its
associated {{ReadableStream}} only when a read request has been issued on
the stream. The purpose is to avoid buffering inside the stream, which
would not give the UA enough flexibility to choose its buffering policy.
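The following is an informative sketch of the resulting delivery model: frames are
handed to the {{ReadableStream}} only in response to read requests, and
{{MediaStreamTrackProcessorInit/maxBufferSize}} only bounds how many frames the UA
may buffer between reads. In this sketch, `videoTrack` is assumed to be a live video
{{MediaStreamTrack}} and `processFrame()` is a placeholder for application code.
<pre class="example">
// Keep at most 2 frames buffered while the application is busy.
const processor = new MediaStreamTrackProcessor({track: videoTrack, maxBufferSize: 2});
const reader = processor.readable.getReader();
while (true) {
  // Each read() issues a read request; the UA delivers the oldest buffered frame.
  const {value: frame, done} = await reader.read();
  if (done) break;             // the track ended and the stream was closed
  await processFrame(frame);   // placeholder for application processing
  frame.close();               // release the frame's media resources
}
</pre>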
### Interface definition ### {#track-processor-interface}
<pre class="idl">
[Exposed=(Window,DedicatedWorker)]
interface MediaStreamTrackProcessor {
  constructor(MediaStreamTrackProcessorInit init);
  attribute ReadableStream readable;
};

dictionary MediaStreamTrackProcessorInit {
  required MediaStreamTrack track;
  [EnforceRange] unsigned short maxBufferSize;
};
</pre>
### Internal slots ### {#internal-slots-processor}
<dl>
<dt><dfn for=MediaStreamTrackProcessor>`[[track]]`</dfn></dt>
<dd>Track whose raw data is to be exposed by the {{MediaStreamTrackProcessor}}.</dd>
<dt><dfn for=MediaStreamTrackProcessor>`[[maxBufferSize]]`</dfn></dt>
<dd>The maximum number of media frames to be buffered by the {{MediaStreamTrackProcessor}}
as specified by the application. It may have no value if the application does
not provide it. Its minimum valid value is 1.</dd>
<dt><dfn for=MediaStreamTrackProcessor>`[[queue]]`</dfn></dt>
<dd>A {{Queue|queue}} used to buffer media frames not yet read by the application</dd>
<dt><dfn for=MediaStreamTrackProcessor>`[[numPendingReads]]`</dfn></dt>
<dd>An integer whose value represents the number of read requests issued by the
application that have not yet been handled.
</dd>
<dt><dfn for=MediaStreamTrackProcessor>`[[isClosed]]`</dfn></dt>
<dd>A boolean whose value indicates whether the {{MediaStreamTrackProcessor}} is closed.
</dd>
</dl>
### Constructor ### {#constructor-processor}
<dfn constructor for=MediaStreamTrackProcessor title="MediaStreamTrackProcessor(init)">
MediaStreamTrackProcessor(|init|)
</dfn>
1. If |init|.{{MediaStreamTrackProcessorInit/track}} is not a valid {{MediaStreamTrack}},
throw a {{TypeError}}.
2. Let |processor| be a new {{MediaStreamTrackProcessor}} object.
3. Assign |init|.{{MediaStreamTrackProcessorInit/track}} to |processor|.`[[track]]`.
4. If |init|.{{MediaStreamTrackProcessorInit/maxBufferSize}} has an integer value greater than or equal to 1, assign it to |processor|.`[[maxBufferSize]]`.
5. Set the `[[queue]]` internal slot of |processor| to an empty {{Queue}}.
6. Set |processor|.`[[numPendingReads]]` to 0.
7. Set |processor|.`[[isClosed]]` to false.
8. Return |processor|.
### Attributes ### {#attributes-processor}
<dl>
<dt><dfn for=MediaStreamTrackProcessor>readable</dfn></dt>
<dd>Allows reading the frames delivered by the {{MediaStreamTrack}} stored
in the `[[track]]` internal slot. This attribute is created the first time it is accessed,
according to the following steps:
1. Initialize [=this=].{{MediaStreamTrackProcessor/readable}} to be a new {{ReadableStream}}.
2. <a dfn for="ReadableStream">Set up</a> [=this=].{{MediaStreamTrackProcessor/readable}} with its [=ReadableStream/set up/pullAlgorithm=] set to [=processorPull=] with [=this=] as parameter, [=ReadableStream/set up/cancelAlgorithm=] set to [=processorCancel=] with [=this=] as parameter, and [=ReadableStream/set up/highWatermark=] set to 0.
The <dfn>processorPull</dfn> algorithm is given a |processor| as input. It is defined by the following steps:
1. Increment the value of the |processor|.`[[numPendingReads]]` by 1.
2. [=Queue a task=] to run the [=maybeReadFrame=] algorithm with |processor| as parameter.
3. Return [=a promise resolved with=] undefined.
The <dfn>maybeReadFrame</dfn> algorithm is given a |processor| as input. It is defined by the following steps:
1. If |processor|.`[[queue]]` is {{Queue/empty}}, abort these steps.
2. If |processor|.`[[numPendingReads]]` equals zero, abort these steps.
3. {{Queue/dequeue}} a frame from |processor|.`[[queue]]` and [=ReadableStream/Enqueue=] it in |processor|.{{MediaStreamTrackProcessor/readable}}.
4. Decrement |processor|.`[[numPendingReads]]` by 1.
5. Go to step 1.
The <dfn>processorCancel</dfn> algorithm is given a |processor| as input.
It is defined by running the following steps:
1. Run the [=processorClose=] algorithm with |processor| as parameter.
2. Return [=a promise resolved with=] undefined.
The <dfn>processorClose</dfn> algorithm is given a |processor| as input.
It is defined by running the following steps:
1. If |processor|.`[[isClosed]]` is true, abort these steps.
2. Disconnect |processor| from |processor|.`[[track]]`. The mechanism to do this is UA specific and the result is that |processor| is no longer a sink of |processor|.`[[track]]`.
3. {{ReadableStreamDefaultControllerClose|Close}} |processor|.{{MediaStreamTrackProcessor/readable}}.{{ReadableStreamControllerSlot|[[controller]]}}.
4. {{List/Empty}} |processor|.`[[queue]]`.
5. Set |processor|.`[[isClosed]]` to true.
</dd>
</dl>
### Handling interaction with the track ### {#processor-handling-interaction-with-track}
When the `[[track]]` of a {{MediaStreamTrackProcessor}} |processor| delivers a
frame to |processor|, the UA MUST execute the [=handleNewFrame=] algorithm
with |processor| as parameter.
The <dfn>handleNewFrame</dfn> algorithm is given a |processor| as input.
It is defined by running the following steps:
1. If |processor|.`[[maxBufferSize]]` has a value and |processor|.`[[queue]]` has |processor|.`[[maxBufferSize]]` elements, {{Queue/dequeue}} an item from |processor|.`[[queue]]`.
2. {{Queue/enqueue}} the new frame in |processor|.`[[queue]]`.
3. [=Queue a task=] to run the [=maybeReadFrame=] algorithm with |processor| as parameter.
At any time, the UA MAY {{List/remove}} any frame from |processor|.`[[queue]]`,
for example, to prevent resource exhaustion or to improve performance in
certain situations.
<p class="note">
The application may detect that frames have been dropped by noticing that there
is a gap in the timestamps of consecutive frames.
</p>
When the `[[track]]` of a {{MediaStreamTrackProcessor}} |processor|
{{TrackEnded|ends}}, the [=processorClose=] algorithm must be
executed with |processor| as parameter.
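As an informative illustration of the note above, an application reading video frames
can detect dropped frames by watching for gaps between the timestamps of consecutive
frames. In this sketch, `processor` is an existing video {{MediaStreamTrackProcessor}},
and the nominal inter-frame interval of roughly 33 ms (in microseconds) is an assumption.
<pre class="example">
const reader = processor.readable.getReader();
const expectedDeltaUs = 33333; // assumed nominal frame interval (~30 fps), in microseconds
let lastTimestamp = null;
while (true) {
  const {value: frame, done} = await reader.read();
  if (done) break;
  if (lastTimestamp !== null && frame.timestamp - lastTimestamp > 1.5 * expectedDeltaUs) {
    console.log('One or more frames were likely dropped before timestamp', frame.timestamp);
  }
  lastTimestamp = frame.timestamp;
  frame.close();
}
</pre>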
## MediaStreamTrackGenerator ## {#generator}
A {{MediaStreamTrackGenerator}} allows the creation of a {{WritableStream}}
that acts as a {{MediaStreamTrack}} source in the
<a href="https://www.w3.org/TR/mediacapture-streams/#the-model-sources-sinks-constraints-and-settings">
MediaStream model</a>. Since the model does not expose sources directly
but through the tracks connected to it, a {{MediaStreamTrackGenerator}}
is also a track connected to its {{WritableStream}} source. Further tracks
connected to the same {{WritableStream}} can be created using the
{{MediaStreamTrack/clone}} method. The {{WritableStream}} source is
exposed as the {{MediaStreamTrackGenerator/writable}} field of
{{MediaStreamTrackGenerator}}.
Similarly to {{MediaStreamTrackProcessor}}, the {{WritableStream}} of
an audio {{MediaStreamTrackGenerator}} accepts {{AudioData}} objects,
and a video {{MediaStreamTrackGenerator}} accepts {{VideoFrame}} objects.
When a {{VideoFrame}} or {{AudioData}} object is written to
{{MediaStreamTrackGenerator/writable}},
the frame's `close()` method is automatically invoked, so that its internal
resources are no longer accessible from JavaScript.
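The following is an informative sketch of writing frames to a generator directly
with a {{WritableStream}} writer rather than through `pipeTo()`. Here,
`produceVideoFrame()` is a placeholder for application code that creates a
{{VideoFrame}} (for example, from a canvas).
<pre class="example">
const generator = new MediaStreamTrackGenerator({kind: 'video'});
const writer = generator.writable.getWriter();
for (let i = 0; i < 100; i++) {
  const frame = await produceVideoFrame(); // placeholder returning a VideoFrame
  await writer.write(frame); // the frame is closed automatically once written
}
await writer.close(); // ends the generator track and any tracks cloned from it
</pre>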
### Interface definition ### {#generator-interface}
<pre class="idl">
[Exposed=(Window,DedicatedWorker)]
interface MediaStreamTrackGenerator : MediaStreamTrack {
  constructor(MediaStreamTrackGeneratorInit init);
  attribute WritableStream writable; // VideoFrame or AudioData
};

dictionary MediaStreamTrackGeneratorInit {
  required DOMString kind;
};
</pre>
### Constructor ### {#generator-constructor}
<dfn constructor for=MediaStreamTrackGenerator title="MediaStreamTrackGenerator(init)">
MediaStreamTrackGenerator(|init|)
</dfn>
1. If |init|.{{MediaStreamTrackGeneratorInit/kind}} is not `"audio"` or `"video"`,
throw a {{TypeError}}.
2. Let |g| be a new {{MediaStreamTrackGenerator}} object.
3. Initialize the {{MediaStreamTrack/kind}} field of |g| (inherited from {{MediaStreamTrack}})
with |init|.{{MediaStreamTrackGeneratorInit/kind}}.
4. Return |g|.
### Attributes ### {#generator-attributes}
<dl>
<dt><dfn attribute for=MediaStreamTrackGenerator>writable</dfn></dt>
<dd>Allows writing media frames to the {{MediaStreamTrackGenerator}}, which is
itself a {{MediaStreamTrack}}. When this attribute is accessed for the first time,
it MUST be initialized with the following steps:
1. Initialize [=this=].{{MediaStreamTrackGenerator/writable}} to be a new {{WritableStream}}.
2. <a dfn for="WritableStream">Set up</a> [=this=].{{MediaStreamTrackGenerator/writable}}, with its [=WritableStream/set up/writeAlgorithm=] set to [=writeFrame=] with |this| as parameter, with [=WritableStream/set up/closeAlgorithm=] set to [=closeWritable=] with |this| as parameter and [=WritableStream/set up/abortAlgorithm=] set to [=closeWritable=] with |this| as parameter.
The <dfn>writeFrame</dfn> algorithm is given a |generator| and a |frame| as input. It is defined by running the following steps:
1. If |generator|.{{MediaStreamTrack/kind}} equals `video` and |frame| is not a {{VideoFrame}} object, return [=a promise rejected with=] a {{TypeError}}.
2. If |generator|.{{MediaStreamTrack/kind}} equals `audio` and |frame| is not an {{AudioData}} object, return [=a promise rejected with=] a {{TypeError}}.
3. Send the media data backing |frame| to all live tracks connected to |generator|, possibly including |generator| itself.
4. Invoke the `close` method of |frame|.
5. Return [=a promise resolved with=] undefined.
<p class="note">
When the media data is sent to a track, the UA may apply processing
(e.g., cropping and downscaling) to ensure that the media data sent
to the track satisfies the track's constraints. Each track may receive a
different version of the media data depending on its constraints.
</p>
The <dfn>closeWritable</dfn> algorithm is given a |generator| as input.
It is defined by running the following steps.
1. For each track `t` connected to |generator|, {{EndTrack|end}} `t`.
2. Return [=a promise resolved with=] undefined.
</dd>
</dl>
### Specialization of MediaStreamTrack behavior ### {#generator-as-track}
A {{MediaStreamTrackGenerator}} is a {{MediaStreamTrack}}. This section adds
clarifications on how a {{MediaStreamTrackGenerator}} behaves as a
{{MediaStreamTrack}}.
#### clone #### {#generator-clone}
The {{MediaStreamTrack/clone}} method on a {{MediaStreamTrackGenerator}}
returns a new {{MediaStreamTrack}} object whose source is the
same as the one for the {{MediaStreamTrackGenerator}} being cloned.
This source is the {{MediaStreamTrackGenerator/writable}} field of
the {{MediaStreamTrackGenerator}}.
#### stop #### {#generator-stop}
The {{MediaStreamTrack/stop}} method on a {{MediaStreamTrackGenerator}} stops
the track. When the last track connected to
the {{MediaStreamTrackGenerator/writable}} of a {{MediaStreamTrackGenerator}}
ends, its {{MediaStreamTrackGenerator/writable}} is [=WritableStream/closing|closed=].
#### Constrainable properties #### {#generator-constrainable-properties}
The following constrainable properties are defined for video
{{MediaStreamTrackGenerator}}s and any {{MediaStreamTrack}}s sourced from
a {{MediaStreamTrackGenerator}}:
<table>
<thead>
<tr>
<th>
Property Name
</th>
<th>
Values
</th>
<th>
Notes
</th>
</tr>
</thead>
<tbody>
<tr id="def-constraint-width">
<td data-tests="">
<dfn>width</dfn>
</td>
<td>
{{ConstrainULong}}
</td>
<td>
As a setting, this is the width, in pixels, of the latest
frame received by the track.
As a capability, `max` MUST reflect the
largest width a {{VideoFrame}} may have, and `min`
MUST reflect the smallest width a {{VideoFrame}} may have.
</td>
</tr>
<tr id="def-constraint-height">
<td data-tests="">
<dfn>height</dfn>
</td>
<td>
{{ConstrainULong}}
</td>
<td>
As a setting, this is the height, in pixels, of the latest
frame received by the track.
As a capability, `max` MUST reflect the largest height
a {{VideoFrame}} may have, and `min` MUST reflect
the smallest height a {{VideoFrame}} may have.
</td>
</tr>
<tr id="def-constraint-frameRate">
<td data-tests="">
<dfn>frameRate</dfn>
</td>
<td>
{{ConstrainDouble}}
</td>
<td>
As a setting, this is an estimate of the frame rate based on frames
recently received by the track.
As a capability `min` MUST be zero and
`max` MUST be the maximum frame rate supported by the system.
</td>
</tr>
<tr id="def-constraint-aspect">
<td data-tests="">
<dfn>aspectRatio</dfn>
</td>
<td>
{{ConstrainDouble}}
</td>
<td>
As a setting, this is the aspect ratio of the latest frame
delivered by the track;
this is the width in pixels divided by height in pixels as a
double rounded to the tenth decimal place. As a capability,
`min` MUST be the
smallest aspect ratio supported by a {{VideoFrame}}, and `max` MUST be
the largest aspect ratio supported by a {{VideoFrame}}.
</td>
</tr>
<tr id="def-constraint-resizeMode">
<td data-tests="">
<dfn>resizeMode</dfn>
</td>
<td>
{{ConstrainDOMString}}
</td>
<td>
As a setting, this string should be one of the members of
{{VideoResizeModeEnum}}. The value "{{VideoResizeModeEnum/none}}"
means that the frames output by the MediaStreamTrack are unmodified
versions of the frames written to the
{{MediaStreamTrackGenerator/writable}} backing
the track, regardless of any constraints.
The value "{{VideoResizeModeEnum/crop-and-scale}}" means
that the frames output by the MediaStreamTrack may be cropped and/or
downscaled versions
of the source frames, based on the values of the width, height and
aspectRatio constraints of the track.
As a capability, the values "{{VideoResizeModeEnum/none}}" and
"{{VideoResizeModeEnum/crop-and-scale}}" both MUST be present.
</td>
</tr>
</tbody>
</table>
The {{MediaStreamTrack/applyConstraints}} method applied to a video {{MediaStreamTrack}}
sourced from a {{MediaStreamTrackGenerator}} supports the properties defined above.
It can be used, for example, to resize frames or adjust the frame rate of the track.
Note that these constraints have no effect on the {{VideoFrame}} objects
written to the {{MediaStreamTrackGenerator/writable}} of a {{MediaStreamTrackGenerator}},
just on the output of the track on which the constraints have been applied.
Note also that, since a {{MediaStreamTrackGenerator}} can in principle produce
media data with any setting for the supported constrainable properties,
an {{MediaStreamTrack/applyConstraints}} call on a track
backed by a {{MediaStreamTrackGenerator}} will generally not fail with
{{OverconstrainedError}} unless the given constraints
are outside the system-supported range, as reported by
{{MediaStreamTrack/getCapabilities}}.
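For example (informative sketch), an application can keep the full-resolution frames
on the generator itself while attaching a downscaled clone to a self-view element.
The element ID `'self-view'` is an assumption.
<pre class="example">
const generator = new MediaStreamTrackGenerator({kind: 'video'});
// Frames are written to generator.writable elsewhere.
const previewTrack = generator.clone();
await previewTrack.applyConstraints({
  width: 320,
  height: 180,
  resizeMode: 'crop-and-scale' // allow cropping/downscaling of the written frames
});
document.getElementById('self-view').srcObject = new MediaStream([previewTrack]);
</pre>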
The following constrainable properties are defined for audio {{MediaStreamTrack}}s
sourced from a {{MediaStreamTrackGenerator}}, but only in an informational capacity,
available via {{MediaStreamTrack/getSettings}}. It is not possible to
reconfigure an audio track sourced from a {{MediaStreamTrackGenerator}} using
{{MediaStreamTrack/applyConstraints}} in the same way that it is possible to,
for example, resize the frames of a video track.
For such audio tracks, {{MediaStreamTrack/getCapabilities}} MUST return an empty object.
<table>
<thead>
<tr>
<th>
Property Name
</th>
<th>
Values
</th>
<th>
Notes
</th>
</tr>
</thead>
<tbody>
<tr id="def-constraint-sampleRate">
<td data-tests="">
<dfn>sampleRate</dfn>
</td>
<td>
{{ConstrainDouble}}
</td>
<td>
As a setting, this is the sample rate, in samples per second, of the
latest {{AudioData}} delivered by the track.
</td>
</tr>
<tr id="def-constraint-channelCount">
<td data-tests="">
<dfn>channelCount</dfn>
</td>
<td>
{{ConstrainULong}}
</td>
<td>
As a setting, this is the number of independent audio channels of the
latest {{AudioData}} delivered by the track.
</td>
</tr>
<tr id="def-constraint-sampleSize">
<td data-tests="">
<dfn>sampleSize</dfn>
</td>
<td>
{{ConstrainULong}}
</td>
<td>
As a setting, this is the linear sample size of the latest {{AudioData}}
delivered by the track.
</td>
</tr>
</tbody>
</table>
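An informative sketch of reading these settings from an audio track backed by a
{{MediaStreamTrackGenerator}}; the settings reflect the latest {{AudioData}} written
to the generator.
<pre class="example">
const audioGenerator = new MediaStreamTrackGenerator({kind: 'audio'});
// AudioData objects are written to audioGenerator.writable elsewhere.
const {sampleRate, channelCount, sampleSize} = audioGenerator.getSettings();
console.log('sampleRate:', sampleRate, 'channelCount:', channelCount, 'sampleSize:', sampleSize);
</pre>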
#### Events and attributes #### {#generator-events-attributes}
Events and attributes work the same as for any {{MediaStreamTrack}}.
It is relevant to note that if the {{MediaStreamTrackGenerator/writable}}
stream of a {{MediaStreamTrackGenerator}} is closed, all the live
tracks connected to it, possibly including the {{MediaStreamTrackGenerator}}
itself, are ended and the `ended` event is fired on them.
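For instance (informative sketch), closing the {{MediaStreamTrackGenerator/writable}}
ends both the generator and any tracks cloned from it, and each fires `ended`:
<pre class="example">
const generator = new MediaStreamTrackGenerator({kind: 'video'});
const clonedTrack = generator.clone();
generator.onended = () => console.log('generator track ended');
clonedTrack.onended = () => console.log('cloned track ended');
const writer = generator.writable.getWriter();
await writer.close(); // both tracks end and fire the "ended" event
</pre>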
# Examples # {#examples}
## Video Processing
Consider a face recognition function `detectFace(videoFrame)` that returns a face position
(in some format), and a manipulation function `blurBackground(videoFrame, facePosition)` that
returns a new VideoFrame similar to the given `videoFrame`, but with the
non-face parts blurred. The example also displays the video before and after
the effect in video elements.
<pre class="example">
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const videoTrack = stream.getVideoTracks()[0];
const processor = new MediaStreamTrackProcessor({track: videoTrack});
const generator = new MediaStreamTrackGenerator({kind: 'video'});
const transformer = new TransformStream({
  async transform(videoFrame, controller) {
    const facePosition = await detectFace(videoFrame);
    const newFrame = blurBackground(videoFrame, facePosition);
    videoFrame.close();
    controller.enqueue(newFrame);
  }
});
processor.readable.pipeThrough(transformer).pipeTo(generator.writable);

const videoBefore = document.getElementById('video-before');
const videoAfter = document.getElementById('video-after');
videoBefore.srcObject = stream;
const streamAfter = new MediaStream([generator]);
videoAfter.srcObject = streamAfter;
</pre>
The same example using a worker:
<pre class="example">
// main.js
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const videoTrack = stream.getVideoTracks()[0];
const processor = new MediaStreamTrackProcessor({track: videoTrack});
const generator = new MediaStreamTrackGenerator({kind: 'video'});
const worker = new Worker('worker.js');
worker.postMessage(
    {readable: processor.readable, writable: generator.writable},
    [processor.readable, generator.writable]);

const videoBefore = document.getElementById('video-before');
const videoAfter = document.getElementById('video-after');
videoBefore.srcObject = stream;
const streamAfter = new MediaStream([generator]);
videoAfter.srcObject = streamAfter;

// worker.js
self.onmessage = async function(e) {
  const transformer = new TransformStream({
    async transform(videoFrame, controller) {
      const facePosition = await detectFace(videoFrame);
      const newFrame = blurBackground(videoFrame, facePosition);
      videoFrame.close();
      controller.enqueue(newFrame);
    }
  });
  e.data.readable.pipeThrough(transformer).pipeTo(e.data.writable);
};
</pre>
## Multi-source processing
Suppose there is a model for audio-visual speech separation, represented
by a class `AudioVisualModel` with a method `updateVideo(videoFrame)` that
updates the internal state of the model upon a new video frame, a
method `getSpeechData(audioData)` that returns a noise-canceled {{AudioData}}
given an input raw {{AudioData}}, and a `close()` method that
releases resources used internally by the model.
<pre class="example">
// main.js
const stream = await navigator.mediaDevices.getUserMedia({audio: true, video: true});
const audioTrack = stream.getAudioTracks()[0];
const videoTrack = stream.getVideoTracks()[0];
const audioProcessor = new MediaStreamTrackProcessor({track: audioTrack});
const videoProcessor = new MediaStreamTrackProcessor({track: videoTrack});
const audioGenerator = new MediaStreamTrackGenerator({kind: 'audio'});
const worker = new Worker('worker.js');
worker.postMessage({
    audioReadable: audioProcessor.readable,
    videoReadable: videoProcessor.readable,
    audioWritable: audioGenerator.writable
  }, [
    audioProcessor.readable,
    videoProcessor.readable,
    audioGenerator.writable
  ]);

// worker.js
self.onmessage = async function(e) {
  const model = new AudioVisualModel();
  const audioTransformer = new TransformStream({
    async transform(audioData, controller) {
      const speechData = model.getSpeechData(audioData);
      audioData.close();
      controller.enqueue(speechData);
    }
  });
  const audioPromise = e.data.audioReadable
      .pipeThrough(audioTransformer)
      .pipeTo(e.data.audioWritable);
  const videoReader = e.data.videoReadable.getReader();
  const videoPromise = new Promise(async resolve => {
    while (true) {
      const result = await videoReader.read();
      if (result.done) {
        break;
      } else {
        model.updateVideo(result.value);
        result.value.close();
      }
    }
    resolve();
  });
  await Promise.all([audioPromise, videoPromise]);
  model.close();
};
</pre>
An example that instead allows video effects that are influenced by
speech would be similar, except that the roles of audio and video would be
reversed.
## Custom sink
Suppose there are `sendAudioToNetwork(audioData)` and
`sendVideoToNetwork(videoFrame)` functions that respectively send {{AudioData}}
and {{VideoFrame}} objects to a custom network sink, together with a
`setupNetworkSinks()` function to set up the sinks and a
`cleanupNetworkSinks()` function to release resources used by the sinks.
<pre class="example">
// main.js
const stream = await navigator.mediaDevices.getUserMedia({audio: true, video: true});
const audioTrack = stream.getAudioTracks()[0];
const videoTrack = stream.getVideoTracks()[0];
const audioProcessor = new MediaStreamTrackProcessor({track: audioTrack});
const videoProcessor = new MediaStreamTrackProcessor({track: videoTrack});
const worker = new Worker('worker.js');
worker.postMessage({
    audioReadable: audioProcessor.readable,
    videoReadable: videoProcessor.readable,
  }, [
    audioProcessor.readable,
    videoProcessor.readable,
  ]);

// worker.js
function writeToSink(reader, sinkFunction) {
  return new Promise(async resolve => {
    while (true) {
      const result = await reader.read();
      if (result.done) {
        break;
      } else {
        sinkFunction(result.value);
        result.value.close();
      }
    }
    resolve();
  });
}

self.onmessage = async function(e) {
  setupNetworkSinks();
  const audioReader = e.data.audioReadable.getReader();
  const videoReader = e.data.videoReadable.getReader();
  const audioPromise = writeToSink(audioReader, sendAudioToNetwork);
  const videoPromise = writeToSink(videoReader, sendVideoToNetwork);
  await Promise.all([audioPromise, videoPromise]);
  cleanupNetworkSinks();
};
</pre>
Example using transfer of the MediaStreamTrack:
<pre class="example">
// main.js
const stream = await navigator.mediaDevices.getUserMedia({audio: true, video: true});
const audioTrack = stream.getAudioTracks()[0];
const videoTrack = stream.getVideoTracks()[0];
const worker = new Worker('worker.js');
worker.postMessage({
    audioTrack: audioTrack,
    videoTrack: videoTrack,
  }, [
    audioTrack, videoTrack
  ]);

// worker.js
function writeToSink(reader, sinkFunction) {
  return new Promise(async resolve => {
    while (true) {
      const result = await reader.read();
      if (result.done) {
        break;
      } else {
        sinkFunction(result.value);
        result.value.close();
      }
    }
    resolve();
  });
}

self.onmessage = async function(e) {
  setupNetworkSinks();
  const audioProcessor = new MediaStreamTrackProcessor({track: e.data.audioTrack});
  const videoProcessor = new MediaStreamTrackProcessor({track: e.data.videoTrack});
  const audioReader = audioProcessor.readable.getReader();
  const videoReader = videoProcessor.readable.getReader();
  const audioPromise = writeToSink(audioReader, sendAudioToNetwork);
  const videoPromise = writeToSink(videoReader, sendVideoToNetwork);
  await Promise.all([audioPromise, videoPromise]);
  cleanupNetworkSinks();
};
</pre>
# Implementation advice # {#implementation-advice}
This section is informative.
## Use with multiple consumers
There are use cases where a single stream of frames needs to be consumed by
multiple consumers.
Examples include the case where the result of a background-blurring function should
be both displayed in a self-view and encoded using a {{VideoEncoder}}.
For cases where both consumers are consuming unprocessed frames, and synchronization
is not desired, instantiating multiple {{MediaStreamTrackProcessor}} objects is a robust solution.
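For instance (informative sketch), a self-view and an encoder can each use their own
processor on the same camera track; each processor is an independent sink that delivers
frames at its own pace. The rendering and encoding code is assumed to exist elsewhere.
<pre class="example">
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const track = stream.getVideoTracks()[0];
// Two independent sinks on the same track; neither consumer blocks the other.
const selfViewProcessor = new MediaStreamTrackProcessor({track});
const encoderProcessor = new MediaStreamTrackProcessor({track, maxBufferSize: 4});
</pre>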
For cases where both consumers intend to convert the result of a processing step into a
{{MediaStreamTrack}}
using a {{MediaStreamTrackGenerator}}, for example when feeding a processed stream
to both a &lt;video&gt; tag and an {{RTCPeerConnection}}, attaching the resulting {{MediaStreamTrack}}
to multiple sinks may be the most appropriate mechanism.
For cases where the downstream processing takes frames, not streams, the frames can
be cloned as needed and sent off to the downstream processing; "clone" is a cheap operation.
When the stream is the output of some processing, and both branches need a Stream object
to do further processing, one needs a function that produces two streams from one stream.
However, the standard tee() operation is problematic
in this context:
* It defeats the backpressure mechanism that guards against excessive queueing
* It creates multiple references to the same buffers, meaning that the question of which
consumer gets to close() a given frame is a difficult one to address.

Therefore, tee() should only be used on streams containing media when the
implications are fully understood. Instead, a custom stream-splitting mechanism
more appropriate to the use case should be used; an informative sketch of such a
splitter is given after the list below.
* If both branches require the ability to dispose of the frames, clone() the frame
and enqueue distinct copies in both queues. This corresponds to the function
ReadableStreamTee(stream, cloneForBranch2=true). Then choose one of the
alternatives below.
* If one branch requires all frames, and the other branch tolerates dropped frames,
enqueue buffers in the all-frames-required stream and use the backpressure signal
from that stream to stop reading from the source. If backpressure signal from the
other stream indicates room, enqueue the same frame in that queue too.
* If neither stream tolerates dropped frames, use the combined backpressure signal
to stop reading from the source. In this case, frames will be processed in
lockstep if the buffer sizes are both 1.
* If it is OK for the incoming stream to be stalled only when the underlying
buffer pool allocated to the process is exhausted, standard tee() may be used.
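The following is an informative sketch of such a custom splitter. It clones each
incoming frame so that the two branches own distinct copies (first alternative above)
and awaits both branches before reading the next frame, i.e. it uses the combined
backpressure signal. It is not a definitive implementation; error handling is omitted.
<pre class="example">
function cloneSplit(readable) {
  // Two identity TransformStreams act as the buffered outputs of the splitter.
  const branchA = new TransformStream();
  const branchB = new TransformStream();
  const writerA = branchA.writable.getWriter();
  const writerB = branchB.writable.getWriter();
  (async () => {
    const reader = readable.getReader();
    while (true) {
      const {value: frame, done} = await reader.read();
      if (done) break;
      // Each branch gets its own copy, so either side may close() it independently.
      // Awaiting both writes reads from the source only as fast as the slower branch.
      await Promise.all([writerA.write(frame.clone()), writerB.write(frame)]);
    }
    await Promise.all([writerA.close(), writerB.close()]);
  })();
  return [branchA.readable, branchB.readable];
}

// Usage sketch: const [forSelfView, forEncoder] = cloneSplit(processor.readable);
</pre>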
# Security and Privacy considerations # {#security-considerations}
This API defines a {{MediaStreamTrack}} source and a {{MediaStreamTrack}} sink.
The security and privacy of the source ({{MediaStreamTrackGenerator}}) relies
on the same-origin policy. That is, the data {{MediaStreamTrackGenerator}} can
make available in the form of a {{MediaStreamTrack}} must be visible to
the document before a {{VideoFrame}} or {{AudioData}} object can be constructed
and pushed into the {{MediaStreamTrackGenerator}}. Any attempt to create
{{VideoFrame}} or {{AudioData}} objects using cross-origin data will fail.
Therefore, {{MediaStreamTrackGenerator}} does not introduce any new
fingerprinting surface.
The {{MediaStreamTrack}} sink introduced by this API ({{MediaStreamTrackProcessor}})
exposes the same data that is exposed by other
{{MediaStreamTrack}} sinks such as WebRTC peer connections, Web Audio
{{MediaStreamAudioSourceNode}} and media elements. The security and privacy
of {{MediaStreamTrackProcessor}} relies on the security and privacy of the
{{MediaStreamTrack}} sources of the tracks to which {{MediaStreamTrackProcessor}}
is connected. For example, camera, microphone and screen-capture tracks
rely on explicit user authorization via permission dialogs (see
[[MEDIACAPTURE-STREAMS]] and [[MEDIACAPTURE-SCREEN-SHARE]]),
while element capture and {{MediaStreamTrackGenerator}}
rely on the same-origin policy.
A potential issue with {{MediaStreamTrackProcessor}} is resource exhaustion.
For example, a site might hold on to too many open {{VideoFrame}} objects
and deplete a system-wide pool of GPU-memory-backed frames. UAs can
mitigate this risk by limiting the number of pool-backed frames a site can
hold. This can be achieved by reducing the maximum number of buffered frames
and by refusing to deliver more frames to {{MediaStreamTrackProcessor/readable}}
once the budget limit is reached. Accidental exhaustion is also mitigated by
automatic closing of {{VideoFrame}} and {{AudioData}} objects once they
are written to a {{MediaStreamTrackGenerator}}.