<pre class='metadata'>
Title: MediaStreamTrack Insertable Media Processing using Streams
Shortname: mediacapture-insertable-streams
Level: None
Status: UD
Group: webrtc
Repository: w3c/mediacapture-insertable-streams
URL: https://w3c.github.io/mediacapture-insertable-streams/
Editor: Harald Alvestrand, Google https://google.com, [email protected]
Editor: Guido Urdaneta, Google https://google.com, [email protected]
Abstract: This document defines an API surface for manipulating the bits on
Abstract: {{MediaStreamTrack}}s carrying raw data.
Abstract: NOT AN ADOPTED WORKING GROUP DOCUMENT.
Markup Shorthands: css no, markdown yes
</pre>
<pre class=anchors>
url: https://wicg.github.io/web-codecs/#videoframe; text: VideoFrame; type: interface; spec: WEBCODECS
url: https://wicg.github.io/web-codecs/#videoencoder; text: VideoEncoder; type: interface; spec: WEBCODECS
url: https://wicg.github.io/web-codecs/#audiodata; text: AudioData; type: interface; spec: WEBCODECS
url: https://www.w3.org/TR/mediacapture-streams/#mediastreamtrack; text: MediaStreamTrack; type: interface; spec: MEDIACAPTURE-STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#dom-constrainulong; text: ConstrainULong; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#dom-constraindouble; text: ConstrainDouble; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#dom-constraindomstring; text: ConstrainDOMString; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#dom-videoresizemodeenum; text: VideoResizeModeEnum; type: enum; spec: MEDIACAPTURE-STREAMS
url: https://w3c.github.io/mediacapture-main/#idl-def-VideoResizeModeEnum.user; text: none; for: VideoResizeModeEnum; type: enum; spec: MEDIACAPTURE-STREAMS
url: https://w3c.github.io/mediacapture-main/#idl-def-VideoResizeModeEnum.right; text: crop-and-scale; for: VideoResizeModeEnum; type: enum; spec: MEDIACAPTURE-STREAMS
url: https://infra.spec.whatwg.org/#queues; text: Queue; type: typedef; spec: INFRA
url: https://infra.spec.whatwg.org/#queue-enqueue; text: enqueue; for: Queue; type: typedef; spec: INFRA
url: https://infra.spec.whatwg.org/#queue-dequeue; text: dequeue; for: Queue; type: typedef; spec: INFRA
url: https://infra.spec.whatwg.org/#list-is-empty; text: empty; for: Queue; type: typedef; spec: INFRA
url: https://infra.spec.whatwg.org/#booleans; text: Boolean; type: typedef; spec: INFRA
url: https://www.w3.org/TR/mediacapture-streams/#source-stopped; text: StopSource; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#track-ended; text: TrackEnded; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://streams.spec.whatwg.org/#readable-stream-default-controller-close; text: ReadableStreamDefaultControllerClose; type: typedef; spec: STREAMS
url: https://streams.spec.whatwg.org/#readablestream-controller; text: ReadableStreamControllerSlot; type: typedef; spec: STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#ends-nostop; text: EndTrack; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://www.w3.org/TR/mediacapture-streams/#dom-overconstrainederror; text: OverconstrainedError; type: typedef; spec: MEDIACAPTURE-STREAMS
url: https://infra.spec.whatwg.org/#list-empty; text: Empty; for: List; type: typedef; spec: INFRA
url: https://infra.spec.whatwg.org/#list-remove; text: remove; for: List; type: typedef; spec: INFRA
</pre>
<pre class=biblio>
{
"WEBCODECS": {
"href":
"https://wicg.github.io/web-codecs/",
"title": "WebCodecs"
},
"MEDIACAPTURE-SCREEN-SHARE": {
"href": "https://w3c.github.io/mediacapture-screen-share/",
"title": "Screen Capture"
},
"MEDIACAPTURE-STREAMS": {
"href": "https://www.w3.org/TR/mediacapture-streams/",
"title": "Media Capture and Streams"
},
"INFRA": {
"href": "https://https://infra.spec.whatwg.org",
"title": "Infra"
},
"STREAMS": {
"href": "https://streams.spec.whatwg.org",
"title": "Streams"
},
"WEBAUDIO": {
"href": "https://www.w3.org/TR/webaudio/",
"title": "Web Audio API"
},
"WEBTRANSPORT": {
"href": "https://www.w3.org/TR/webtransport/",
"title": "WebTransport"
}
}
</pre>
<pre class=link-defaults>
spec:streams; type:interface; text:WritableStream
</pre>
# Introduction # {#introduction}
The [[WEBRTC-NV-USE-CASES]] document describes several functions that
can only be achieved by access to media (requirements N20-N22),
including, but not limited to:
* Funny Hats
* Machine Learning
* Virtual Reality Gaming
These use cases further require that processing can be done in worker
threads (requirements N23-N24).
This specification gives an interface based on [[WEBCODECS]] and [[STREAMS]] to
provide access to such functionality.
This specification provides access to raw media,
which is the output of a media source such as a camera, microphone, screen capture,
or the decoder part of a codec, and the input to the
encoder part of a codec. The processed media can be consumed by any destination
that can take a MediaStreamTrack, including HTML <video> and <audio> tags,
RTCPeerConnection, canvas or MediaRecorder.
This specification explicitly aims to support the following use cases:
- *Video processing*: This is the "Funny Hats" use case, where the input is a single video track and the output is a transformed video track.
- *Audio processing*: This is the equivalent of the video processing use case, but for audio tracks. This use case overlaps partially with the {{AudioWorklet}} interface, but the model provided by this specification differs in significant ways:
- Pull-based programming model, as opposed to {{AudioWorklet}}'s clock-based model. This means that processing of each single block of audio data does not have a set time budget.
- Offers direct access to the data and metadata from the original {{MediaStreamTrack}}. In particular, timestamps come directly from the track as opposed to an {{AudioContext}}.
- Easier integration with video processing by providing the same API and programming model and allowing both to run on the same scope.
- Does not run on a real-time thread. This means that the model is not suitable for applications with strong low-latency requirements.
These differences make the model provided by this specification more
suitable than {{AudioWorklet}} for processing that requires greater tolerance
to transient CPU spikes, better integration with video
{{MediaStreamTrack}}s, or access to track metadata (e.g., timestamps), but
that does not have strong low-latency requirements such as local audio rendering.
An example of this would be <a href="https://arxiv.org/abs/1804.03619">
audio-visual speech separation</a>, which can be used to combine the video
and audio tracks from a speaker on the sender side of a video call and
remove noise not coming from the speaker (i.e., the "Noisy cafeteria" case).
Other examples that do not require integration with video but can benefit
from the model include echo detection and other forms of ML-based noise
cancellation.
- *Multi-source processing*: In this use case, two or more tracks are combined into one. For example, a presentation containing a live weather map and a camera track with the speaker can be combined to produce a weather report application. Audio-visual speech separation, referenced above, is another case of multi-source processing.
- *Custom audio or video sink*: In this use case, the purpose is not producing a processed {{MediaStreamTrack}}, but to consume the media in a different way. For example, an application could use [[WEBCODECS]] and [[WEBTRANSPORT]] to create an {{RTCPeerConnection}}-like sink, but using different codec configuration and networking protocols.
# Specification # {#specification}
This specification shows the IDL extensions for [[MEDIACAPTURE-STREAMS]].
It defines some new objects that inherit the {{MediaStreamTrack}} interface, and
can be constructed from a {{MediaStreamTrack}}.
The API consists of two elements. One is a track sink that is
capable of exposing the unencoded media frames from the track to a ReadableStream.
The other one is the inverse of that: it provides a track source that takes
media frames as input.
<!-- ## Extension operation ## {#operation} -->
## MediaStreamTrackProcessor ## {#track-processor}
A {{MediaStreamTrackProcessor}} allows the creation of a
{{ReadableStream}} that can expose the media flowing through
a given {{MediaStreamTrack}}. If the {{MediaStreamTrack}} is a video track,
the chunks exposed by the stream will be {{VideoFrame}} objects;
if the track is an audio track, the chunks will be {{AudioData}} objects.
This makes {{MediaStreamTrackProcessor}} effectively a sink in the
<a href="https://www.w3.org/TR/mediacapture-streams/#the-model-sources-sinks-constraints-and-settings">
MediaStream model</a>.
A {{MediaStreamTrackProcessor}} internally contains a circular queue
that allows buffering incoming media frames delivered by the track it
is connected to. This buffering allows the {{MediaStreamTrackProcessor}}
to temporarily hold frames waiting to be read from its associated {{ReadableStream}}.
The application can influence the maximum size of the queue via a parameter
provided in the {{MediaStreamTrackProcessor}} constructor. The actual
maximum size of the queue is decided by the UA and can change dynamically,
but it will not exceed the size requested by the application.
If the application does not provide a maximum size parameter, the UA is free
to decide the maximum size of the queue.
When a new frame arrives at the
{{MediaStreamTrackProcessor}}, if the queue has reached its maximum size,
the oldest frame is removed from the queue, and the new frame is
added to the queue. This means that for the particular case of a queue
with a maximum size of 1, if there is a queued frame, it will always be
the most recent one.
The UA is also free to remove any frames from the queue at any time. The UA
may remove frames in order to save resources or to improve performance in
specific situations. In all cases, frames that are not dropped
must be made available to the {{ReadableStream}} in the order in which
they arrive at the {{MediaStreamTrackProcessor}}.
A {{MediaStreamTrackProcessor}} makes frames available to its
associated {{ReadableStream}} only when a read request has been issued on
the stream. The purpose is to avoid buffering inside the stream, which
would not give the UA enough flexibility to choose its buffering policy.
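The following is an informative sketch of the resulting delivery model: frames are
handed to the {{ReadableStream}} only in response to read requests, and
{{MediaStreamTrackProcessorInit/maxBufferSize}} only bounds how many frames the UA
may buffer between reads. In this sketch, `videoTrack` is assumed to be a live video
{{MediaStreamTrack}} and `processFrame()` is a placeholder for application code.
<pre class="example">
// Keep at most 2 frames buffered while the application is busy.
const processor = new MediaStreamTrackProcessor({track: videoTrack, maxBufferSize: 2});
const reader = processor.readable.getReader();
while (true) {
  // Each read() issues a read request; the UA delivers the oldest buffered frame.
  const {value: frame, done} = await reader.read();
  if (done) break;             // the track ended and the stream was closed
  await processFrame(frame);   // placeholder for application processing
  frame.close();               // release the frame's media resources
}
</pre>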
### Interface definition ### {#track-processor-interface}
<pre class="idl">
[Exposed=(Window,DedicatedWorker)]
interface MediaStreamTrackProcessor {
  constructor(MediaStreamTrackProcessorInit init);
  attribute ReadableStream readable;
};

dictionary MediaStreamTrackProcessorInit {
  required MediaStreamTrack track;
  [EnforceRange] unsigned short maxBufferSize;
};
</pre>
### Internal slots ### {#internal-slots-processor}
<dl>
<dt><dfn for=MediaStreamTrackProcessor>`[[track]]`</dfn></dt>
<dd>Track whose raw data is to be exposed by the {{MediaStreamTrackProcessor}}.</dd>
<dt><dfn for=MediaStreamTrackProcessor>`[[maxBufferSize]]`</dfn></dt>
<dd>The maximum number of media frames to be buffered by the {{MediaStreamTrackProcessor}}
as specified by the application. It may have no value if the application does
not provide it. Its minimum valid value is 1.</dd>
<dt><dfn for=MediaStreamTrackProcessor>`[[queue]]`</dfn></dt>
<dd>A {{Queue|queue}} used to buffer media frames not yet read by the application</dd>
<dt><dfn for=MediaStreamTrackProcessor>`[[numPendingReads]]`</dfn></dt>
<dd>An integer whose value represents the number of read requests issued by the
application that have not yet been handled.
</dd>
<dt><dfn for=MediaStreamTrackProcessor>`[[isClosed]]`</dfn></dt>
<dd>A boolean whose value indicates whether the {{MediaStreamTrackProcessor}} is closed.
</dd>
</dl>
### Constructor ### {#constructor-processor}
<dfn constructor for=MediaStreamTrackProcessor title="MediaStreamTrackProcessor(init)">
MediaStreamTrackProcessor(|init|)
</dfn>
1. If |init|.{{MediaStreamTrackProcessorInit/track}} is not a valid {{MediaStreamTrack}},
throw a {{TypeError}}.
2. Let |processor| be a new {{MediaStreamTrackProcessor}} object.
3. Assign |init|.{{MediaStreamTrackProcessorInit/track}} to |processor|.`[[track]]`.
4. If |init|.{{MediaStreamTrackProcessorInit/maxBufferSize}} has an integer value greater than or equal to 1, assign it to |processor|.`[[maxBufferSize]]`.
5. Set the `[[queue]]` internal slot of |processor| to an empty {{Queue}}.
6. Set |processor|.`[[numPendingReads]]` to 0.
7. Set |processor|.`[[isClosed]]` to false.
8. Return |processor|.
### Attributes ### {#attributes-processor}
<dl>
<dt><dfn for=MediaStreamTrackProcessor>readable</dfn></dt>
<dd>Allows reading the frames delivered by the {{MediaStreamTrack}} stored
in the `[[track]]` internal slot. This attribute is created the first time it is accessed,
according to the following steps:
1. Initialize [=this=].{{MediaStreamTrackProcessor/readable}} to be a new {{ReadableStream}}.
2. <a dfn for="ReadableStream">Set up</a> [=this=].{{MediaStreamTrackProcessor/readable}} with its [=ReadableStream/set up/pullAlgorithm=] set to [=processorPull=] with [=this=] as parameter, [=ReadableStream/set up/cancelAlgorithm=] set to [=processorCancel=] with [=this=] as parameter, and [=ReadableStream/set up/highWatermark=] set to 0.
The <dfn>processorPull</dfn> algorithm is given a |processor| as input. It is defined by the following steps:
1. Increment the value of the |processor|.`[[numPendingReads]]` by 1.
2. [=Queue a task=] to run the [=maybeReadFrame=] algorithm with |processor| as parameter.
3. Return [=a promise resolved with=] undefined.
The <dfn>maybeReadFrame</dfn> algorithm is given a |processor| as input. It is defined by the following steps:
1. If |processor|.`[[queue]]` is {{Queue/empty}}, abort these steps.
2. If |processor|.`[[numPendingReads]]` equals zero, abort these steps.
3. {{Queue/dequeue}} a frame from |processor|.`[[queue]]` and [=ReadableStream/Enqueue=] it in |processor|.{{MediaStreamTrackProcessor/readable}}.
4. Decrement |processor|.`[[numPendingReads]]` by 1.
5. Go to step 1.
The <dfn>processorCancel</dfn> algorithm is given a |processor| as input.
It is defined by running the following steps:
1. Run the [=processorClose=] algorithm with |processor| as parameter.
2. Return [=a promise resolved with=] undefined.
The <dfn>processorClose</dfn> algorithm is given a |processor| as input.
It is defined by running the following steps:
1. If |processor|.`[[isClosed]]` is true, abort these steps.
2. Disconnect |processor| from |processor|.`[[track]]`. The mechanism to do this is UA specific and the result is that |processor| is no longer a sink of |processor|.`[[track]]`.
3. {{ReadableStreamDefaultControllerClose|Close}} |processor|.{{MediaStreamTrackProcessor/readable}}.{{ReadableStreamControllerSlot|[[controller]]}}.
4. {{List/Empty}} |processor|.`[[queue]]`.
5. Set |processor|.`[[isClosed]]` to true.
</dd>
</dl>
### Handling interaction with the track ### {#processor-handling-interaction-with-track}
When the `[[track]]` of a {{MediaStreamTrackProcessor}} |processor| delivers a
frame to |processor|, the UA MUST execute the [=handleNewFrame=] algorithm
with |processor| as parameter.
The <dfn>handleNewFrame</dfn> algorithm is given a |processor| as input.
It is defined by running the following steps:
1. If |processor|.`[[maxBufferSize]]` has a value and |processor|.`[[queue]]` has |processor|.`[[maxBufferSize]]` elements, {{Queue/dequeue}} an item from |processor|.`[[queue]]`.
2. {{Queue/enqueue}} the new frame in |processor|.`[[queue]]`.
3. [=Queue a task=] to run the [=maybeReadFrame=] algorithm with |processor| as parameter.
At any time, the UA MAY {{List/remove}} any frame from |processor|.`[[queue]]`,
for example, to prevent resource exhaustion or to improve performance in
certain situations.
<p class="note">
The application may detect that frames have been dropped by noticing that there
is a gap in the timestamps of consecutive frames.
</p>
When the `[[track]]` of a {{MediaStreamTrackProcessor}} |processor|
{{TrackEnded|ends}}, the [=processorClose=] algorithm must be
executed with |processor| as parameter.
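As an informative illustration of the note above, an application reading video frames
can detect dropped frames by watching for gaps between the timestamps of consecutive
frames. In this sketch, `processor` is an existing video {{MediaStreamTrackProcessor}},
and the nominal inter-frame interval of roughly 33 ms (in microseconds) is an assumption.
<pre class="example">
const reader = processor.readable.getReader();
const expectedDeltaUs = 33333; // assumed nominal frame interval (~30 fps), in microseconds
let lastTimestamp = null;
while (true) {
  const {value: frame, done} = await reader.read();
  if (done) break;
  if (lastTimestamp !== null && frame.timestamp - lastTimestamp > 1.5 * expectedDeltaUs) {
    console.log('One or more frames were likely dropped before timestamp', frame.timestamp);
  }
  lastTimestamp = frame.timestamp;
  frame.close();
}
</pre>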
## MediaStreamTrackGenerator ## {#generator}
A {{MediaStreamTrackGenerator}} allows the creation of a {{WritableStream}}
that acts as a {{MediaStreamTrack}} source in the
<a href="https://www.w3.org/TR/mediacapture-streams/#the-model-sources-sinks-constraints-and-settings">
MediaStream model</a>. Since the model does not expose sources directly
but through the tracks connected to it, a {{MediaStreamTrackGenerator}}
is also a track connected to its {{WritableStream}} source. Further tracks
connected to the same {{WritableStream}} can be created using the
{{MediaStreamTrack/clone}} method. The {{WritableStream}} source is
exposed as the {{MediaStreamTrackGenerator/writable}} field of
{{MediaStreamTrackGenerator}}.
Similarly to {{MediaStreamTrackProcessor}}, the {{WritableStream}} of
an audio {{MediaStreamTrackGenerator}} accepts {{AudioData}} objects,
and a video {{MediaStreamTrackGenerator}} accepts {{VideoFrame}} objects.
When a {{VideoFrame}} or {{AudioData}} object is written to
{{MediaStreamTrackGenerator/writable}},
the frame's `close()` method is automatically invoked, so that its internal
resources are no longer accessible from JavaScript.
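The following is an informative sketch of writing frames to a generator directly
with a {{WritableStream}} writer rather than through `pipeTo()`. Here,
`produceVideoFrame()` is a placeholder for application code that creates a
{{VideoFrame}} (for example, from a canvas).
<pre class="example">
const generator = new MediaStreamTrackGenerator({kind: 'video'});
const writer = generator.writable.getWriter();
for (let i = 0; i < 100; i++) {
  const frame = await produceVideoFrame(); // placeholder returning a VideoFrame
  await writer.write(frame); // the frame is closed automatically once written
}
await writer.close(); // ends the generator track and any tracks cloned from it
</pre>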
### Interface definition ### {#generator-interface}
<pre class="idl">
[Exposed=(Window,DedicatedWorker)]
interface MediaStreamTrackGenerator : MediaStreamTrack {
  constructor(MediaStreamTrackGeneratorInit init);
  attribute WritableStream writable; // VideoFrame or AudioData
};

dictionary MediaStreamTrackGeneratorInit {
  required DOMString kind;
};
</pre>
### Constructor ### {#generator-constructor}
<dfn constructor for=MediaStreamTrackGenerator title="MediaStreamTrackGenerator(init)">
MediaStreamTrackGenerator(|init|)
</dfn>
1. If |init|.{{MediaStreamTrackGeneratorInit/kind}} is not `"audio"` or `"video"`,
throw a {{TypeError}}.
2. Let |g| be a new {{MediaStreamTrackGenerator}} object.
3. Initialize the {{MediaStreamTrack/kind}} field of |g| (inherited from {{MediaStreamTrack}})
with |init|.{{MediaStreamTrackGeneratorInit/kind}}.
4. Return |g|.
### Attributes ### {#generator-attributes}
<dl>
<dt><dfn attribute for=MediaStreamTrackGenerator>writable</dfn></dt>
<dd>Allows writing media frames to the {{MediaStreamTrackGenerator}}, which is
itself a {{MediaStreamTrack}}. When this attribute is accessed for the first time,
it MUST be initialized with the following steps:
1. Initialize [=this=].{{MediaStreamTrackGenerator/writable}} to be a new {{WritableStream}}.
2. <a dfn for="WritableStream">Set up</a> [=this=].{{MediaStreamTrackGenerator/writable}}, with its [=WritableStream/set up/writeAlgorithm=] set to [=writeFrame=] with |this| as parameter, with [=WritableStream/set up/closeAlgorithm=] set to [=closeWritable=] with |this| as parameter and [=WritableStream/set up/abortAlgorithm=] set to [=closeWritable=] with |this| as parameter.
The <dfn>writeFrame</dfn> algorithm is given a |generator| and a |frame| as input. It is defined by running the following steps:
1. If |generator|.{{MediaStreamTrack/kind}} equals `video` and |frame| is not a {{VideoFrame}} object, return [=a promise rejected with=] a {{TypeError}}.
2. If |generator|.{{MediaStreamTrack/kind}} equals `audio` and |frame| is not an {{AudioData}} object, return [=a promise rejected with=] a {{TypeError}}.
3. Send the media data backing |frame| to all live tracks connected to |generator|, possibly including |generator| itself.
4. Invoke the `close` method of |frame|.
5. Return [=a promise resolved with=] undefined.
<p class="note">
When the media data is sent to a track, the UA may apply processing
(e.g., cropping and downscaling) to ensure that the media data sent
to the track satisfies the track's constraints. Each track may receive a
different version of the media data depending on its constraints.
</p>
The <dfn>closeWritable</dfn> algorithm is given a |generator| as input.
It is defined by running the following steps.
1. For each track `t` connected to |generator|, {{EndTrack|end}} `t`.
2. Return [=a promise resolved with=] undefined.
</dd>
</dl>
### Specialization of MediaStreamTrack behavior ### {#generator-as-track}
A {{MediaStreamTrackGenerator}} is a {{MediaStreamTrack}}. This section adds
clarifications on how a {{MediaStreamTrackGenerator}} behaves as a
{{MediaStreamTrack}}.
#### clone #### {#generator-clone}
The {{MediaStreamTrack/clone}} method on a {{MediaStreamTrackGenerator}}
returns a new {{MediaStreamTrack}} object whose source is the
same as the one for the {{MediaStreamTrackGenerator}} being cloned.
This source is the {{MediaStreamTrackGenerator/writable}} field of
the {{MediaStreamTrackGenerator}}.
#### stop #### {#generator-stop}
The {{MediaStreamTrack/stop}} method on a {{MediaStreamTrackGenerator}} stops
the track. When the last track connected to
the {{MediaStreamTrackGenerator/writable}} of a {{MediaStreamTrackGenerator}}
ends, its {{MediaStreamTrackGenerator/writable}} is [=WritableStream/closing|closed=].
#### Constrainable properties #### {#generator-constrainable-properties}
The following constrainable properties are defined for video
{{MediaStreamTrackGenerator}}s and any {{MediaStreamTrack}}s sourced from
a {{MediaStreamTrackGenerator}}:
<table>
<thead>
<tr>
<th>
Property Name
</th>
<th>
Values
</th>
<th>
Notes
</th>
</tr>
</thead>
<tbody>
<tr id="def-constraint-width">
<td data-tests="">
<dfn>width</dfn>
</td>
<td>
{{ConstrainULong}}
</td>
<td>
As a setting, this is the width, in pixels, of the latest
frame received by the track.
As a capability, `max` MUST reflect the
largest width a {{VideoFrame}} may have, and `min`
MUST reflect the smallest width a {{VideoFrame}} may have.
</td>
</tr>
<tr id="def-constraint-height">
<td data-tests="">
<dfn>height</dfn>
</td>
<td>
{{ConstrainULong}}
</td>
<td>
As a setting, this is the height, in pixels, of the latest
frame received by the track.
As a capability, `max` MUST reflect the largest height
a {{VideoFrame}} may have, and `min` MUST reflect
the smallest height a {{VideoFrame}} may have.
</td>
</tr>
<tr id="def-constraint-frameRate">
<td data-tests="">
<dfn>frameRate</dfn>
</td>
<td>
{{ConstrainDouble}}
</td>
<td>
As a setting, this is an estimate of the frame rate based on frames
recently received by the track.
As a capability `min` MUST be zero and
`max` MUST be the maximum frame rate supported by the system.
</td>
</tr>
<tr id="def-constraint-aspect">
<td data-tests="">
<dfn>aspectRatio</dfn>
</td>
<td>
{{ConstrainDouble}}
</td>
<td>
As a setting, this is the aspect ratio of the latest frame
delivered by the track;
this is the width in pixels divided by height in pixels as a
double rounded to the tenth decimal place. As a capability,
`min` MUST be the
smallest aspect ratio supported by a {{VideoFrame}}, and `max` MUST be
the largest aspect ratio supported by a {{VideoFrame}}.
</td>
</tr>
<tr id="def-constraint-resizeMode">
<td data-tests="">
<dfn>resizeMode</dfn>
</td>
<td>
{{ConstrainDOMString}}
</td>
<td>
As a setting, this string should be one of the members of
{{VideoResizeModeEnum}}. The value "{{VideoResizeModeEnum/none}}"
means that the frames output by the MediaStreamTrack are unmodified
versions of the frames written to the
{{MediaStreamTrackGenerator/writable}} backing
the track, regardless of any constraints.
The value "{{VideoResizeModeEnum/crop-and-scale}}" means
that the frames output by the MediaStreamTrack may be cropped and/or
downscaled versions
of the source frames, based on the values of the width, height and
aspectRatio constraints of the track.
As a capability, the values "{{VideoResizeModeEnum/none}}" and
"{{VideoResizeModeEnum/crop-and-scale}}" both MUST be present.
</td>
</tr>
</tbody>
</table>
The {{MediaStreamTrack/applyConstraints}} method applied to a video {{MediaStreamTrack}}
sourced from a {{MediaStreamTrackGenerator}} supports the properties defined above.
It can be used, for example, to resize frames or adjust the frame rate of the track.
Note that these constraints have no effect on the {{VideoFrame}} objects
written to the {{MediaStreamTrackGenerator/writable}} of a {{MediaStreamTrackGenerator}},
just on the output of the track on which the constraints have been applied.
Note also that, since a {{MediaStreamTrackGenerator}} can in principle produce
media data with any setting for the supported constrainable properties,
an {{MediaStreamTrack/applyConstraints}} call on a track
backed by a {{MediaStreamTrackGenerator}} will generally not fail with
{{OverconstrainedError}} unless the given constraints
are outside the system-supported range, as reported by
{{MediaStreamTrack/getCapabilities}}.
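For example (informative sketch), an application can keep the full-resolution frames
on the generator itself while attaching a downscaled clone to a self-view element.
The element ID `'self-view'` is an assumption.
<pre class="example">
const generator = new MediaStreamTrackGenerator({kind: 'video'});
// Frames are written to generator.writable elsewhere.
const previewTrack = generator.clone();
await previewTrack.applyConstraints({
  width: 320,
  height: 180,
  resizeMode: 'crop-and-scale' // allow cropping/downscaling of the written frames
});
document.getElementById('self-view').srcObject = new MediaStream([previewTrack]);
</pre>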
The following constrainable properties are defined for audio {{MediaStreamTrack}}s
sourced from a {{MediaStreamTrackGenerator}}, but only in an informational capacity,
available via {{MediaStreamTrack/getSettings}}. It is not possible to
reconfigure an audio track sourced from a {{MediaStreamTrackGenerator}} using
{{MediaStreamTrack/applyConstraints}} in the same way that it is possible to,
for example, resize the frames of a video track.
For such audio tracks, {{MediaStreamTrack/getCapabilities}} MUST return an empty object.
<table>
<thead>
<tr>
<th>
Property Name
</th>
<th>
Values
</th>
<th>
Notes
</th>
</tr>
</thead>
<tbody>
<tr id="def-constraint-sampleRate">
<td data-tests="">
<dfn>sampleRate</dfn>
</td>
<td>
{{ConstrainDouble}}
</td>
<td>
As a setting, this is the sample rate, in samples per second, of the
latest {{AudioData}} delivered by the track.
</td>
</tr>
<tr id="def-constraint-channelCount">
<td data-tests="">
<dfn>channelCount</dfn>
</td>
<td>
{{ConstrainULong}}
</td>
<td>
As a setting, this is the number of independent audio channels of the
latest {{AudioData}} delivered by the track.
</td>
</tr>
<tr id="def-constraint-sampleSize">
<td data-tests="">
<dfn>sampleSize</dfn>
</td>
<td>
{{ConstrainULong}}
</td>
<td>
As a setting, this is the linear sample size of the latest {{AudioData}}
delivered by the track.
</td>
</tr>
</tbody>
</table>
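An informative sketch of reading these settings from an audio track backed by a
{{MediaStreamTrackGenerator}}; the settings reflect the latest {{AudioData}} written
to the generator.
<pre class="example">
const audioGenerator = new MediaStreamTrackGenerator({kind: 'audio'});
// AudioData objects are written to audioGenerator.writable elsewhere.
const {sampleRate, channelCount, sampleSize} = audioGenerator.getSettings();
console.log('sampleRate:', sampleRate, 'channelCount:', channelCount, 'sampleSize:', sampleSize);
</pre>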
#### Events and attributes #### {#generator-events-attributes}
Events and attributes work the same as for any {{MediaStreamTrack}}.
It is relevant to note that if the {{MediaStreamTrackGenerator/writable}}
stream of a {{MediaStreamTrackGenerator}} is closed, all the live
tracks connected to it, possibly including the {{MediaStreamTrackGenerator}}
itself, are ended and the `ended` event is fired on them.
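For instance (informative sketch), closing the {{MediaStreamTrackGenerator/writable}}
ends both the generator and any tracks cloned from it, and each fires `ended`:
<pre class="example">
const generator = new MediaStreamTrackGenerator({kind: 'video'});
const clonedTrack = generator.clone();
generator.onended = () => console.log('generator track ended');
clonedTrack.onended = () => console.log('cloned track ended');
const writer = generator.writable.getWriter();
await writer.close(); // both tracks end and fire the "ended" event
</pre>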
# Examples # {#examples}
## Video Processing
Consider a face recognition function `detectFace(videoFrame)` that returns a face position
(in some format), and a manipulation function `blurBackground(videoFrame, facePosition)` that
returns a new VideoFrame similar to the given `videoFrame`, but with the
non-face parts blurred. The example also displays the video before and after
the effect in video elements.
<pre class="example">
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const videoTrack = stream.getVideoTracks()[0];
const processor = new MediaStreamTrackProcessor({track: videoTrack});
const generator = new MediaStreamTrackGenerator({kind: 'video'});
const transformer = new TransformStream({
  async transform(videoFrame, controller) {
    const facePosition = await detectFace(videoFrame);
    const newFrame = blurBackground(videoFrame, facePosition);
    videoFrame.close();
    controller.enqueue(newFrame);
  }
});
processor.readable.pipeThrough(transformer).pipeTo(generator.writable);

const videoBefore = document.getElementById('video-before');
const videoAfter = document.getElementById('video-after');
videoBefore.srcObject = stream;
const streamAfter = new MediaStream([generator]);
videoAfter.srcObject = streamAfter;
</pre>
The same example using a worker:
<pre class="example">
// main.js
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const videoTrack = stream.getVideoTracks()[0];
const processor = new MediaStreamTrackProcessor({track: videoTrack});
const generator = new MediaStreamTrackGenerator({kind: 'video'});
const worker = new Worker('worker.js');
worker.postMessage(
    {readable: processor.readable, writable: generator.writable},
    [processor.readable, generator.writable]);

const videoBefore = document.getElementById('video-before');
const videoAfter = document.getElementById('video-after');
videoBefore.srcObject = stream;
const streamAfter = new MediaStream([generator]);
videoAfter.srcObject = streamAfter;

// worker.js
self.onmessage = async function(e) {
  const transformer = new TransformStream({
    async transform(videoFrame, controller) {
      const facePosition = await detectFace(videoFrame);
      const newFrame = blurBackground(videoFrame, facePosition);
      videoFrame.close();
      controller.enqueue(newFrame);
    }
  });
  e.data.readable.pipeThrough(transformer).pipeTo(e.data.writable);
};
</pre>
## Multi-source processing
Suppose there is a model for audio-visual speech separation, represented
by a class `AudioVisualModel` with a method `updateVideo(videoFrame)` that
updates the internal state of the model upon a new video frame, a
method `getSpeechData(audioData)` that returns a noise-canceled {{AudioData}}
given an input raw {{AudioData}}, and a `close()` method that
releases resources used internally by the model.
<pre class="example">
// main.js
const stream = await navigator.mediaDevices.getUserMedia({audio: true, video: true});
const audioTrack = stream.getAudioTracks()[0];
const videoTrack = stream.getVideoTracks()[0];
const audioProcessor = new MediaStreamTrackProcessor({track: audioTrack});
const videoProcessor = new MediaStreamTrackProcessor({track: videoTrack});
const audioGenerator = new MediaStreamTrackGenerator({kind: 'audio'});
const worker = new Worker('worker.js');
worker.postMessage({
    audioReadable: audioProcessor.readable,
    videoReadable: videoProcessor.readable,
    audioWritable: audioGenerator.writable
  }, [
    audioProcessor.readable,
    videoProcessor.readable,
    audioGenerator.writable
  ]);

// worker.js
self.onmessage = async function(e) {
  const model = new AudioVisualModel();
  const audioTransformer = new TransformStream({
    async transform(audioData, controller) {
      const speechData = model.getSpeechData(audioData);
      audioData.close();
      controller.enqueue(speechData);
    }
  });
  const audioPromise = e.data.audioReadable
      .pipeThrough(audioTransformer)
      .pipeTo(e.data.audioWritable);
  const videoReader = e.data.videoReadable.getReader();
  const videoPromise = new Promise(async resolve => {
    while (true) {
      const result = await videoReader.read();
      if (result.done) {
        break;
      } else {
        model.updateVideo(result.value);
        result.value.close();
      }
    }
    resolve();
  });
  await Promise.all([audioPromise, videoPromise]);
  model.close();
};
</pre>
An example that instead allows video effects that are influenced by
speech would be similar, except that the roles of audio and video would be
reversed.
## Custom sink
Suppose there are `sendAudioToNetwork(audioData)` and
`sendVideoToNetwork(videoFrame)` functions that respectively send {{AudioData}}
and {{VideoFrame}} objects to a custom network sink, together with a
`setupNetworkSinks()` function to set up the sinks and a
`cleanupNetworkSinks()` function to release resources used by the sinks.
<pre class="example">
// main.js
const stream = await navigator.mediaDevices.getUserMedia({audio: true, video: true});
const audioTrack = stream.getAudioTracks()[0];
const videoTrack = stream.getVideoTracks()[0];
const audioProcessor = new MediaStreamTrackProcessor({track: audioTrack});
const videoProcessor = new MediaStreamTrackProcessor({track: videoTrack});
const worker = new Worker('worker.js');
worker.postMessage({
    audioReadable: audioProcessor.readable,
    videoReadable: videoProcessor.readable,
  }, [
    audioProcessor.readable,
    videoProcessor.readable,
  ]);

// worker.js
function writeToSink(reader, sinkFunction) {
  return new Promise(async resolve => {
    while (true) {
      const result = await reader.read();
      if (result.done) {
        break;
      } else {
        sinkFunction(result.value);
        result.value.close();
      }
    }
    resolve();
  });
}

self.onmessage = async function(e) {
  setupNetworkSinks();
  const audioReader = e.data.audioReadable.getReader();
  const videoReader = e.data.videoReadable.getReader();
  const audioPromise = writeToSink(audioReader, sendAudioToNetwork);
  const videoPromise = writeToSink(videoReader, sendVideoToNetwork);
  await Promise.all([audioPromise, videoPromise]);
  cleanupNetworkSinks();
};
</pre>
Example using transfer of the MediaStreamTrack:
<pre class="example">
// main.js
const stream = await navigator.mediaDevices.getUserMedia({audio: true, video: true});
const audioTrack = stream.getAudioTracks()[0];
const videoTrack = stream.getVideoTracks()[0];
const worker = new Worker('worker.js');
worker.postMessage({
    audioTrack: audioTrack,
    videoTrack: videoTrack,
  }, [
    audioTrack, videoTrack
  ]);

// worker.js
function writeToSink(reader, sinkFunction) {
  return new Promise(async resolve => {
    while (true) {
      const result = await reader.read();
      if (result.done) {
        break;
      } else {
        sinkFunction(result.value);
        result.value.close();
      }
    }
    resolve();
  });
}

self.onmessage = async function(e) {
  setupNetworkSinks();
  const audioProcessor = new MediaStreamTrackProcessor({track: e.data.audioTrack});
  const videoProcessor = new MediaStreamTrackProcessor({track: e.data.videoTrack});
  const audioReader = audioProcessor.readable.getReader();
  const videoReader = videoProcessor.readable.getReader();
  const audioPromise = writeToSink(audioReader, sendAudioToNetwork);
  const videoPromise = writeToSink(videoReader, sendVideoToNetwork);
  await Promise.all([audioPromise, videoPromise]);
  cleanupNetworkSinks();
};
</pre>
# Implementation advice # {#implementation-advice}
This section is informative.
## Use with multiple consumers
There are use cases where a single stream of frames needs to be consumed by
multiple consumers.
Examples include the case where the result of a background-blurring function should
be both displayed in a self-view and encoded using a {{VideoEncoder}}.
For cases where both consumers are consuming unprocessed frames, and synchronization
is not desired, instantiating multiple {{MediaStreamTrackProcessor}} objects is a robust solution.
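For instance (informative sketch), a self-view and an encoder can each use their own
processor on the same camera track; each processor is an independent sink that delivers
frames at its own pace. The rendering and encoding code is assumed to exist elsewhere.
<pre class="example">
const stream = await navigator.mediaDevices.getUserMedia({video: true});
const track = stream.getVideoTracks()[0];
// Two independent sinks on the same track; neither consumer blocks the other.
const selfViewProcessor = new MediaStreamTrackProcessor({track});
const encoderProcessor = new MediaStreamTrackProcessor({track, maxBufferSize: 4});
</pre>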
For cases where both consumers intend to convert the result of a processing step into a
{{MediaStreamTrack}}
using a {{MediaStreamTrackGenerator}}, for example when feeding a processed stream
to both a &lt;video&gt; tag and an {{RTCPeerConnection}}, attaching the resulting {{MediaStreamTrack}}
to multiple sinks may be the most appropriate mechanism.
For cases where the downstream processing takes frames, not streams, the frames can
be cloned as needed and sent off to the downstream processing; "clone" is a cheap operation.
When the stream is the output of some processing, and both branches need a Stream object
to do further processing, one needs a function that produces two streams from one stream.
However, the standard tee() operation is problematic
in this context:
* It defeats the backpressure mechanism that guards against excessive queueing
* It creates multiple references to the same buffers, meaning that the question of which
consumer gets to close() a given frame is a difficult one to address.

Therefore, tee() should only be used on streams containing media when the
implications are fully understood. Instead, a custom stream-splitting mechanism
more appropriate to the use case should be used; an informative sketch of such a
splitter is given after the list below.
* If both branches require the ability to dispose of the frames, clone() the frame
and enqueue distinct copies in both queues. This corresponds to the function
ReadableStreamTee(stream, cloneForBranch2=true). Then choose one of the
alternatives below.
* If one branch requires all frames, and the other branch tolerates dropped frames,
enqueue buffers in the all-frames-required stream and use the backpressure signal
from that stream to stop reading from the source. If backpressure signal from the
other stream indicates room, enqueue the same frame in that queue too.
* If neither stream tolerates dropped frames, use the combined backpressure signal
to stop reading from the source. In this case, frames will be processed in
lockstep if the buffer sizes are both 1.
* If it is OK for the incoming stream to be stalled only when the underlying
buffer pool allocated to the process is exhausted, standard tee() may be used.
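The following is an informative sketch of such a custom splitter. It clones each
incoming frame so that the two branches own distinct copies (first alternative above)
and awaits both branches before reading the next frame, i.e. it uses the combined
backpressure signal. It is not a definitive implementation; error handling is omitted.
<pre class="example">
function cloneSplit(readable) {
  // Two identity TransformStreams act as the buffered outputs of the splitter.
  const branchA = new TransformStream();
  const branchB = new TransformStream();
  const writerA = branchA.writable.getWriter();
  const writerB = branchB.writable.getWriter();
  (async () => {
    const reader = readable.getReader();
    while (true) {
      const {value: frame, done} = await reader.read();
      if (done) break;
      // Each branch gets its own copy, so either side may close() it independently.
      // Awaiting both writes reads from the source only as fast as the slower branch.
      await Promise.all([writerA.write(frame.clone()), writerB.write(frame)]);
    }
    await Promise.all([writerA.close(), writerB.close()]);
  })();
  return [branchA.readable, branchB.readable];
}

// Usage sketch: const [forSelfView, forEncoder] = cloneSplit(processor.readable);
</pre>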
# Security and Privacy considerations # {#security-considerations}
This API defines a {{MediaStreamTrack}} source and a {{MediaStreamTrack}} sink.
The security and privacy of the source ({{MediaStreamTrackGenerator}}) relies
on the same-origin policy. That is, the data {{MediaStreamTrackGenerator}} can
make available in the form of a {{MediaStreamTrack}} must be visible to
the document before a {{VideoFrame}} or {{AudioData}} object can be constructed
and pushed into the {{MediaStreamTrackGenerator}}. Any attempt to create
{{VideoFrame}} or {{AudioData}} objects using cross-origin data will fail.
Therefore, {{MediaStreamTrackGenerator}} does not introduce any new
fingerprinting surface.
The {{MediaStreamTrack}} sink introduced by this API ({{MediaStreamTrackProcessor}})
exposes the same data that is exposed by other
{{MediaStreamTrack}} sinks such as WebRTC peer connections, Web Audio
{{MediaStreamAudioSourceNode}} and media elements. The security and privacy
of {{MediaStreamTrackProcessor}} relies on the security and privacy of the
{{MediaStreamTrack}} sources of the tracks to which {{MediaStreamTrackProcessor}}
is connected. For example, camera, microphone and screen-capture tracks
rely on explicit user authorization via permission dialogs (see
[[MEDIACAPTURE-STREAMS]] and [[MEDIACAPTURE-SCREEN-SHARE]]),
while element capture and {{MediaStreamTrackGenerator}}
rely on the same-origin policy.
A potential issue with {{MediaStreamTrackProcessor}} is resource exhaustion.
For example, a site might hold on to too many open {{VideoFrame}} objects
and deplete a system-wide pool of GPU-memory-backed frames. UAs can
mitigate this risk by limiting the number of pool-backed frames a site can
hold. This can be achieved by reducing the maximum number of buffered frames
and by refusing to deliver more frames to {{MediaStreamTrackProcessor/readable}}
once the budget limit is reached. Accidental exhaustion is also mitigated by
automatic closing of {{VideoFrame}} and {{AudioData}} objects once they
are written to a {{MediaStreamTrackGenerator}}.