Add tutorial for parallel decoding #778

Conversation
# Frame sampling strategy
# -----------------------
#
# For this tutorial, we'll sample frames at a target rate of 2 FPS from our long
# video.
This might be a me-thing, but I was tripped up by the "2 FPS from our long video" part. That is, I initially thought that meant we were targeting the decoding itself to happen at the observable speed of 2 FPS. This might be because I know our benchmarks report FPS as their metric. Rather, we mean that we want to sample 2 FPS in the reference frame of the video's time. Phrasing that I think would have helped me understand quicker:
For this tutorial, we'll sample a frame every 2 seconds from our long video.
Also, just realized that in the existing text, "inference" is misspelled.
Fair, fps is overloaded and that confuses me too sometimes. I'll add comments to clarify the context.
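For concreteness, here is a minimal sketch of what "2 FPS in the video's own timeline" could look like as sampling code. The names TARGET_FPS and all_indices are illustrative, not necessarily the tutorial's, and it assumes VideoDecoder.metadata exposes average_fps and num_frames; long_video_path comes from the tutorial.

from torchcodec.decoders import VideoDecoder

TARGET_FPS = 2  # frames to keep per second of *video time*, not a decoding-speed target

decoder = VideoDecoder(long_video_path)
# Keep roughly one frame out of every (average_fps / TARGET_FPS) frames.
step = max(1, round(decoder.metadata.average_fps / TARGET_FPS))
all_indices = list(range(0, decoder.metadata.num_frames, step))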
#
# Process-based parallelism distributes work across multiple Python processes.

def decode_with_multiprocessing(indices: List[int], num_processes: int, video_path=long_video_path):
Let's put each of these parameters on a separate line - on my system, the rendering for this line wraps.
"""Decode frames using multiple processes with joblib.""" | ||
chunks = split_indices(indices, num_chunks=num_processes) | ||
|
||
results = Parallel(n_jobs=num_processes, backend="loky", verbose=0)( |
What's the "loky" backend?
It's a multiprocessing backend for joblib: https://github.com/joblib/loky
I added a comment. In general, I don't want to go into too much detail about joblib in this tutorial because, as I mentioned at the top, the concepts covered here are joblib-agnostic. The reader should just trust that the decode_with* functions are doing the right thing.
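To make the thread concrete, here is a minimal sketch of what the process-based helper could look like, with the parameters broken onto separate lines as suggested above. The bodies of split_indices and the per-chunk decode are reconstructions for illustration only, not the tutorial's actual code; get_frames_at and long_video_path are assumed from the surrounding context.

from typing import List

from joblib import Parallel, delayed
from torchcodec.decoders import VideoDecoder


def split_indices(indices: List[int], num_chunks: int) -> List[List[int]]:
    # Split the frame indices into roughly equal contiguous chunks, one per worker.
    chunk_size = (len(indices) + num_chunks - 1) // num_chunks
    return [indices[i : i + chunk_size] for i in range(0, len(indices), chunk_size)]


def decode_chunk(chunk: List[int], video_path):
    # Each worker builds its own decoder; decoder objects are not shared
    # across process boundaries.
    decoder = VideoDecoder(video_path)
    return decoder.get_frames_at(chunk)


def decode_with_multiprocessing(
    indices: List[int],
    num_processes: int,
    video_path=long_video_path,
):
    """Decode frames using multiple processes with joblib."""
    chunks = split_indices(indices, num_chunks=num_processes)

    # "loky" is joblib's process-based backend: https://github.com/joblib/loky
    return Parallel(n_jobs=num_processes, backend="loky", verbose=0)(
        delayed(decode_chunk)(chunk, video_path) for chunk in chunks
    )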
# Thread-based parallelism uses multiple threads within a single process.
# TorchCodec releases the GIL, so this can be very effective.

def decode_with_multithreading(indices: List[int], num_threads: int, video_path=long_video_path): |
Same as above - let's use line breaks for parameters.
"""Decode frames using multiple threads with joblib.""" | ||
chunks = split_indices(indices, num_chunks=num_threads) | ||
|
||
results = Parallel(n_jobs=num_threads, prefer="threads", verbose=0)( |
Does "prefer" mean that in some situations it might end up using processes? Does it default to processes, since we didn't say this above?
It basically means "use threads, unless this gets overridden by something with higher priority, like a context manager". There is more about this in the docstring for "backend": https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html
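For comparison, a sketch of the thread-based variant under the same assumptions as the process-based sketch above; per the joblib docs linked here, prefer="threads" is a soft hint rather than a hard backend choice.

def decode_with_multithreading(
    indices: List[int],
    num_threads: int,
    video_path=long_video_path,
):
    """Decode frames using multiple threads with joblib."""
    chunks = split_indices(indices, num_chunks=num_threads)

    # prefer="threads" is a soft hint to joblib; TorchCodec releases the GIL
    # during decoding, so the decode work can run in parallel across threads.
    return Parallel(n_jobs=num_threads, prefer="threads", verbose=0)(
        delayed(decode_chunk)(chunk, video_path) for chunk in chunks
    )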
Excellent tutorial! I think this will be extremely helpful for our users!
times, result_ffmpeg = bench(decode_with_ffmpeg_parallelism, all_indices, num_threads=NUM_CPUS)
ffmpeg_time = report_stats(times, unit="s")
speedup = sequential_time / ffmpeg_time
print(f"Speedup vs sequential: {speedup:.2f}x with {NUM_CPUS} FFmpeg threads.")
nit: Perhaps for additional clarity, we could write out the comparison instead of using "vs":
print(f"Speedup compared to sequential: {speedup:.2f}x using {NUM_CPUS} FFmpeg threads.")