Add tutorial for parallel decoding #778


Merged
merged 5 commits into pytorch:main on Jul 15, 2025

Conversation

NicolasHug
Copy link
Member

No description provided.

@facebook-github-bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Jul 13, 2025
# Frame sampling strategy
# -----------------------
#
# For this tutorial, we'll sample frames at a target rate of 2 FPS from our long
Contributor

This might be a me-thing, but I was tripped up by the "2 FPS from our long video" part. That is, I initially thought that meant we were targeting the decoding itself to happen at the observable speed of 2 FPS. This might be because I know our benchmarks report FPS as their metric. Rather, we mean that we want to sample 2 FPS in the reference frame of the video's time. Phrasing that I think would have helped me understand quicker:

For this tutorial, we'll sample a frame every 2 seconds from our long video.

Also, I just realized that in the existing text, "inference" is misspelled.

Member Author

@NicolasHug NicolasHug Jul 14, 2025

Fair, "fps" is overloaded and that confuses me too sometimes. I'll add comments to clarify the context.
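To make the sampling semantics concrete, here is a small sketch in the video's own timeline; the numbers (a 60 fps, 10-minute video) are made up for illustration and are not from the tutorial:

```python
# Made-up numbers for illustration: a 60 fps video that is 10 minutes long.
video_fps = 60    # frame rate of the *video's* timeline, not decoding throughput
duration_s = 600
target_fps = 2    # we want 2 sampled frames per second of video time

step = video_fps // target_fps                       # keep every 30th frame
all_indices = list(range(0, video_fps * duration_s, step))

print(step, len(all_indices))  # → 30 1200
```

So "2 FPS" here means 1200 frames sampled across the 10-minute video, regardless of how fast the decoder itself runs.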

#
# Process-based parallelism distributes work across multiple Python processes.

def decode_with_multiprocessing(indices: List[int], num_processes: int, video_path=long_video_path):
Contributor

Let's put each of these parameters on a separate line - on my system, the rendering for this line wraps.

"""Decode frames using multiple processes with joblib."""
chunks = split_indices(indices, num_chunks=num_processes)

results = Parallel(n_jobs=num_processes, backend="loky", verbose=0)(
Contributor

What's the "loky" backend?

Member Author

It's a multi-processing backend for joblib: https://github.com/joblib/loky

I added a comment. In general, I don't want to go too deep into the details of joblib in this tutorial because, as I mentioned at the top, the concepts covered here are joblib-agnostic. The reader should just trust that the decode_with* functions are doing the right thing.
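For readers following along: the split_indices helper is called in the diff above but its body isn't shown there. A plausible sketch (hypothetical, not the tutorial's actual implementation) splits the frame indices into contiguous chunks, one per worker, so each worker seeks forward through its own region of the video:

```python
from typing import List

def split_indices(indices: List[int], num_chunks: int) -> List[List[int]]:
    """Split frame indices into contiguous chunks, one per worker.

    Hypothetical sketch: contiguous chunks are preferable for decoding,
    since each worker then only seeks forward within its own region.
    """
    chunk_size = -(-len(indices) // num_chunks)  # ceiling division
    return [indices[i:i + chunk_size] for i in range(0, len(indices), chunk_size)]

print(split_indices(list(range(10)), num_chunks=3))
# → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```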

# Thread-based parallelism uses multiple threads within a single process.
# TorchCodec releases the GIL, so this can be very effective.

def decode_with_multithreading(indices: List[int], num_threads: int, video_path=long_video_path):
Contributor

Same as above - let's use line breaks for parameters.

"""Decode frames using multiple threads with joblib."""
chunks = split_indices(indices, num_chunks=num_threads)

results = Parallel(n_jobs=num_threads, prefer="threads", verbose=0)(
Contributor

Does "prefer" mean that in some situations it might end up using processes? Does it default to processes, since we didn't say this above?

Member Author

It basically means "use threads, unless this gets overridden by something with higher priority, like a context manager". There is more about this in the docstring for "backend": https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html
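As a joblib-free illustration of the same thread-based pattern (TorchCodec releases the GIL during decoding, so threads can genuinely overlap), here is a standard-library sketch; decode_chunk is a hypothetical stand-in for the real per-chunk decoding work:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def decode_chunk(chunk: List[int]) -> List[str]:
    # Hypothetical stand-in: a real version would call something like
    # VideoDecoder.get_frames_at(chunk), which releases the GIL in C++.
    return [f"frame_{i}" for i in chunk]

chunks = [[0, 1], [2, 3], [4]]
with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
    results = list(pool.map(decode_chunk, chunks))

# Flatten the per-chunk results back into a single ordered list.
frames = [f for r in results for f in r]
print(frames)  # → ['frame_0', 'frame_1', 'frame_2', 'frame_3', 'frame_4']
```

pool.map preserves chunk order, so the flattened result matches the original index order.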

Contributor

@scotts scotts left a comment

Excellent tutorial! I think this will be extremely helpful for our users!

times, result_ffmpeg = bench(decode_with_ffmpeg_parallelism, all_indices, num_threads=NUM_CPUS)
ffmpeg_time = report_stats(times, unit="s")
speedup = sequential_time / ffmpeg_time
print(f"Speedup vs sequential: {speedup:.2f}x with {NUM_CPUS} FFmpeg threads.")
Contributor

nit: Perhaps for additional clarity, we could write out the comparison instead of using "vs":

print(f"Speedup compared to sequential: {speedup:.2f}x using {NUM_CPUS} FFmpeg threads.")
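For context, the speedup printed above is just sequential wall time divided by parallel wall time. A minimal sketch with made-up timings, assuming (hypothetically) that report_stats returns the median runtime:

```python
from statistics import median

def report_stats(times, unit="s"):
    # Hypothetical sketch of the tutorial's helper: print and return
    # the median of the measured runtimes.
    med = median(times)
    print(f"median = {med:.2f}{unit}")
    return med

sequential_time = 8.0                         # made-up numbers for illustration
ffmpeg_time = report_stats([2.1, 2.0, 1.9])   # median is 2.0
speedup = sequential_time / ffmpeg_time
print(f"Speedup compared to sequential: {speedup:.2f}x")  # → 4.00x
```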

@NicolasHug NicolasHug merged commit b5995d6 into pytorch:main Jul 15, 2025
44 checks passed
Labels
CLA Signed This label is managed by the Meta Open Source bot.
4 participants