GPU accelerated batched pre/post processing#172

Open
MarcelLieb wants to merge 53 commits into nikopueringer:main from MarcelLieb:main

Conversation


@MarcelLieb MarcelLieb commented Mar 14, 2026

What does this change?

  • Reimplement pre/post-processing using torchvision functions, which allows processing multiple images simultaneously on the GPU (post-processing adapted from https://github.com/99oblivius/CorridorKey-Engine)
  • Add a toggle to select between the OpenCV and the PyTorch post-processing pipeline
  • Add tests for both the torch and the OpenCV methods
  • Add tests for batched processing
  • Add the ability to opt out of composite preview generation
  • Implement a higher-quality despeckle
  • Reduce total processing time by ~30% (with potential for up to 2 fps more on a 5070 Ti once image loading is optimized)
  • Change the model to full float16 precision
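The batching idea behind the first bullet can be sketched with plain torch ops (the PR itself uses torchvision functions; the function name, target size, and normalization below are illustrative assumptions, not the repository's actual API):

```python
import torch
import torch.nn.functional as F

def preprocess_batch(images: torch.Tensor, size=(512, 512)) -> torch.Tensor:
    """Resize and normalize a whole uint8 NCHW batch in one pass.

    Illustrative sketch: every op acts on all N images at once instead of
    looping per frame, which is what makes the GPU pipeline faster than
    per-image OpenCV calls.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = images.to(device, non_blocking=True).float() / 255.0  # uint8 -> [0, 1]
    x = F.interpolate(x, size=size, mode="bilinear",
                      align_corners=False, antialias=True)    # batched resize
    return x.half()                                           # fp16, as in the PR
```

Because the resize, normalization, and dtype cast all operate on the full batch tensor, there is a single kernel launch per step rather than one per image.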

Differences of main branch vs pre-processing on GPU

(comparison images: 02_3c_00000, 02_3c_00000, diff_02_3c_00000)

Differences of main branch vs post-processing on GPU

(comparison images: 02_3c_00000, 02_3c_00000, diff_02_3c_00000)

Differences of main branch vs full GPU pipeline

(comparison images: 02_3c_00000, 02_3c_00000, diff_02_3c_00000)

Differences of full GPU pipeline, fp32 vs fp16

(comparison images: 02_3c_00000, 02_3c_00000, diff_02_3c_00000)
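A minimal sketch of what running the model at "full float16 precision" means in practice (the model below is a placeholder stack, not the repository's actual transformer):

```python
import torch

# Placeholder model; the PR converts its actual transformer, not this stack.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.ReLU(),
)

# Full fp16: .half() casts every weight and buffer to float16, so inputs
# must also be fp16 at inference time.
model = model.half().eval()

if torch.cuda.is_available():
    model = model.to("cuda")
    with torch.inference_mode():
        y = model(torch.randn(1, 3, 64, 64, device="cuda", dtype=torch.float16))
```

Unlike autocast-style mixed precision, a full cast keeps weights, activations, and outputs in fp16 end to end, halving memory traffic at the cost of small numerical shifts, which is what the fp32 vs fp16 diff images above visualize.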

Improved despeckle

(comparison images: speckle_old, speckle_new)
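The improved despeckle is shown above only as images. One common GPU-friendly formulation is morphological opening (erosion then dilation) built from max-pooling; this is a sketch of that general technique, not necessarily the PR's exact implementation:

```python
import torch
import torch.nn.functional as F

def despeckle(mask: torch.Tensor, kernel: int = 3) -> torch.Tensor:
    """Remove isolated speckles from a binary NCHW mask via opening.

    Erosion is a min-pool, expressed as a negated max-pool; dilation is a
    plain max-pool. Single pixels smaller than the kernel vanish, while
    larger regions survive roughly intact.
    """
    pad = kernel // 2
    eroded = -F.max_pool2d(-mask, kernel, stride=1, padding=pad)  # erosion
    opened = F.max_pool2d(eroded, kernel, stride=1, padding=pad)  # dilation
    return opened
```

Because both steps are pooling ops, the filter runs batched on the GPU alongside the rest of the post-processing pipeline.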

Missing

To make full use of these optimizations, clip_manager needs to be reworked to allow batch processing.
These changes also need to be ported to the MLX model to avoid a large divergence between the implementations.

Checklist

  • uv run pytest passes
  • uv run ruff check passes
  • uv run ruff format --check passes

@MarcelLieb MarcelLieb marked this pull request as ready for review March 22, 2026 17:02
