GPU accelerated batched pre/post processing#172

Open
MarcelLieb wants to merge 53 commits into nikopueringer:main from MarcelLieb:main

Conversation


@MarcelLieb MarcelLieb commented Mar 14, 2026

What does this change?

  • Reimplement pre/post-processing using torchvision functions, which allows processing multiple images simultaneously on the GPU (post-processing adapted from https://github.com/99oblivius/CorridorKey-Engine)
  • Add a toggle to select between the OpenCV and the PyTorch post-processing pipeline
  • Add tests for both the torch and the OpenCV methods
  • Add tests for batched processing
  • Add the ability to opt out of composite preview generation
  • Implement a higher-quality despeckle
  • Reduce total processing time by ~30% (with potential for up to 2 fps more on a 5070 Ti once image loading is optimized)
  • Change the model to full float16 precision
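The batching idea behind the first bullet can be sketched with plain torch ops (the PR itself uses torchvision functions; the function name, target size, and normalization below are illustrative assumptions, not the repository's actual API):

```python
import torch
import torch.nn.functional as F

def preprocess_batch(images: torch.Tensor, size=(512, 512)) -> torch.Tensor:
    """Resize and normalize a whole uint8 NCHW batch in one pass.

    Illustrative sketch: every op acts on all N images at once instead of
    looping per frame, which is what makes the GPU pipeline faster than
    per-image OpenCV calls.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = images.to(device, non_blocking=True).float() / 255.0  # uint8 -> [0, 1]
    x = F.interpolate(x, size=size, mode="bilinear",
                      align_corners=False, antialias=True)    # batched resize
    return x.half()                                           # fp16, as in the PR
```

Because the resize, normalization, and dtype cast all operate on the full batch tensor, there is a single kernel launch per step rather than one per image.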

Differences of main branch vs pre-processing on GPU

(comparison images: 02_3c_00000, 02_3c_00000, diff_02_3c_00000)

Differences of main branch vs post-processing on GPU

(comparison images: 02_3c_00000, 02_3c_00000, diff_02_3c_00000)

Differences of main branch vs full GPU pipeline

(comparison images: 02_3c_00000, 02_3c_00000, diff_02_3c_00000)

Differences of full GPU pipeline, fp32 vs fp16

(comparison images: 02_3c_00000, 02_3c_00000, diff_02_3c_00000)
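A minimal sketch of what running the model at "full float16 precision" means in practice (the model below is a placeholder stack, not the repository's actual transformer):

```python
import torch

# Placeholder model; the PR converts its actual transformer, not this stack.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.ReLU(),
)

# Full fp16: .half() casts every weight and buffer to float16, so inputs
# must also be fp16 at inference time.
model = model.half().eval()

if torch.cuda.is_available():
    model = model.to("cuda")
    with torch.inference_mode():
        y = model(torch.randn(1, 3, 64, 64, device="cuda", dtype=torch.float16))
```

Unlike autocast-style mixed precision, a full cast keeps weights, activations, and outputs in fp16 end to end, halving memory traffic at the cost of small numerical shifts, which is what the fp32 vs fp16 diff images above visualize.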

Improved despeckle

(comparison images: speckle_old, speckle_new)
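The improved despeckle is shown above only as images. One common GPU-friendly formulation is morphological opening (erosion then dilation) built from max-pooling; this is a sketch of that general technique, not necessarily the PR's exact implementation:

```python
import torch
import torch.nn.functional as F

def despeckle(mask: torch.Tensor, kernel: int = 3) -> torch.Tensor:
    """Remove isolated speckles from a binary NCHW mask via opening.

    Erosion is a min-pool, expressed as a negated max-pool; dilation is a
    plain max-pool. Single pixels smaller than the kernel vanish, while
    larger regions survive roughly intact.
    """
    pad = kernel // 2
    eroded = -F.max_pool2d(-mask, kernel, stride=1, padding=pad)  # erosion
    opened = F.max_pool2d(eroded, kernel, stride=1, padding=pad)  # dilation
    return opened
```

Because both steps are pooling ops, the filter runs batched on the GPU alongside the rest of the post-processing pipeline.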

Missing

To make full use of these optimizations, clip_manager needs to be reworked to allow batch processing.
These changes also need to be ported to the MLX model to avoid a large divergence between the implementations.

Checklist

  • uv run pytest passes
  • uv run ruff check passes
  • uv run ruff format --check passes

@MarcelLieb MarcelLieb marked this pull request as ready for review March 22, 2026 17:02
