Conversation

@abhishek-singh591
Contributor

Memory Optimization

Added periodic memory cleanup to FP16ClipTransform and SplitTensorsTransform to reduce peak memory usage during large tensor processing. External data is also no longer reloaded for tensors whose data is already present in memory.
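A minimal sketch of what such a cleanup pass could look like, assuming the transform walks the graph initializers (the function name and `CLEANUP_INTERVAL` below are illustrative, not the actual implementation):

```python
import gc

import onnx
from onnx import external_data_helper

# Hypothetical cleanup interval; the value used in the PR may differ.
CLEANUP_INTERVAL = 100


def clip_initializers_with_cleanup(model: onnx.ModelProto, onnx_base_dir: str) -> None:
    """Illustrative sketch: load external data only when it is not already
    present, and release memory periodically while processing large models."""
    for idx, tensor in enumerate(model.graph.initializer):
        # Skip the redundant external-data load when raw_data is already populated.
        if external_data_helper.uses_external_data(tensor) and not tensor.raw_data:
            external_data_helper.load_external_data_for_tensor(tensor, onnx_base_dir)

        # ... FP16 clipping / tensor splitting on the loaded data goes here ...

        # Periodic collection keeps peak memory bounded on large models.
        if (idx + 1) % CLEANUP_INTERVAL == 0:
            gc.collect()
```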

Time-Optimized ONNX Transform via Class Merging and Thread Pooling

This change merges the FP16 and Split ONNX transform classes into a single implementation, eliminating redundant tensor loading and iteration. The transform logic has also been refactored to use a thread pool in place of the previous sequential loop, parallelizing the per-tensor operations (see the sketch below).
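A rough sketch of how the merged, threaded transform might look, with a worker function mapped over the graph initializers (`apply_merged_transform` and `_process_tensor` are illustrative names, not the actual API):

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import onnx
from onnx import numpy_helper

FP16_MAX = np.finfo(np.float16).max  # clip bound used by the FP16 transform


def _process_tensor(tensor: onnx.TensorProto) -> None:
    """Hypothetical per-tensor worker: load the data once, then apply both
    the FP16 clip and any split bookkeeping in a single pass.
    Assumes external data has already been loaded into the tensor."""
    if tensor.data_type != onnx.TensorProto.FLOAT:
        return
    array = numpy_helper.to_array(tensor)
    clipped = np.clip(array, -FP16_MAX, FP16_MAX)
    tensor.CopyFrom(numpy_helper.from_array(clipped, tensor.name))
    # ... splitting / external-data handling for large tensors would go here ...


def apply_merged_transform(model: onnx.ModelProto) -> onnx.ModelProto:
    # The thread pool replaces the previous sequential loop; tensor loading is
    # I/O-bound, so oversubscribing the CPU count tends to help throughput.
    with ThreadPoolExecutor(max_workers=os.cpu_count() * 4) as pool:
        list(pool.map(_process_tensor, model.graph.initializer))
    return model
```

Each worker mutates a distinct initializer, so no locking is needed in this sketch; the gain comes from overlapping the I/O of loading tensor data across threads.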

Performance Benchmarks:

| Model | Original Duration (s) | Optimized Duration (s) |
| --- | --- | --- |
| LLaMA 3.1 8B | 88.35 | 58.55 |
| LLaMA 3.1 70B | 1029.82 | 727.37 |

Note: The thread count is set to `os.cpu_count() * 4` to better handle I/O-bound workloads. Performance may vary depending on system hardware and threading capabilities.
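For reference, a guarded version of that worker-count calculation (`os.cpu_count()` can return None, so a fallback is a reasonable precaution; the multiplier itself is a tunable, not a hard requirement):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Oversubscribe the core count because tensor loading is I/O-bound.
num_workers = (os.cpu_count() or 1) * 4
executor = ThreadPoolExecutor(max_workers=num_workers)
```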
