rishi-yadav

| Data Type Combination | Tests | Status | Avg Time (ms) | Notes |
| --- | --- | --- | --- | --- |
| BF16 → BF16 (BF16 acc) | 4 | All Pass | 1,447 | Fast, good precision |
| FP16 → FP16 (FP16 acc) | 4 | All Pass | 15,162 | Slower performance |
| BF16 → BF16 (FP32 acc) | 4 | All Pass | 729 | Best BF16 performance |
| BF16 → FP32 (FP32 acc) | 4 | All Pass | 710 | Mixed precision |
| FP16 → FP16 (FP32 acc) | 4 | All Pass | 885 | Good FP16 performance |
| FP16 → FP32 (FP32 acc) | 4 | All Pass | 703 | Fast mixed precision |
| S8 → S32 | 4 | All Pass | 815 | Integer ops |
| TF32 → FP32 | 1 | Pass | 2,029 | Single test |
| FP8 → FP32 | 4 | All Pass | 2,036 | Emerging format |
| FP16 + S8 → FP32 | 4 | All Pass | 864 | Mixed input types |
| Universal GEMM | 3 | All Pass | 728 | Advanced features |
| FP8 Model Tests | 10 | All Pass | 738 | ML model scenarios |

@rishi-yadav rishi-yadav marked this pull request as draft October 11, 2025 21:12
@rishi-yadav rishi-yadav marked this pull request as ready for review October 11, 2025 21:12
@aschabana aschabana marked this pull request as draft October 13, 2025 08:10