JAX FA Benchmarking Script #351

Micky774 · 2025-10-24T18:07:40Z

Description

Please include a brief summary of the changes, relevant motivation and context.

Fixes # (issue)

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Change A
Change B

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

benchmarks/attention/benchmark_attention_jax.py

benchmarks/attention/panel_app.py

transformer_engine/common/ck_fused_attn/src/ck_fused_attn_bwd.cpp

ipanfilo

Why does ck_fused_attn_bwd need modification if the PR is for the fwd pass only?

Micky774 · 2025-10-31T21:03:11Z

Note that I have added the BWD pass implementation as well to this PR.

Micky774 · 2025-11-10T18:46:48Z

Pinging @wangye805 @wenchenvincent in case either of you are interested in reviewing this PR as well, thanks!

wenchenvincent · 2025-11-11T14:34:24Z

benchmarks/attention/README.md

+## JAX Fused-Attention Benchmarking
+The benchmarking process is split into two stages: *generating* the timing data, and *visualizing* the timing data. The following steps assume you are located in `TransformerEngine/benchmarks/attention` (i.e. where this README is located). First, ensure that you install requirements via `pip install -r requirements.txt`.
+
+Note: Only forward timings are supported at this point.


wenchenvincent · 2025-11-11T14:42:01Z

benchmarks/attention/benchmark_attention_jax.py

+from transformer_engine.jax import fp8_autocast
+
+# Needed in order to dump timings properly
+os.environ["XLA_FLAGS"]="--xla_gpu_graph_level=0"


Is this because you used the timing-dump function in CK fused attention?
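For context on the hunk above: the commit list mentions an "env var manager context", but its implementation is not shown in this thread. The following is a minimal sketch of how such a helper could look, not the PR's actual code. The name `env_vars` is hypothetical; only the `XLA_FLAGS` value is taken from the diff.

```python
import os
from contextlib import contextmanager


@contextmanager
def env_vars(**overrides):
    """Temporarily set environment variables and restore the old values on exit.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # Remember the previous value of each variable (None means "was unset").
    saved = {name: os.environ.get(name) for name in overrides}
    os.environ.update({name: str(value) for name, value in overrides.items()})
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old


# Usage: the flag value is the one set in the diff above.
with env_vars(XLA_FLAGS="--xla_gpu_graph_level=0"):
    print(os.environ["XLA_FLAGS"])  # --xla_gpu_graph_level=0
```

Note that XLA typically reads `XLA_FLAGS` when the backend is initialized, so a context like this would likely need to be entered before JAX is first used for the flag to take effect.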

transformer_engine/common/ck_fused_attn/src/ck_fused_attn_bwd.cpp

wenchenvincent · 2025-12-02T03:32:27Z

benchmarks/attention/benchmark_attention_jax.py

+        attn_bias_type, bias_shape = bias_config
+        window_size = None
+        if swa:
+            window_size = (s_kv // 10, 0)


Why do this for SWA?

Micky774 · Dec 3, 2025

This was taken from our JAX FA testing.
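To illustrate what a window of `(s_kv // 10, 0)` implies, here is a small pure-Python sketch. It assumes the common `(left, right)` convention in which a query at position `i` may attend to keys `i - left` through `i + right`, and it assumes equal query and key lengths. Both helper names are hypothetical and not part of the PR.

```python
def swa_window(s_kv):
    """Window used in the diff above: one tenth of the KV length to the left, none to the right."""
    return (s_kv // 10, 0)


def sliding_window_mask(seq_len, window_size):
    """Boolean mask where mask[i][j] is True if query i may attend to key j.

    Assumes the (left, right) convention and equal query/key lengths.
    """
    left, right = window_size
    return [
        [i - left <= j <= i + right for j in range(seq_len)]
        for i in range(seq_len)
    ]


window = swa_window(40)
mask = sliding_window_mask(40, window)
print(window)         # (4, 0)
print(sum(mask[10]))  # 5: keys 6 through 10
print(sum(mask[0]))   # 1: only key 0 is in range
```

With a right extent of 0 the window is causal, and the left extent scales with the sequence length, so each benchmark configuration masks out roughly the same fraction of keys.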

wenchenvincent · 2025-12-02T03:36:32Z

Have you compared the kernel time measured via the CK FA API with the time reported by rocprof?

Micky774 added 8 commits October 21, 2025 14:20

Initial plotting utils

3303107

Merge branch 'dev' into zain/jax-bench

c4f2122

Included direct AITER kernel runtime reporting

ce3e271

Improved labeling

8794542

Streamlined interface

3716dfc

Added env var manager context

0d8b552

Added readme and requirements.txt and streamlined app

5e49af9

Trim

84b2f67

Micky774 requested review from ipanfilo, wangye805 and wenchenvincent as code owners October 24, 2025 18:07

ipanfilo reviewed Oct 31, 2025


benchmarks/attention/benchmark_attention_jax.py (outdated)

benchmarks/attention/panel_app.py

transformer_engine/common/ck_fused_attn/src/ck_fused_attn_bwd.cpp (outdated)

ipanfilo reviewed Oct 31, 2025


Updated to include bwd pass and improve timing dump behavior

d8bed2b

Micky774 changed the title from "JAX FA Benchmarking Script (fwd-pass only)" to "JAX FA Benchmarking Script" on Oct 31, 2025

Micky774 requested a review from ipanfilo November 5, 2025 20:48

wenchenvincent reviewed Nov 11, 2025


Updated readme

23548bd

ipanfilo reviewed Nov 14, 2025


transformer_engine/common/ck_fused_attn/src/ck_fused_attn_bwd.cpp

Micky774 added 2 commits November 25, 2025 15:35

Merge branch 'dev' into zain/jax-bench

8d09b37

Removed unused arg

4118e96

Micky774 self-assigned this Dec 1, 2025

ipanfilo approved these changes Dec 1, 2025


wenchenvincent reviewed Dec 2, 2025


4 participants