
@quic-rishinr (Contributor)
Updated version of Adding Compute-Context-Length (CCL) #576
The Compute-Context-Length (CCL) technique optimizes the throughput of large language models (LLMs) on Qualcomm devices when handling very large context lengths. With the current Ahead-Of-Time (AOT) compilation on Qualcomm devices, the number of tokens that will actually be generated is not known in advance, so attention is computed over the full allocated context length; this causes significant throughput drops during both the prefill and decode phases. To address this, we introduce Compute Context Length (CCL), an additional ONNX variable that enables dynamic context-length specialization. By generating tokens with smaller, more manageable compute context lengths, we reduce memory reads and attention computation, thereby improving throughput.

Signed-off-by: Vahid Janfaza <[email protected]>
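The bucketing idea described above can be sketched as follows. This is a minimal illustration, not the actual QEfficient implementation: the bucket sizes, function names, and cost model are all hypothetical, and real attention cost depends on heads, head dimension, and batch size.

```python
def select_ccl(position: int, ccl_buckets: list[int], full_ctx: int) -> int:
    """Pick the smallest CCL bucket that still covers the current
    token position; fall back to the full context length if the
    position exceeds every bucket. (Illustrative helper.)"""
    for bucket in sorted(ccl_buckets):
        if position < bucket:
            return bucket
    return full_ctx


def decode_kv_reads(num_tokens: int, ccl_buckets: list[int],
                    full_ctx: int, use_ccl: bool = True) -> int:
    """Toy cost model: total KV-cache entries read across all decode
    steps. Without CCL, every step attends over the full context
    window; with CCL, each step attends only over its bucket."""
    total = 0
    for pos in range(num_tokens):
        window = select_ccl(pos, ccl_buckets, full_ctx) if use_ccl else full_ctx
        total += window
    return total


# Generating 256 tokens against a 4096-token window: with CCL every
# step fits in the smallest (512) bucket, so far fewer KV entries
# are read than when attending over the full context each step.
with_ccl = decode_kv_reads(256, [512, 1024, 2048], 4096)
without_ccl = decode_kv_reads(256, [512, 1024, 2048], 4096, use_ccl=False)
```

In this toy model the saving comes purely from capping the attention window per step, which mirrors the claim in the description that smaller compute context lengths reduce memory reads and attention calculations.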
@quic-rishinr (Contributor, Author)
Duplicate of #576; closing this PR.
