Afd demo #12
base: afd-dev-hsliu
Conversation
Signed-off-by: czrz <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only a small subset of essential tests runs automatically. You can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
hsliuustc0106
left a comment
please check
@@ -0,0 +1,16 @@
from vllm import LLM, SamplingParams
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--temperature", type=float, default=0.8)
parser.add_argument("--top_p", type=float, default=0.95)
args = parser.parse_args()
sampling_params = SamplingParams(temperature=args.temperature, top_p=args.top_p)
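A minimal, self-contained sketch of the CLI surface the reviewer suggests above (only the two flags shown in the suggestion; the vLLM-specific parts are omitted so this runs standalone):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the review suggestion: expose the sampling knobs as CLI flags
    # instead of hard-coding SamplingParams(temperature=0.8, top_p=0.95).
    parser = argparse.ArgumentParser()
    parser.add_argument("--temperature", type=float, default=0.8)
    parser.add_argument("--top_p", type=float, default=0.95)
    return parser

# Parsing an empty argv falls back to the same defaults the inline call used.
args = build_parser().parse_args([])
```

In the example script, `args.temperature` and `args.top_p` would then feed `SamplingParams` directly.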
examples/afd/offline_attn.py
Outdated
]

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
#llm = LLM(model="/data2/models/Qwen3-0.6B")
del
from vllm.distributed.afd.AFDconnector import AFDConnectorBase


class ncclconnector(AFDConnectorBase):
XcclConnector
NcclConnector
change to p2pconnector
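A sketch of the rename the reviewers converge on. The class name `P2pConnector` and the stubbed `AFDConnectorBase` below are assumptions for illustration (the real base class lives in `vllm.distributed.afd` and is not reproduced here):

```python
from abc import ABC, abstractmethod

class AFDConnectorBase(ABC):
    """Stub standing in for vllm.distributed.afd.AFDconnector.AFDConnectorBase."""
    @abstractmethod
    def send_attn_output(self, tensors): ...
    @abstractmethod
    def recv_attn_output(self): ...

class P2pConnector(AFDConnectorBase):
    """PascalCase rename suggested in review ('change to p2pconnector')."""
    def __init__(self, process_group):
        self.process_group = process_group

    def send_attn_output(self, tensors):
        # Placeholder for process_group.send_tensor_dict(...)
        return ("send", tensors)

    def recv_attn_output(self):
        # Placeholder for process_group.recv_tensor_dict(...)
        return ("recv", None)
```

Keeping the concrete transport in a subclass name like this makes later swaps (nccl vs. p2p) a one-line change at the construction site.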
if weights_not_loaded:
    raise ValueError("Following weights were not initialized from "
                     f"checkpoint: {weights_not_loaded}")
# if weights_not_loaded:
del
done
@@ -31,11 +31,12 @@
from torch import nn
from transformers import DeepseekV2Config, DeepseekV3Config

import vllm.distributed.parallel_state as ps
This probably won't pass pre-commit; imports must follow a required order.
intermediate_tensors = ps._AFD_CONNECTOR.recv_attn_output()
hidden_states = intermediate_tensors["hidden_states"]

# ae_group = get_afd_group()
Apply this to all files: delete every unnecessary comment.
expert_params_mapping = FusedMoE.make_expert_params_mapping(
    ckpt_gate_proj_name="gate_proj",
    ckpt_down_proj_name="down_proj",
    ckpt_up_proj_name="up_proj",
    num_experts=self.config.n_routed_experts,
-   num_redundant_experts=self.num_redundant_experts)
+   num_redundant_experts=vllm_config.parallel_config.num_redundant_experts)
Why do we need to change this?
self.num_redundant_experts comes from an example_moe, which does not exist in the attention part; we have to get num_redundant_experts from the vLLM config to avoid an error.
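The explanation above can be sketched as a plain fallback lookup (the config objects here are stand-ins; attribute names follow the diff):

```python
from types import SimpleNamespace

def get_num_redundant_experts(vllm_config, example_moe=None):
    # Attention-only workers have no example_moe module, so prefer the value
    # carried on parallel_config and use the MoE module only when present.
    if example_moe is not None:
        return example_moe.num_redundant_experts
    return vllm_config.parallel_config.num_redundant_experts

# Stand-in configs for illustration only.
cfg = SimpleNamespace(parallel_config=SimpleNamespace(num_redundant_experts=2))
```

Reading from `vllm_config.parallel_config` keeps the attention workers and FFN workers on a single source of truth for this value.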
return group_metadata


class FFNModelRunner(GPUModelRunner):
import threading
import time

import torch

class FFNModelRunner(GPUModelRunner):
    def __init__(self, vllm_config: VllmConfig, device: torch.device):
        super().__init__(vllm_config=vllm_config, device=device)
        self.vllm_config = vllm_config
        self.model_config = vllm_config.model_config
        self._shutdown_event = threading.Event()  # Add shutdown mechanism

    def execute_model(self):
        print('ffn forward begin')
        try:
            with set_forward_context(None, self.vllm_config):
                while not self._shutdown_event.is_set():  # Add exit condition
                    try:
                        layers_num = len(self.model.model.layers)
                        for i in range(layers_num):
                            if self._shutdown_event.is_set():  # Check for shutdown
                                break
                            self.model.model.layers[i].forward_ffn()
                        # Add small delay to prevent busy waiting
                        time.sleep(0.001)
                    except Exception as e:
                        logger.error(f"Error in FFN execution: {e}")
                        break
        except Exception as e:
            logger.error(f"FFN Model Runner failed: {e}")
            raise

    def shutdown(self):
        """Gracefully shut down the FFN model runner."""
        self._shutdown_event.set()
try-except doesn't align with vllm's model runner
super().__init__(process_group)
self.process_group = process_group

def send_attn_output(self, intermediate_tensors: IntermediateTensors):
class ncclconnector(AFDConnectorBase):
    def send_attn_output(self, intermediate_tensors: IntermediateTensors):
        """Send attention output with proper error handling."""
        try:
            self.process_group.send_tensor_dict(
                intermediate_tensors.tensors,
                dst=0,  # Uncomment and fix destination
                all_gather_group=None,
                timeout=timedelta(seconds=30)  # Add timeout
            )
        except Exception as e:
            logger.error(f"Failed to send attention output: {e}")
            raise RuntimeError(f"Communication error: {e}")

    def recv_attn_output(self) -> IntermediateTensors:
        """Receive attention output with proper error handling."""
        try:
            intermediate_tensors = self.process_group.recv_tensor_dict(
                src=0,  # Uncomment and fix source
                all_gather_group=None,
                timeout=timedelta(seconds=30)  # Add timeout
            )
            return IntermediateTensors(intermediate_tensors)
        except Exception as e:
            logger.error(f"Failed to receive attention output: {e}")
            raise RuntimeError(f"Communication error: {e}")
vllm/v1/worker/gpu_worker.py
Outdated
role = self.vllm_config.additional_config.get("role", None)
logger.info("AFD worker building")

ffn_size = self.vllm_config.additional_config.get("ffn_size")
def create_worker(vllm_config, rank, distributed_init_method, is_driver_worker: bool = True):
    # Add configuration validation
    additional_config = vllm_config.additional_config
    ffn_size = additional_config.get("ffn_size")
    attn_size = additional_config.get("attn_size")
    if ffn_size is None or attn_size is None:
        raise ValueError("ffn_size and attn_size must be specified in additional_config")
    if not isinstance(ffn_size, int) or not isinstance(attn_size, int):
        raise ValueError("ffn_size and attn_size must be integers")
    if ffn_size <= 0 or attn_size <= 0:
        raise ValueError("ffn_size and attn_size must be positive integers")
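The validation suggested above can be exercised in isolation. The helper below replicates the same three checks (presence, type, positivity) against a plain dict standing in for `additional_config`:

```python
def validate_afd_sizes(additional_config: dict) -> tuple:
    """Replicates the create_worker checks: presence, type, positivity."""
    ffn_size = additional_config.get("ffn_size")
    attn_size = additional_config.get("attn_size")
    if ffn_size is None or attn_size is None:
        raise ValueError("ffn_size and attn_size must be specified in additional_config")
    if not isinstance(ffn_size, int) or not isinstance(attn_size, int):
        raise ValueError("ffn_size and attn_size must be integers")
    if ffn_size <= 0 or attn_size <= 0:
        raise ValueError("ffn_size and attn_size must be positive integers")
    return ffn_size, attn_size
```

Failing fast here, before any process groups are created, avoids the harder-to-debug hangs that a missing or malformed size would otherwise cause during distributed init.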
Signed-off-by: czrz <[email protected]>
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.