Commit aa65217

nv-anantsathreesh authored and committed
docs: move all md files from components to docs (#3440)
Signed-off-by: Anant Sharma <[email protected]>
Co-authored-by: Anish <[email protected]>
1 parent 0c4ba0c commit aa65217

43 files changed (+107, -95 lines changed)

README.md

Lines changed: 4 additions & 4 deletions
@@ -30,7 +30,7 @@ High-throughput, low-latency inference framework designed for serving generative

 ## Latest News

-- [08/05] Deploy `openai/gpt-oss-120b` with disaggregated serving on NVIDIA Blackwell GPUs using Dynamo [➡️ link](./components/backends/trtllm/gpt-oss.md)
+- [08/05] Deploy `openai/gpt-oss-120b` with disaggregated serving on NVIDIA Blackwell GPUs using Dynamo [➡️ link](./docs/backends/trtllm/gpt-oss.md)

 ## The Era of Multi-GPU, Multi-Node

@@ -65,9 +65,9 @@ Dynamo is designed to be inference engine agnostic (supports TRT-LLM, vLLM, SGLa

 To learn more about each framework and their capabilities, check out each framework's README!

-- **[vLLM](components/backends/vllm/README.md)**
-- **[SGLang](components/backends/sglang/README.md)**
-- **[TensorRT-LLM](components/backends/trtllm/README.md)**
+- **[vLLM](docs/backends/vllm/README.md)**
+- **[SGLang](docs/backends/sglang/README.md)**
+- **[TensorRT-LLM](docs/backends/trtllm/README.md)**

 Built in Rust for performance and in Python for extensibility, Dynamo is fully open-source and driven by a transparent, OSS (Open Source Software) first development approach.

components/README.md

Lines changed: 3 additions & 3 deletions
@@ -23,9 +23,9 @@ This directory contains the core components that make up the Dynamo inference fr

 Dynamo supports multiple inference engines (with a focus on SGLang, vLLM, and TensorRT-LLM), each with their own deployment configurations and capabilities:

-- **[vLLM](backends/vllm/README.md)** - High-performance LLM inference with native KV cache events and NIXL-based transfer mechanisms
-- **[SGLang](backends/sglang/README.md)** - Structured generation language framework with ZMQ-based communication
-- **[TensorRT-LLM](backends/trtllm/README.md)** - NVIDIA's optimized LLM inference engine with TensorRT acceleration
+- **[vLLM](/docs/backends/vllm/README.md)** - High-performance LLM inference with native KV cache events and NIXL-based transfer mechanisms
+- **[SGLang](/docs/backends/sglang/README.md)** - Structured generation language framework with ZMQ-based communication
+- **[TensorRT-LLM](/docs/backends/trtllm/README.md)** - NVIDIA's optimized LLM inference engine with TensorRT acceleration

 Each engine provides launch scripts for different deployment patterns in their respective `/launch` & `/deploy` directories.

components/backends/sglang/slurm_jobs/README.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ For this example, we will make some assumptions about your SLURM cluster:
 If your cluster supports similar container based plugins, you may be able to
 modify the template to use that instead.
 3. We assume you have already built a recent Dynamo+SGLang container image as
-described [here](../docs/dsr1-wideep-gb200.md#instructions).
+described [here](../../../../docs/backends/sglang/dsr1-wideep-gb200.md#instructions).
 This is the image that can be passed to the `--container-image` argument in later steps.

 ## Scripts Overview

components/backends/trtllm/deploy/README.md

Lines changed: 4 additions & 4 deletions
@@ -232,7 +232,7 @@ envs:

 ## Testing the Deployment

-Send a test request to verify your deployment. See the [client section](../../../../components/backends/vllm/README.md#client) for detailed instructions.
+Send a test request to verify your deployment. See the [client section](../../../../docs/backends/vllm/README.md#client) for detailed instructions.

 **Note:** For multi-node deployments, target the node running `python3 -m dynamo.frontend <args>`.

@@ -254,7 +254,7 @@ TensorRT-LLM supports two methods for KV cache transfer in disaggregated serving
 - **UCX** (default): Standard method for KV cache transfer
 - **NIXL** (experimental): Alternative transfer method

-For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-transfer.md).
+For detailed configuration instructions, see the [KV cache transfer guide](../../../../docs/backends/trtllm/kv-cache-transfer.md).

 ## Request Migration

@@ -282,8 +282,8 @@ Configure the `model` name and `host` based on your deployment.
 - **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/kubernetes/installation_guide.md)
 - **Examples**: [Deployment Examples](../../../../docs/examples/README.md)
 - **Architecture Docs**: [Disaggregated Serving](../../../../docs/architecture/disagg_serving.md), [KV-Aware Routing](../../../../docs/architecture/kv_cache_routing.md)
-- **Multinode Deployment**: [Multinode Examples](../multinode/multinode-examples.md)
-- **Speculative Decoding**: [Llama 4 + Eagle Guide](../llama4_plus_eagle.md)
+- **Multinode Deployment**: [Multinode Examples](../../../../docs/backends/trtllm/multinode/multinode-examples.md)
+- **Speculative Decoding**: [Llama 4 + Eagle Guide](../../../../docs/backends/trtllm/llama4_plus_eagle.md)
 - **Kubernetes CRDs**: [Custom Resources Documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)

 ## Troubleshooting
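
A move like this one is easy to get subtly wrong, because each README that stays under `components/` now has to climb four directories to reach `docs/`. The following is a minimal sketch of a link check that could be run after such a move; it is not part of this commit. The function name `broken_links`, the regular expression, and the rule that a leading `/` means "relative to the repository root" are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Matches the target of an inline Markdown link: [text](target)
LINK_RE = re.compile(r"\]\(([^)\s]+)\)")


def broken_links(md_file, repo_root):
    """Return relative link targets in md_file that do not exist on disk."""
    md_file, repo_root = Path(md_file), Path(repo_root)
    broken = []
    for link in LINK_RE.findall(md_file.read_text(encoding="utf-8")):
        path = link.split("#", 1)[0]  # drop any #anchor before checking
        if not path or "://" in path or path.startswith("mailto:"):
            continue  # same-page anchors and external URLs are out of scope
        # Assumption: a leading "/" means "relative to the repository root".
        base = repo_root if path.startswith("/") else md_file.parent
        if not (base / path.lstrip("/")).exists():
            broken.append(link)
    return broken
```

Called as `broken_links("components/backends/trtllm/deploy/README.md", ".")` from a checkout, it would list any link in that file whose target file is missing. It only checks that the file exists, not that the `#anchor` is present in it.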

components/backends/trtllm/performance_sweeps/README.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Please note that:
 3. `post_process.py` - Scan the genai-perf results to produce a json with entries to each config point.
 4. `plot_performance_comparison.py` - Takes the json result file for disaggregated and/or aggregated configuration sweeps and plots a pareto line for better visualization.

-For more finer grained details on how to launch TRTLLM backend workers with DeepSeek R1 on GB200 slurm, please refer [multinode-examples.md](../multinode/multinode-examples.md). This guide shares similar assumption to the multinode examples guide.
+For more finer grained details on how to launch TRTLLM backend workers with DeepSeek R1 on GB200 slurm, please refer [multinode-examples.md](../../../../docs/backends/trtllm/multinode/multinode-examples.md). This guide shares similar assumption to the multinode examples guide.

 ## Usage

docs/_includes/dive_in_examples.rst

Lines changed: 6 additions & 6 deletions
@@ -11,20 +11,20 @@ The examples below assume you build the latest image yourself from source. If us

 Demonstrates the basic concepts of Dynamo by creating a simple GPU-unaware graph

-.. grid-item-card:: :doc:`vLLM <../components/backends/vllm/README>`
-:link: ../components/backends/vllm/README
+.. grid-item-card:: :doc:`vLLM <../backends/vllm/README>`
+:link: ../backends/vllm/README
 :link-type: doc

 Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with VLLM.

-.. grid-item-card:: :doc:`SGLang <../components/backends/sglang/README>`
-:link: ../components/backends/sglang/README
+.. grid-item-card:: :doc:`SGLang <../backends/sglang/README>`
+:link: ../backends/sglang/README
 :link-type: doc

 Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with SGLang.

-.. grid-item-card:: :doc:`TensorRT-LLM <../components/backends/trtllm/README>`
-:link: ../components/backends/trtllm/README
+.. grid-item-card:: :doc:`TensorRT-LLM <../backends/trtllm/README>`
+:link: ../backends/trtllm/README
 :link-type: doc

 Presents examples and reference implementations for deploying Large Language Models (LLMs) in various configurations with TensorRT-LLM.

docs/_sections/backends.rst

Lines changed: 3 additions & 3 deletions
@@ -37,6 +37,6 @@ Dynamo currently supports the following high-performance inference backends:
 .. toctree::
 :maxdepth: 1

-vLLM <../components/backends/vllm/README>
-SGLang <../components/backends/sglang/README>
-TensorRT-LLM <../components/backends/trtllm/README>
+vLLM <../backends/vllm/README>
+SGLang <../backends/sglang/README>
+TensorRT-LLM <../backends/trtllm/README>

docs/architecture/kvbm_intro.rst

Lines changed: 1 addition & 1 deletion
@@ -63,4 +63,4 @@ The Dynamo KV Block Manager serves as a reference implementation that emphasizes
 KVBM Architecture <kvbm_architecture.md>
 Understanding KVBM components <kvbm_components.md>
 KVBM Further Reading <kvbm_reading>
-LMCache Integration <../components/backends/vllm/LMCache_Integration.md>
+LMCache Integration <../backends/vllm/LMCache_Integration>

components/backends/sglang/README.md renamed to docs/backends/sglang/README.md

Lines changed: 15 additions & 15 deletions
@@ -35,13 +35,13 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

 | Feature | SGLang | Notes |
 |---------|--------|-------|
-| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) || |
-| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
-| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) || |
-| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) || |
-| [**Multimodal EPD Disaggregation**](docs/multimodal_epd.md) || |
-| [**Load Based Planner**](../../../docs/architecture/load_planner.md) || Planned |
-| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) || Planned |
+| [**Disaggregated Serving**](../../architecture/disagg_serving.md) || |
+| [**Conditional Disaggregation**](../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
+| [**KV-Aware Routing**](../../architecture/kv_cache_routing.md) || |
+| [**SLA-Based Planner**](../../architecture/sla_planner.md) || |
+| [**Multimodal EPD Disaggregation**](multimodal_epd.md) || |
+| [**Load Based Planner**](../../architecture/load_planner.md) || Planned |
+| [**KVBM**](../../architecture/kvbm_architecture.md) || Planned |

 ### Large Scale P/D and WideEP Features

@@ -229,7 +229,7 @@ cd $DYNAMO_HOME/components/backends/sglang
 ./launch/disagg_dp_attn.sh
 ```

-When using MoE models, you can also use the our implementation of the native SGLang endpoints to record expert distribution data. The `disagg_dp_attn.sh` script automatically sets up the SGLang HTTP server, the environment variable that controls the expert distribution recording directory, and sets up the expert distribution recording mode to `stat`. You can learn more about expert parallelism load balancing [here](docs/expert-distribution-eplb.md).
+When using MoE models, you can also use the our implementation of the native SGLang endpoints to record expert distribution data. The `disagg_dp_attn.sh` script automatically sets up the SGLang HTTP server, the environment variable that controls the expert distribution recording directory, and sets up the expert distribution recording mode to `stat`. You can learn more about expert parallelism load balancing [here](expert-distribution-eplb.md).

 ### Testing the Deployment

@@ -266,24 +266,24 @@ This allows a request to be migrated up to 3 times before failing. See the [Requ
 Below we provide a selected list of advanced examples. Please open up an issue if you'd like to see a specific example!

 ### Run a multi-node sized model
-- **[Run a multi-node model](docs/multinode-examples.md)**
+- **[Run a multi-node model](multinode-examples.md)**

 ### Large scale P/D disaggregation with WideEP
-- **[Run DeepSeek-R1 on 104+ H100s](docs/dsr1-wideep-h100.md)**
-- **[Run DeepSeek-R1-FP8 on GB200s](docs/dsr1-wideep-gb200.md)**
+- **[Run DeepSeek-R1 on 104+ H100s](dsr1-wideep-h100.md)**
+- **[Run DeepSeek-R1-FP8 on GB200s](dsr1-wideep-gb200.md)**

 ### Hierarchical Cache (HiCache)
-- **[Enable SGLang Hierarchical Cache (HiCache)](docs/sgl-hicache-example.md)**
+- **[Enable SGLang Hierarchical Cache (HiCache)](sgl-hicache-example.md)**

 ### Multimodal Encode-Prefill-Decode (EPD) Disaggregation with NIXL
-- **[Run a multimodal model with EPD Disaggregation](docs/multimodal_epd.md)**
+- **[Run a multimodal model with EPD Disaggregation](multimodal_epd.md)**

 ## Deployment

 We currently provide deployment examples for Kubernetes and SLURM.

 ## Kubernetes
-- **[Deploying Dynamo with SGLang on Kubernetes](deploy/README.md)**
+- **[Deploying Dynamo with SGLang on Kubernetes](../../../components/backends/sglang/deploy/README.md)**

 ## SLURM
-- **[Deploying Dynamo with SGLang on SLURM](slurm_jobs/README.md)**
+- **[Deploying Dynamo with SGLang on SLURM](../../../components/backends/sglang/slurm_jobs/README.md)**
File renamed without changes.
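
The link rewrites in the renamed SGLang README all follow one rule: resolve each link against the file's old directory, follow the target if it moved too, then express it relative to the file's new directory. The sketch below illustrates that rule with Python's `posixpath`. It is not the tooling used for this commit, and the helper name `relink` and the `moved` mapping are hypothetical.

```python
import posixpath


def relink(link, old_file, new_file, moved=None):
    """Recompute a relative link after the file that contains it has moved.

    `moved` optionally maps old repo-relative paths to new ones, for
    targets that were relocated in the same change.
    """
    moved = moved or {}
    path, sep, fragment = link.partition("#")
    if not path:
        return link  # a pure "#anchor" link does not depend on location
    # Resolve the link against the old directory to get a repo-relative target.
    target = posixpath.normpath(posixpath.join(posixpath.dirname(old_file), path))
    # If the target itself moved, follow it to its new location.
    target = moved.get(target, target)
    # Express the target relative to the directory of the relocated file.
    return posixpath.relpath(target, start=posixpath.dirname(new_file)) + sep + fragment


OLD = "components/backends/sglang/README.md"
NEW = "docs/backends/sglang/README.md"

print(relink("../../../docs/architecture/disagg_serving.md", OLD, NEW))
print(relink("deploy/README.md", OLD, NEW))
```

With the paths from this diff, the first call yields `../../architecture/disagg_serving.md` and the second yields `../../../components/backends/sglang/deploy/README.md`, matching the added lines in the hunks for that file. Links such as `docs/multimodal_epd.md` additionally need an entry in `moved`, because their targets were relocated alongside the README.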
