**README.md** (4 additions, 4 deletions)

```diff
@@ -30,7 +30,7 @@ High-throughput, low-latency inference framework designed for serving generative
 
 ## Latest News
 
-- [08/05] Deploy `openai/gpt-oss-120b` with disaggregated serving on NVIDIA Blackwell GPUs using Dynamo [➡️ link](./components/backends/trtllm/gpt-oss.md)
+- [08/05] Deploy `openai/gpt-oss-120b` with disaggregated serving on NVIDIA Blackwell GPUs using Dynamo [➡️ link](./docs/backends/trtllm/gpt-oss.md)
 
 ## The Era of Multi-GPU, Multi-Node
 
@@ -65,9 +65,9 @@ Dynamo is designed to be inference engine agnostic (supports TRT-LLM, vLLM, SGLa
 To learn more about each framework and their capabilities, check out each framework's README!
 Built in Rust for performance and in Python for extensibility, Dynamo is fully open-source and driven by a transparent, OSS (Open Source Software) first development approach.
```
**components/README.md** (3 additions, 3 deletions)

```diff
@@ -23,9 +23,9 @@ This directory contains the core components that make up the Dynamo inference fr
 
 Dynamo supports multiple inference engines (with a focus on SGLang, vLLM, and TensorRT-LLM), each with their own deployment configurations and capabilities:
 
-- **[vLLM](backends/vllm/README.md)** - High-performance LLM inference with native KV cache events and NIXL-based transfer mechanisms
-- **[SGLang](backends/sglang/README.md)** - Structured generation language framework with ZMQ-based communication
-- **[TensorRT-LLM](backends/trtllm/README.md)** - NVIDIA's optimized LLM inference engine with TensorRT acceleration
+- **[vLLM](/docs/backends/vllm/README.md)** - High-performance LLM inference with native KV cache events and NIXL-based transfer mechanisms
+- **[SGLang](/docs/backends/sglang/README.md)** - Structured generation language framework with ZMQ-based communication
+- **[TensorRT-LLM](/docs/backends/trtllm/README.md)** - NVIDIA's optimized LLM inference engine with TensorRT acceleration
 
 Each engine provides launch scripts for different deployment patterns in their respective `/launch` & `/deploy` directories.
```
**components/backends/trtllm/deploy/README.md** (4 additions, 4 deletions)

```diff
@@ -232,7 +232,7 @@ envs:
 
 ## Testing the Deployment
 
-Send a test request to verify your deployment. See the [client section](../../../../components/backends/vllm/README.md#client) for detailed instructions.
+Send a test request to verify your deployment. See the [client section](../../../../docs/backends/vllm/README.md#client) for detailed instructions.
 
 **Note:** For multi-node deployments, target the node running `python3 -m dynamo.frontend <args>`.
 
@@ -254,7 +254,7 @@ TensorRT-LLM supports two methods for KV cache transfer in disaggregated serving
 - **UCX** (default): Standard method for KV cache transfer
 - **NIXL** (experimental): Alternative transfer method
 
-For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-transfer.md).
+For detailed configuration instructions, see the [KV cache transfer guide](../../../../docs/backends/trtllm/kv-cache-transfer.md).
 
 ## Request Migration
 
@@ -282,8 +282,8 @@ Configure the `model` name and `host` based on your deployment.
```
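As a companion to the "Testing the Deployment" step above, here is a minimal sketch of such a test request. It assumes the frontend launched by `python3 -m dynamo.frontend` serves an OpenAI-compatible `/v1/chat/completions` endpoint on `localhost:8000`; the host, port, and model name are assumptions to adjust for your deployment, not values taken from this diff.

```python
# Minimal test-request sketch (not the documented client); adjust host,
# port, and model name for your deployment.
import json
import urllib.request

payload = {
    "model": "openai/gpt-oss-120b",  # assumed model name
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 32,
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # assumed frontend address
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# A well-formed completion response confirms the frontend can reach a worker.
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])
```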
**components/backends/trtllm/performance_sweeps/README.md** (1 addition, 1 deletion)

```diff
@@ -41,7 +41,7 @@ Please note that:
 3. `post_process.py` - Scans the genai-perf results to produce a json with entries for each config point.
 4. `plot_performance_comparison.py` - Takes the json result file for disaggregated and/or aggregated configuration sweeps and plots a pareto line for better visualization.
 
-For finer-grained details on how to launch TRTLLM backend workers with DeepSeek R1 on GB200 Slurm, please refer to [multinode-examples.md](../multinode/multinode-examples.md). This guide shares similar assumptions with the multinode examples guide.
+For finer-grained details on how to launch TRTLLM backend workers with DeepSeek R1 on GB200 Slurm, please refer to [multinode-examples.md](../../../../docs/backends/trtllm/multinode/multinode-examples.md). This guide shares similar assumptions with the multinode examples guide.
```
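To make the pareto line that `plot_performance_comparison.py` draws concrete, here is a small sketch of the underlying idea. This is not the repo's script: the input file name and the JSON field names (`throughput_per_gpu`, `tpot_ms`) are hypothetical placeholders for whatever `post_process.py` actually emits.

```python
# Sketch of a pareto line over sweep results; the file name and field
# names are hypothetical placeholders, not the real post_process.py schema.
import json

import matplotlib.pyplot as plt


def pareto_frontier(points):
    """Keep configs no other config beats on both throughput and latency."""
    frontier = []
    for tput, lat in sorted(points, key=lambda p: (-p[0], p[1])):
        if not frontier or lat < frontier[-1][1]:
            frontier.append((tput, lat))
    return frontier


with open("sweep_results.json") as f:  # assumed output of post_process.py
    results = json.load(f)

points = [(r["throughput_per_gpu"], r["tpot_ms"]) for r in results]
front = pareto_frontier(points)

plt.scatter(*zip(*points), alpha=0.4, label="all configs")
plt.plot(*zip(*front), "ro-", label="pareto line")
plt.xlabel("throughput per GPU (tokens/s)")
plt.ylabel("time per output token (ms)")
plt.legend()
plt.savefig("pareto.png")
```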
````diff
@@ -229,7 +229,7 @@ cd $DYNAMO_HOME/components/backends/sglang
 ./launch/disagg_dp_attn.sh
 ```
 
-When using MoE models, you can also use our implementation of the native SGLang endpoints to record expert distribution data. The `disagg_dp_attn.sh` script automatically sets up the SGLang HTTP server, sets the environment variable that controls the expert distribution recording directory, and sets the recording mode to `stat`. You can learn more about expert parallelism load balancing [here](docs/expert-distribution-eplb.md).
+When using MoE models, you can also use our implementation of the native SGLang endpoints to record expert distribution data. The `disagg_dp_attn.sh` script automatically sets up the SGLang HTTP server, sets the environment variable that controls the expert distribution recording directory, and sets the recording mode to `stat`. You can learn more about expert parallelism load balancing [here](expert-distribution-eplb.md).
 
 ### Testing the Deployment
 
````
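As an illustration of what recording expert distribution data looks like in practice, here is a rough sketch of driving such endpoints by hand. The endpoint paths and port are assumptions about SGLang's native expert-distribution API (the passage above only confirms that such endpoints exist), so verify them against your SGLang version.

```python
# Hedged sketch: poke the SGLang HTTP server's expert-distribution
# recording endpoints. Paths and port are assumptions, not confirmed
# by this diff; check your SGLang version's API.
import urllib.request

SGLANG_HTTP = "http://localhost:30000"  # assumed SGLang server address


def post(path: str) -> bytes:
    req = urllib.request.Request(f"{SGLANG_HTTP}{path}", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read()


post("/start_expert_distribution_record")  # begin recording (mode `stat`)
# ... drive some MoE inference traffic through the deployment here ...
post("/stop_expert_distribution_record")
post("/dump_expert_distribution_record")   # write stats to the recording directory
```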
```diff
@@ -266,24 +266,24 @@ This allows a request to be migrated up to 3 times before failing. See the [Requ
 
 Below we provide a selected list of advanced examples. Please open up an issue if you'd like to see a specific example!
 
 ### Run a multi-node sized model
-- **[Run a multi-node model](docs/multinode-examples.md)**
+- **[Run a multi-node model](multinode-examples.md)**
 
 ### Large scale P/D disaggregation with WideEP
-- **[Run DeepSeek-R1 on 104+ H100s](docs/dsr1-wideep-h100.md)**
-- **[Run DeepSeek-R1-FP8 on GB200s](docs/dsr1-wideep-gb200.md)**
+- **[Run DeepSeek-R1 on 104+ H100s](dsr1-wideep-h100.md)**
+- **[Run DeepSeek-R1-FP8 on GB200s](dsr1-wideep-gb200.md)**
```