
Commit 7d7ace9

Aydin-ab authored and Future-Outlier committed

[docs] Add gpt oss deployment example (ray-project#56400)
Signed-off-by: Future-Outlier <eric901201@gmail.com>

1 parent: 713f85f

24 files changed: 1,063 additions & 79 deletions


doc/source/serve/examples.yml (8 additions, 0 deletions)

@@ -122,6 +122,14 @@ examples:
       - natural language processing
     link: tutorials/deployment-serve-llm/hybrid-reasoning-llm/README
     related_technology: llm applications
+  - title: Deploy gpt-oss
+    skill_level: beginner
+    use_cases:
+      - generative ai
+      - large language models
+      - natural language processing
+    link: tutorials/deployment-serve-llm/gpt-oss/README
+    related_technology: llm applications
   - title: Serve a Chatbot with Request and Response Streaming
     skill_level: intermediate
     use_cases:

doc/source/serve/llm/index.md (2 additions, 1 deletion)

@@ -67,4 +67,5 @@ Cache-aware request routing <prefix-aware-request-router>
 - {doc}`Deploy a large-sized LLM <../tutorials/deployment-serve-llm/large-size-llm/README>`
 - {doc}`Deploy a vision LLM <../tutorials/deployment-serve-llm/vision-llm/README>`
 - {doc}`Deploy a reasoning LLM <../tutorials/deployment-serve-llm/reasoning-llm/README>`
-- {doc}`Deploy a hybrid reasoning LLM <../tutorials/deployment-serve-llm/hybrid-reasoning-llm/README>`
+- {doc}`Deploy a hybrid reasoning LLM <../tutorials/deployment-serve-llm/hybrid-reasoning-llm/README>`
+- {doc}`Deploy gpt-oss <../tutorials/deployment-serve-llm/gpt-oss/README>`

doc/source/serve/tutorials/deployment-serve-llm/README.ipynb (6 additions, 1 deletion)

@@ -39,7 +39,12 @@
     "---\n",
     "\n",
     "**[Deploy a hybrid reasoning LLM](https://docs.ray.io/en/latest/serve/tutorials/deployment-serve-llm/hybrid-reasoning-llm/README.html)** \n",
-    "Deploy models that can switch between reasoning and non-reasoning modes for flexible usage, such as Qwen-3."
+    "Deploy models that can switch between reasoning and non-reasoning modes for flexible usage, such as Qwen-3.\n",
+    "\n",
+    "---\n",
+    "\n",
+    "**[Deploy gpt-oss](https://docs.ray.io/en/latest/ray-overview/examples/deployment-serve-llm/gpt-oss/README.html)** \n",
+    "Deploy gpt-oss reasoning models for production-scale workloads, covering both lower-latency (`gpt-oss-20b`) and high-reasoning (`gpt-oss-120b`) use cases."
   ]
  }
 ],

doc/source/serve/tutorials/deployment-serve-llm/README.md (5 additions, 0 deletions)

@@ -39,3 +39,8 @@ Deploy models with reasoning capabilities designed for long-context tasks, codin
 **[Deploy a hybrid reasoning LLM](https://docs.ray.io/en/latest/serve/tutorials/deployment-serve-llm/hybrid-reasoning-llm/README.html)**
 Deploy models that can switch between reasoning and non-reasoning modes for flexible usage, such as Qwen-3.
+
+---
+
+**[Deploy gpt-oss](https://docs.ray.io/en/latest/ray-overview/examples/deployment-serve-llm/gpt-oss/README.html)**
+Deploy gpt-oss reasoning models for production-scale workloads, covering both lower-latency (`gpt-oss-20b`) and high-reasoning (`gpt-oss-120b`) use cases.

doc/source/serve/tutorials/deployment-serve-llm/ci/nb2py.py (11 additions, 1 deletion)

@@ -42,7 +42,17 @@ def convert_notebook(
         else:
             # Detect any IPython '!' shell commands in code lines
             has_bang = any(line.lstrip().startswith("!") for line in lines)
-            if has_bang:
+            # Lines that start with "serve run", "serve shutdown", "curl",
+            # or "anyscale service" commands
+            to_ignore_cmd = (
+                "serve run",
+                "serve shutdown",
+                "curl",
+                "anyscale service",
+            )
+            has_ignored_start = any(
+                line.lstrip().startswith(to_ignore_cmd) for line in lines
+            )
+            if has_bang or has_ignored_start:
                 if ignore_cmds:
                     continue
                 out.write("import subprocess\n")
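The check added to nb2py.py leans on `str.startswith` accepting a tuple of prefixes, so a single call covers all four commands. A minimal standalone sketch of that rule (the helper name and signature are illustrative, not the converter's actual API, and the real script operates on notebook JSON rather than bare line lists):

```python
# Illustrative sketch of the cell-filtering rule from nb2py.py.
TO_IGNORE_CMD = (
    "serve run",
    "serve shutdown",
    "curl",
    "anyscale service",
)

def should_skip_cell(lines, ignore_cmds=True):
    """Return True if a code cell would be dropped from the generated script."""
    # IPython '!' shell escapes
    has_bang = any(line.lstrip().startswith("!") for line in lines)
    # str.startswith accepts a tuple, so one call checks every prefix
    has_ignored_start = any(line.lstrip().startswith(TO_IGNORE_CMD) for line in lines)
    return ignore_cmds and (has_bang or has_ignored_start)

assert should_skip_cell(["serve run main:app"])
assert should_skip_cell(["  !pip install vllm"])
assert not should_skip_cell(["import ray"])
```

With `--ignore-cmds` set, such cells are skipped entirely; otherwise the converter rewrites them as `subprocess` calls, as the surrounding diff shows.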

doc/source/serve/tutorials/deployment-serve-llm/ci/tests.sh (2 additions, 1 deletion)

@@ -12,7 +12,8 @@ for nb in \
     "large-size-llm/notebook" \
     "vision-llm/notebook" \
     "reasoning-llm/notebook" \
-    "hybrid-reasoning-llm/notebook"
+    "hybrid-reasoning-llm/notebook" \
+    "gpt-oss/notebook"
 do
     python ci/nb2py.py "${nb}.ipynb" "${nb}.py" --ignore-cmds
     python "${nb}.py"
New file (8 additions, 0 deletions)

@@ -0,0 +1,8 @@
+FROM anyscale/ray:2.49.0-slim-py312-cu128
+
+# C compiler for Triton's runtime build step (vLLM V1 engine)
+# https://github.com/vllm-project/vllm/issues/2997
+RUN sudo apt-get update && \
+    sudo apt-get install -y --no-install-recommends build-essential
+
+RUN pip install vllm==0.10.1
