Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added .DS_Store
Binary file not shown.
Binary file modified community/.DS_Store
Binary file not shown.
6 changes: 5 additions & 1 deletion community/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,4 +86,8 @@ Community examples are sample code and deployments for RAG pipelines that are no

* [Chat with LLM Llama 3.1 Nemotron Nano 4B](./chat-llama-nemotron/)

This is a React-based conversational UI designed for interacting with a powerful local LLM. It incorporates RAG to enhance contextual understanding and is backed by an NVIDIA Dynamo inference server running the NVIDIA Llama-3.1-Nemotron-Nano-4B-v1.1 model. The setup enables low-latency, cloud-free AI assistant capabilities, with live document search and reasoning, all deployable on local or edge infrastructure.
This is a React-based conversational UI designed for interacting with a powerful local LLM. It incorporates RAG to enhance contextual understanding and is backed by an NVIDIA Dynamo inference server running the NVIDIA Llama-3.1-Nemotron-Nano-4B-v1.1 model. The setup enables low-latency, cloud-free AI assistant capabilities, with live document search and reasoning, all deployable on local or edge infrastructure.

* [Reasoning Coder](./reasoning_coder/)

Reasoning Coder is a demonstration of a coding agent for reasoning-aware code generation powered by the open-source [NVIDIA Nemotron Nano 9B v2 model](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2). The agent combines the strengths of large language model coding capabilities with a reasoning budget control mechanism, enabling more transparent and efficient problem-solving. It is designed to showcase how developers can integrate self-hosted vLLM deployments to run advanced code assistants locally or on their own infrastructure. The demo highlights how NVIDIA Nemotron Nano 9B v2 reasoning features can be applied to software development workflows, making it easier to experiment with streaming, non-streaming, and reasoning-driven code generation in a reproducible environment.
56 changes: 56 additions & 0 deletions community/reasoning_coder/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Virtual environments
venv/
env/
ENV/
env.bak/
venv.bak/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Streamlit
.streamlit/

# Logs
*.log

# Environment variables
.env
.env.local
.env.*.local
104 changes: 104 additions & 0 deletions community/reasoning_coder/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
# Reasoning Coder

Reasoning Coder is a demonstration of a coding agen for reasoning-aware code generation powered by the open-source [NVIDIA Nemotron Nano 9B v2 model](https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2). The agent combines the strengths of large language model coding capabilities with a reasoning budget control mechanism, enabling more transparent and efficient problem-solving.

It is designed to showcase how developers can integrate self-hosted vLLM deployments to run advanced code assistants locally or on their own infrastructure. The demo highlights how NVIDIA Nemotron Nano 9B v2 reasoning features can be applied to software development workflows, making it easier to experiment with streaming, non-streaming, and reasoning-driven code generation in a reproducible environment.

## Features

- **Reasoning Budget Control**: Toggle reasoning on/off and control token budget
- **Streaming Support**: Real-time streaming of responses
- **Code Generation**: AI-powered code generation for various programming languages
- **File Upload Context**: Upload files to provide context for better code generation

## Requirements

- **vLLM server** running with Nemotron Nano 9B v2 model
- **Hugging Face token** to download the model. Get one [here](https://huggingface.co/settings/tokens).
- **Python 3.8+** environment
- **Docker** (optional)

## vLLM Server Setup

### Basic vLLM Installation
```bash
pip install -U "vllm>=0.10.1"
```

Alternativly, you can use Docker to launch a vLLM server. See the instructions below.

## Quick Start

### 1. Start your vLLM server

```bash
vllm serve nvidia/NVIDIA-Nemotron-Nano-9B-v2 \
--trust-remote-code \
--mamba_ssm_cache_dtype float32 \
--max-num-seqs 64 \
--max-model-len 131072 \
--host 0.0.0.0 \
--port 8888
```

Or, if you are using Docker:

```bash
export HF_CACHE_DIR=<your_local_HF_directory>
export HF_TOKEN=<your_HF_token>
export TP_SIZE=1

docker run --runtime nvidia --gpus all --ipc=host \
-v "$HF_CACHE_DIR":/hf_cache \
-e HF_HOME=/hf_cache \
-e "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
-e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
-p 8888:8888 \
vllm/vllm-openai:v0.10.1 \
--model nvidia/NVIDIA-Nemotron-Nano-9B-v2 \
--tensor-parallel-size ${TP_SIZE} \
--trust-remote-code \
--mamba_ssm_cache_dtype float32 \
--max-num-seqs 64 \
--max-model-len 131072 \
--host 0.0.0.0 \
--port 8888
```

#### Customize Endpoint
If you're running vLLM on a different port or host, update the `DEFAULT_LOCAL_API` constant in `reasoning_coder.py`.


### 2. Setup the Coding Client

Clone this repository

```bash
git clone https://github.com/NVIDIA/GenerativeAIExamples.git
cd GenerativeAIExamples/community/reasoning-coder
```

Activate virtual environment and install dependencies.

```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

### 3. Run Demo
```bash
streamlit run reasoning_coder.py
```

The UI should open in the browser under http://localhost:8501/.

## Example Prompts

Try these built-in examples:
- "Write a Python function to find the longest palindromic substring in a string"
- "Create a recursive function to solve the Tower of Hanoi puzzle"
- "Implement a binary search tree with insertion and search operations"
- "Write a function to validate email addresses using regex"
- "Create a simple web scraper using Python requests and BeautifulSoup"

Loading