34 changes: 25 additions & 9 deletions AGENTS.md
@@ -5,15 +5,23 @@ This repository contains AI agent skills for building with [Agora](https://www.a
## Repository Structure

```
scripts/
└── validate-skills.sh                  # Static validation
skills/
├── scripts/validate-skills.sh
└── skills/
    └── agora/                          # Skill root
        ├── SKILL.md                    # Entry point, product index
        ├── intake/SKILL.md             # Intake router for vague requests
        └── references/
            ├── mcp-tools.md            # MCP tool reference + freeze-forever table
            ├── rtc/ rtm/ conversational-ai/ server/ cloud-recording/
└── agora/                              # Skill root
    ├── SKILL.md                        # Entry point, product index
    ├── intake/SKILL.md                 # Intake router for vague requests
    └── references/
        ├── doc-fetching.md             # Two-tier lookup procedure
        ├── mcp-tools.md                # MCP tool reference + graceful degradation
        ├── integration-patterns.md     # RTC + RTM + ConvoAI coordination
        ├── rtc/                        # RTC: Web, React, Next.js, iOS, Android, RN, Flutter
        ├── rtm/                        # RTM v2: Web, iOS, Android
        ├── conversational-ai/          # ConvoAI: REST, SDKs, toolkits, auth flow
        ├── server/                     # Token generation
        ├── cloud-recording/            # Cloud Recording REST API
        ├── server-gateway/             # Linux Server Gateway
        └── testing-guidance/           # Mocking patterns and test guidance
```

## 4-Layer Progressive Disclosure
@@ -27,7 +35,7 @@ skills/

## Freeze-Forever Rule

Ask: **will this still be correct in 6 months without any updates?** If yes, put it inline. If no, route to MCP or an external link. See [`skills/agora/references/mcp-tools.md`](skills/agora/references/mcp-tools.md) for the full decision table.
Ask: **will this still be correct in 6 months without any updates?** If yes, put it inline. If no, route to Level 2 docs lookup or an external link. MCP is preferred only when installed and supported in the current tool/runtime.

## Naming Conventions

@@ -48,3 +56,11 @@ Ask: **will this still be correct in 6 months without any updates?** If yes, put
```bash
bash scripts/validate-skills.sh
```

Validation covers:

- frontmatter checks for all frontmatter-bearing markdown files under `skills/agora/`
- duplicate skill names
- broken relative links
- absolute local path leakage (`/Users/...`)
- blocklisted internal terms
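
Two of the checks listed above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `scripts/validate-skills.sh`: the function names `find_path_leaks` and `find_broken_links`, both regular expressions, and the restriction to inline `[text](target)` links are assumptions made for the example.

```python
import re
from pathlib import Path

# Assumption: a "leak" is any absolute path that starts with /Users/.
ABSOLUTE_PATH_RE = re.compile(r"(?<![\w.])/Users/[^\s)'\"`]+")
# Assumption: only inline markdown links of the form [text](target) are checked.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Targets such as https: or mailto: carry a URL scheme and are not local files.
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def find_path_leaks(text: str) -> list:
    """Return every absolute local path found in the text."""
    return ABSOLUTE_PATH_RE.findall(text)


def find_broken_links(md_path: Path, text: str) -> list:
    """Return relative link targets that do not resolve from md_path's folder."""
    broken = []
    for target in LINK_RE.findall(text):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # external URL or same-page anchor
        relative = target.split("#", 1)[0]  # drop any fragment
        if relative and not (md_path.parent / relative).exists():
            broken.append(target)
    return broken
```

Each helper returns a list so that a caller can append one error message per finding, in the same style as the `errors` list used by the validation script.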
27 changes: 24 additions & 3 deletions scripts/validate-skills.sh
@@ -17,18 +17,19 @@ if not skills_root.exists():

skill_files = sorted(skills_root.rglob("SKILL.md"))
md_files = sorted(skills_root.rglob("*.md"))
frontmatter_files = []

errors = []
skill_names = []

# ── Hugo's original checks (ported verbatim) ───────────────────────────────

for path in skill_files:
for path in md_files:
    text = path.read_text(encoding="utf-8")
    lines = text.splitlines()
    if len(lines) < 3 or lines[0].strip() != "---":
        errors.append(f"{path}: missing YAML frontmatter")
        continue
    frontmatter_files.append(path)
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
@@ -55,6 +56,23 @@ for path in skill_files:
    if not version_match:
        errors.append(f"{path}: frontmatter missing 'metadata.version'")

    # Enforce frontmatter description length for any file that declares frontmatter.
    if desc_match:
        desc_start = desc_match.end()
        first_line = desc_match.group(0).split(":", 1)[1].strip()
        if first_line in {"|", ">-", ">"}:
            desc_lines = []
            for line in fm[desc_start:].splitlines():
                if re.match(r"^[ \t]+", line):
                    desc_lines.append(line.lstrip(" \t"))
                else:
                    break
            description = "\n".join(desc_lines)
        else:
            description = first_line.strip().strip('"').strip("'")
        if len(description) > 1024:
            errors.append(f"{path}: frontmatter 'description' exceeds 1024 characters ({len(description)})")

name_to_paths = {}
for name, path in skill_names:
    name_to_paths.setdefault(name, []).append(path)
@@ -139,5 +157,8 @@ if errors:
        print(f"- {err}")
    sys.exit(1)

print(f"Validation passed: {len(skill_files)} skills, {len(md_files)} markdown files checked.")
print(
    f"Validation passed: {len(skill_files)} skills, "
    f"{len(frontmatter_files)} frontmatter files, {len(md_files)} markdown files checked."
)
PY
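
The description-length check added in this file can be read as a standalone function. The sketch below is an illustration under stated assumptions, not the script's code: `DESC_RE` is a hypothetical stand-in for the pattern behind `desc_match` (which this diff does not show), and the helper names are invented for the example. Only the 1024-character limit and the error message format come from the diff.

```python
import re
from typing import Optional

MAX_DESCRIPTION_CHARS = 1024  # limit enforced by the check in the diff

# Assumption: the key sits at the start of a line; the real pattern is not shown.
DESC_RE = re.compile(r"^description:[ \t]*(.*)$", re.MULTILINE)


def extract_description(frontmatter: str) -> Optional[str]:
    """Return the description text, handling inline values and block scalars."""
    match = DESC_RE.search(frontmatter)
    if match is None:
        return None
    first_line = match.group(1).strip()
    if first_line in {"|", ">", ">-"}:
        # Block scalar: skip the rest of the header line, then collect the
        # indented lines that follow until the first unindented line.
        collected = []
        for line in frontmatter[match.end():].splitlines()[1:]:
            if re.match(r"^[ \t]+", line):
                collected.append(line.lstrip(" \t"))
            else:
                break
        return "\n".join(collected)
    return first_line.strip('"').strip("'")


def description_error(path: str, frontmatter: str) -> Optional[str]:
    """Return an error message when the description is too long, else None."""
    description = extract_description(frontmatter)
    if description is not None and len(description) > MAX_DESCRIPTION_CHARS:
        return (
            f"{path}: frontmatter 'description' exceeds "
            f"{MAX_DESCRIPTION_CHARS} characters ({len(description)})"
        )
    return None
```

Like the check in the diff, this sketch joins block-scalar lines with newlines and does not fold `>` scalars, so the measured length is an approximation of what a YAML parser would report.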
4 changes: 2 additions & 2 deletions skills/agora/SKILL.md
@@ -1,6 +1,6 @@
---
name: agora
description: Write code using Agora SDKs (agora.io) for real-time communication. Covers RTC (video/voice calling, live streaming, screen sharing), RTM (signaling, messaging, presence), Conversational AI (voice AI agents), Cloud Recording, Server Gateway, and server-side token generation. Use when the user wants to build real-time audio/video applications, integrate Agora SDKs (Web JS/TS, React, iOS Swift, Android Kotlin/Java, Go, Python), manage channels, tracks, tokens, use RTM for messaging/signaling, record RTC sessions, or build Conversational AI with the agent-toolkit. Triggers on mentions of Agora, agora.io, RTC, RTM, video calling, voice calling, real-time communication, screen share, screen sharing, record session, record calls, Cloud Recording, Server Gateway, Linux media SDK, agora-rtc-sdk-ng, agora-rtc-react, agora-rtm, conversational AI with Agora, Agora token generation, Agora authentication, agora-agent-client-toolkit, agora-agent-client-toolkit-react, agora-agent-server-sdk, AgoraVoiceAI, AgoraClient, useConversationalAI, useTranscript, useAgentState, agent transcript, agent state hook.
description: Write code using Agora SDKs (agora.io) for real-time communication. Covers RTC (video/voice, live streaming, screen sharing), RTM/signaling, Conversational AI voice agents, Cloud Recording, Server Gateway, and token generation. Use for Agora, RTC, RTM, video calling, voice calling, screen sharing, recording, tokens, signaling, or ConvoAI requests across Web, React, Next.js, iOS, Android, Go, and Python. Triggers include agora-rtc-sdk-ng, agora-rtc-react, agora-rtm, agora-agent-server-sdk, AgoraVoiceAI, AgoraClient, useConversationalAI, useTranscript, useAgentState, Cloud Recording, Server Gateway, and Agora authentication.
metadata:
author: agora
version: '1.2.0'
@@ -40,7 +40,7 @@ Text messaging, signaling, presence, and metadata. Independent from RTC — chan

REST API-driven voice AI agents. Create agents that join RTC channels and converse with users via speech. Front-end clients connect via RTC+RTM.

**[references/conversational-ai/README.md](references/conversational-ai/README.md)** — REST API, agent config, 6 recipe repos (agent-samples, agent-toolkit, agent-client-toolkit-react, agent-ui-kit, server-custom-llm, server-mcp)
**[references/conversational-ai/README.md](references/conversational-ai/README.md)** — Start here for new projects (quickstart repos to clone), REST API, agent config, recipe repos (agent-samples, agent-toolkit, agent-client-toolkit-react, agent-ui-kit, server-custom-llm, server-mcp)

### Cloud Recording

18 changes: 15 additions & 3 deletions skills/agora/references/conversational-ai/README.md
@@ -2,6 +2,17 @@

REST API-driven voice AI agents. Create agents that join RTC channels and converse with users via speech. Front-end clients connect via RTC+RTM.

## Start Here: New Projects

**Building a new Conversational AI agent? Clone a quickstart repo — do not build from scratch.**

| Path | Repo | Use when |
|---|---|---|
| **Full-stack Next.js** (default) | [agent-quickstart-nextjs](https://github.com/AgoraIO-Conversational-AI/agent-quickstart-nextjs) | Single repo: Next.js API routes + React UI |
| **Python backend + React frontend** | [conversational-ai-quickstart](https://github.com/AgoraIO-Community/conversational-ai-quickstart) *(private)* | Separate Python server + standalone React client |

See **[quickstarts.md](quickstarts.md)** for clone steps, env vars, and setup instructions.

## SDK vs. Direct REST API

**Default to the SDK for the user's backend language.** The TypeScript, Go, and Python SDKs wrap the REST API and handle auth, token generation, and session lifecycle automatically.
@@ -14,7 +25,6 @@ REST API-driven voice AI agents. Create agents that join RTC channels and conver
| Java, Ruby, PHP, C#, other | Call the REST API directly — see [auth-flow.md](auth-flow.md) |

Direct REST API use is fully supported for languages without an SDK. The [auth-flow.md](auth-flow.md) file covers the end-to-end auth and token flow for REST API implementors. If the user has an SDK available, start with that instead — the SDK eliminates the need to manually build tokens for the ConvoAI server.

The live OpenAPI spec is the authoritative source for request/response schemas:

```
@@ -87,7 +97,7 @@ const response = await fetch(
);
```

> **Note:** Token-based auth for ConvoAI REST API calls is not yet in official docs (pending release). The behavior is stable — `Authorization: agora token=<RTC+RTM token>` is accepted by the ConvoAI endpoint. Verify against official docs once published.
> **Rule:** Use token auth (`Authorization: agora token=<RTC+RTM token>`) as the default for new direct REST implementations. Basic Auth remains available, but grants broader account-level access.
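
The two header formats compared in the rule above can be sketched as follows. This is a minimal illustration: the function names are hypothetical, the token is assumed to have been generated elsewhere (for example with `buildTokenWithRtm`), and the Basic variant follows the standard HTTP Basic scheme applied to the Customer ID and Secret.

```python
import base64
from typing import Dict


def token_auth_header(rtc_rtm_token: str) -> Dict[str, str]:
    """Header for token auth: a combined RTC + RTM token, generated elsewhere."""
    return {"Authorization": f"agora token={rtc_rtm_token}"}


def basic_auth_header(customer_id: str, customer_secret: str) -> Dict[str, str]:
    """Header for Basic Auth: base64 of 'id:secret' (standard HTTP Basic scheme)."""
    credentials = f"{customer_id}:{customer_secret}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}
```

Keeping both behind small helpers makes the default explicit at the call site: new direct REST integrations would call `token_auth_header`, and `basic_auth_header` stays available for users who ask for that mode.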

See [../server/tokens.md](../server/tokens.md) for `buildTokenWithRtm` parameter reference.

@@ -122,7 +132,8 @@ Things the official docs don't emphasize that cause frequent mistakes:
- **`agent_rtc_uid` is a string, not an int** — pass `"0"` (string) for auto-assignment, not `0`. Passing an integer will cause a type error at the API boundary.
- **`remote_rtc_uids` is an array of strings** — use `["*"]` to subscribe to all users, not `"*"` or `["0"]`. The wildcard must be in array form.
- **Agent name must be unique per project** — collisions return HTTP 409. Use a short UUID suffix: `agent_{uuid[:8]}`. On 409, generate a new name and retry; do not retry with the same name.
- **Token auth is not in the official docs — use it for production.** The ConvoAI REST API accepts `Authorization: agora token=<token>` using a combined RTC + RTM token from `RtcTokenBuilder.buildTokenWithRtm`. This is **safer than Basic Auth**: tokens are scoped to a single App ID + channel, while Customer ID/Secret grants access to every project on the account. Default to token auth unless the user explicitly requires officially documented auth methods, in which case use Basic Auth. See [Authentication → Option A](#authentication) for the implementation.
- **Use token auth as the default for new direct REST integrations.** The ConvoAI REST API accepts `Authorization: agora token=<token>` using a combined RTC + RTM token from `RtcTokenBuilder.buildTokenWithRtm`. This is **safer than Basic Auth**: tokens are scoped to a single App ID + channel, while Customer ID/Secret grants access to every project on the account. Use Basic Auth only when a user explicitly needs that mode.
- **POST `/join` success does not mean the agent is already in the RTC channel** — the request was accepted and the agent is starting. The client should wait for the RTC `user-joined` event before expecting agent audio or querying media state.
- **`/update` overwrites `params` entirely** — sending `{ "llm": { "params": { "max_tokens": 2048 } } }` erases `model` and everything else in `params`. Always send the full object.
- **`/speak` priority enum** — `"INTERRUPT"` (immediate, default), `"APPEND"` (queued after current speech), `"IGNORE"` (skip if agent is busy). `interruptable: false` prevents users from cutting in.
- **20 PCU default limit** — max 20 concurrent agents per App ID. Exceeding the limit returns an error on `/join`. Contact Agora support to increase it.
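
Several of the gotchas above translate directly into small helpers. The sketch below is illustrative only: the helper names, the `send_join` callable (a stand-in for whatever function issues the `/join` request and returns the HTTP status code), and the example model name are assumptions, not part of any Agora SDK.

```python
import copy
import uuid


def new_agent_name(prefix: str = "agent") -> str:
    """Unique agent name with a short UUID suffix, e.g. agent_1a2b3c4d."""
    return f"{prefix}_{uuid.uuid4().hex[:8]}"


def uid_fields() -> dict:
    """UID fields with the types described above: a string and a list of strings."""
    return {"agent_rtc_uid": "0", "remote_rtc_uids": ["*"]}


def join_with_unique_name(send_join, max_attempts: int = 3):
    """Call the hypothetical send_join(name); on HTTP 409, retry with a NEW name."""
    for _ in range(max_attempts):
        name = new_agent_name()
        status = send_join(name)
        if status != 409:
            return name, status
    raise RuntimeError("agent name collided on every attempt")


def full_llm_update(current_llm: dict, param_changes: dict) -> dict:
    """Build an /update body that carries the full params object, not a fragment."""
    merged = copy.deepcopy(current_llm)  # leave the caller's config untouched
    merged.setdefault("params", {}).update(param_changes)
    return {"llm": merged}
```

For example, `full_llm_update({"params": {"model": "example-model", "max_tokens": 1024}}, {"max_tokens": 2048})` yields a body whose `params` still contains `model`, which is the point of the `/update` warning.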
@@ -153,6 +164,7 @@ Use the file that matches what the user is building:

| User's question / task | Read this file |
|---|---|
| Starting a new project — which repo to clone, setup, env vars | [quickstarts.md](quickstarts.md) |
| Node.js/Python/Go backend — starting agent, auth, session lifecycle | [server-sdks.md](server-sdks.md) |
| Python SDK specifics (async, deprecations, debug) | [python-sdk.md](python-sdk.md) |
| Go SDK specifics (context, builder, status constants) | [go-sdk.md](go-sdk.md) |
117 changes: 117 additions & 0 deletions skills/agora/references/conversational-ai/quickstarts.md
@@ -0,0 +1,117 @@
---
name: conversational-ai-quickstarts
description: |
Quickstart repos for building Agora Conversational AI agents. Use when the user is starting
a new ConvoAI project and needs a working baseline to clone. Covers two paths: full-stack
Next.js (agent-quickstart-nextjs) and separate Python backend + React frontend
(conversational-ai-quickstart, private repo). Always direct users to clone one of these
before building from scratch.
license: MIT
metadata:
author: agora
version: '1.0.0'
---

# Conversational AI Quickstarts

**Always start here when building a new Conversational AI agent.** Clone one of the repos below — do not build from scratch.

## Choose Your Path

| I want to... | Use |
|---|---|
| Build a full-stack app in a single repo (Next.js API routes + React UI) | **Path A — agent-quickstart-nextjs** |
| Build a separate Python backend with a standalone React frontend | **Path B — conversational-ai-quickstart** (private) |

---

## Path A — Full-Stack Next.js (Default)

**Repo:** <https://github.com/AgoraIO-Conversational-AI/agent-quickstart-nextjs>

Single Next.js application covering everything: token generation, agent lifecycle API routes, and the React UI. Best starting point for most projects.

> **Note:** The Agora Web SDKs are browser-only and cannot run during server-side rendering. Because this is a Next.js app, follow the SSR patterns in **[references/rtc/nextjs.md](../rtc/nextjs.md)** if you add custom RTC components. Note that `next/dynamic` with `ssr: false` requires extra steps in Next.js 14+ Server Components.

> **agent-quickstart-nextjs vs. agent-samples**: `agent-quickstart-nextjs` is a single self-contained Next.js app with API routes (no separate server process). `agent-samples` is a monorepo with a separate Python Flask backend and Next.js React clients. Use it if you need a Python server or want to study a more decomposed architecture. See [agent-samples.md](agent-samples.md).

### What's Included

- Next.js API routes for token generation (`/api/generate-agora-token`), starting (`/api/invite-agent`), and stopping (`/api/stop-conversation`) agents
- React UI with live transcription, audio visualization, device selection, and mobile-responsive chat
- `agora-agent-uikit`, `agora-agent-client-toolkit`, and `agora-agent-server-sdk` pre-wired
- Dual RTC + RTM token auth
- One-click Vercel deployment

### Stack

- **Framework:** Next.js (TypeScript)
- **UI:** Tailwind CSS + shadcn/ui
- **Real-time:** Agora RTC + RTM
- **ASR:** Deepgram
- **TTS:** ElevenLabs
- **LLM:** OpenAI-compatible endpoint (OpenAI, Anthropic, etc.)

### Setup

```bash
git clone https://github.com/AgoraIO-Conversational-AI/agent-quickstart-nextjs.git
cd agent-quickstart-nextjs
pnpm install
# Copy the env template — check the repo for the exact filename (.env.local.example or .env.example)
cp .env.local.example .env.local
pnpm dev
```

Open `http://localhost:3000`.

**Requirements:** Node.js 22.x+, pnpm 8.x+

### Environment Variables

```bash
# Agora
AGORA_APP_ID=
AGORA_APP_CERTIFICATE=

# LLM (OpenAI-compatible)
LLM_URL=https://api.openai.com/v1/chat/completions
LLM_API_KEY=

# ASR
DEEPGRAM_API_KEY=

# TTS
ELEVENLABS_API_KEY=
ELEVENLABS_VOICE_ID=
```

> The App Certificate is required for token generation. Get both the App ID and the App Certificate from [Agora Console](https://console.agora.io).
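
Before running `pnpm dev`, it can help to confirm that every variable in the template has a value. The script below is a hedged sketch and is not part of the quickstart repo: it assumes all seven variables listed above are required, and the helper names `parse_env_file` and `missing_env_vars` are invented for the example.

```python
# Assumption: every variable in the template above must be non-empty.
REQUIRED_ENV_VARS = (
    "AGORA_APP_ID",
    "AGORA_APP_CERTIFICATE",
    "LLM_URL",
    "LLM_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
)


def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_env_vars(values: dict) -> list:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED_ENV_VARS if not values.get(name)]
```

A typical use would be to read `.env.local`, pass its text to `parse_env_file`, and print the result of `missing_env_vars` before starting the dev server.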

---

## Path B — Python Backend + React Frontend (Private Repo)

**Repo:** <https://github.com/AgoraIO-Community/conversational-ai-quickstart> *(private; ask your Agora developer relations or solutions engineering contact for access)*

Use this when you need a separate Python backend and a standalone React frontend deployed independently.

- Python backend handles token generation and agent lifecycle via the ConvoAI REST API
- React frontend connects via RTC + RTM
- Refer to the repo README for setup once you have access

---

## After Cloning

Once the baseline is running (applies to both paths — Path B users should substitute their Python backend's equivalent for any server-side steps):

| Next step | Reference |
|---|---|
| Customize LLM, TTS, ASR vendor/model | Fetch `https://docs-md.agora.io/en/conversational-ai/develop/custom-llm.md` |
| Add transcript rendering / agent state to a custom UI | [agent-toolkit.md](agent-toolkit.md) |
| Use React hooks (useTranscript, useAgentState) | [agent-client-toolkit-react.md](agent-client-toolkit-react.md) |
| Swap in pre-built React UI components | [agent-ui-kit.md](agent-ui-kit.md) |
| Add a custom LLM backend (RAG, tool calling) | [server-custom-llm.md](server-custom-llm.md) |
| Production token generation | [../server/tokens.md](../server/tokens.md) |
| Full REST API reference | [README.md](README.md#rest-api-endpoints) |
2 changes: 2 additions & 0 deletions skills/agora/references/doc-fetching.md
@@ -11,6 +11,8 @@ rules. If the answer is here, stop — no fetch needed.
When bundled references don't cover the detail needed (full request/response schemas,
vendor-specific configs, language-specific quick-start code):

If the Agora Docs MCP tool is available in the current tool/runtime, prefer it for Level 2 lookup. Otherwise use the HTTP fetch flow below. If MCP returns no useful result, fall back to HTTP fetch.
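
The preference order in the paragraph above amounts to a short fallback routine. This is a minimal sketch: `mcp_search` and `http_fetch` are hypothetical callables standing in for the MCP tool and the HTTP fetch flow, and treating an empty or whitespace-only string as "no useful result" is an assumption.

```python
from typing import Callable, Optional, Tuple


def level2_lookup(
    query: str,
    http_fetch: Callable[[str], str],
    mcp_search: Optional[Callable[[str], Optional[str]]] = None,
) -> Tuple[str, str]:
    """Return (source, content), preferring MCP when it is available and useful."""
    if mcp_search is not None:  # MCP is installed in the current tool/runtime
        result = mcp_search(query)
        if result and result.strip():
            return "mcp", result
    # MCP is unavailable or returned nothing useful: use the HTTP fetch flow.
    return "http", http_fetch(query)
```

Returning the source alongside the content lets the caller record which path answered the query.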

1. Fetch the Agora docs sitemap:
```
GET https://docs.agora.io/en/llms.txt