2 changes: 2 additions & 0 deletions .gitignore
@@ -51,3 +51,5 @@ server/data/kanban/audit.log
server-dist/data/kanban/tasks.json
server-dist/data/kanban/audit.log
node_modules

.worktrees/
14 changes: 11 additions & 3 deletions docs/API.md
@@ -289,7 +289,7 @@ Synthesizes speech from text. Returns raw audio binary.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `text` | `string` | Yes | Text to synthesize (1–5000 chars, non-empty after trim) |
| `provider` | `"openai" \| "replicate" \| "edge" \| "qwen" \| "xiaomi"` | No | TTS provider. `"qwen"` is a legacy alias for `"replicate"` + model `"qwen-tts"` |
| `provider` | `"openai" \| "mistral" \| "replicate" \| "edge" \| "qwen" \| "xiaomi"` | No | TTS provider. `"qwen"` is a legacy alias for `"replicate"` + model `"qwen-tts"` |
| `voice` | `string` | No | Provider-specific voice name |
| `model` | `string` | No | Provider-specific model ID |

@@ -299,9 +299,11 @@ Synthesizes speech from text. Returns raw audio binary.
2. Replicate, if `REPLICATE_API_TOKEN` is set
3. Edge TTS, always available (free, no API key)

Mistral Voxtral is available when `provider: "mistral"` is requested and `MISTRAL_API_KEY` is configured. It is not part of the automatic fallback chain.

Xiaomi MiMo is available when `provider: "xiaomi"` is requested and `MIMO_API_KEY` is configured. It is not part of the automatic fallback chain.
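
As a rough illustration of the selection rules above, the following TypeScript sketch resolves a provider from the request and the configured keys. The names `resolveProvider`, `isAvailable`, and `ProviderKeys` are hypothetical and not taken from the server code. Step 1 of the chain is not shown in this hunk and is assumed here to be OpenAI; throwing for an unconfigured explicit provider is likewise an assumption, and the legacy `"qwen"` alias is left out for brevity.

```typescript
// Illustrative sketch only; all names below are hypothetical.
type TtsProvider = 'openai' | 'mistral' | 'replicate' | 'edge' | 'xiaomi';

interface ProviderKeys {
  openaiApiKey: string;
  replicateApiToken: string;
  mistralApiKey: string;
  mimoApiKey: string;
}

/** A provider is usable when its key is non-empty; Edge needs no key. */
function isAvailable(provider: TtsProvider, keys: ProviderKeys): boolean {
  switch (provider) {
    case 'openai':
      return keys.openaiApiKey !== '';
    case 'replicate':
      return keys.replicateApiToken !== '';
    case 'mistral':
      return keys.mistralApiKey !== '';
    case 'xiaomi':
      return keys.mimoApiKey !== '';
    case 'edge':
      return true;
  }
}

/**
 * Explicit requests are honoured only when the provider is configured.
 * Without an explicit request, walk the automatic chain, which never
 * includes Mistral or Xiaomi.
 */
function resolveProvider(
  requested: TtsProvider | undefined,
  keys: ProviderKeys,
): TtsProvider {
  if (requested !== undefined) {
    if (!isAvailable(requested, keys)) {
      // Assumption: the real route may report this case differently.
      throw new Error(`TTS provider "${requested}" is not configured`);
    }
    return requested;
  }
  // Assumed order: OpenAI first (not shown in this hunk), then Replicate, then Edge.
  const chain: TtsProvider[] = ['openai', 'replicate', 'edge'];
  // Edge is always available, so the fallback value is never actually needed.
  return chain.find((p) => isAvailable(p, keys)) ?? 'edge';
}
```

With only `MISTRAL_API_KEY` set, a request without `provider` still resolves to Edge in this sketch; Mistral is used only when it is asked for by name.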

**Response:** `audio/mpeg` binary for OpenAI, Replicate, and Edge, or `audio/wav` for Xiaomi MiMo (200)
**Response:** `audio/mpeg` binary for OpenAI, Mistral, Replicate, and Edge, or `audio/wav` for Xiaomi MiMo (200)

**Errors:**

@@ -334,6 +336,7 @@ Returns the current TTS voice configuration.
"instructions": "Speak naturally and conversationally, like a real person. Warm, friendly tone with a slight British accent. Keep it casual and relaxed, not robotic or overly formal."
},
"edge": { "voice": "en-US-AriaNeural" },
"mistral": { "model": "voxtral-mini-tts-2603", "voice": "82c99ee6-f932-423f-a4a3-d403c8914b8d" },
"xiaomi": { "model": "mimo-v2-tts", "voice": "mimo_default", "style": "" }
}
```
@@ -348,6 +351,7 @@ Partially updates the TTS voice configuration. Only known keys are accepted.
{
"openai": { "voice": "nova", "instructions": "Speak cheerfully" },
"edge": { "voice": "en-GB-SoniaNeural" },
"mistral": { "model": "voxtral-mini-tts-2603", "voice": "alloy_voice" },
"xiaomi": { "voice": "default_en", "style": "Happy" }
}
```
@@ -359,6 +363,7 @@ Partially updates the TTS voice configuration. Only known keys are accepted.
| `qwen` | `mode`, `language`, `speaker`, `voiceDescription`, `styleInstruction` |
| `openai` | `model`, `voice`, `instructions` |
| `edge` | `voice` |
| `mistral` | `model`, `voice` |
| `xiaomi` | `model`, `voice`, `style` |

All values must be strings, max 2000 characters each.
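
The constraints above (known keys only, string values, at most 2000 characters) could be enforced along these lines. This is a sketch under assumptions, not the server's validator: `sanitizePatch` and `KNOWN_KEYS` are hypothetical names, the key lists cover only the table rows visible in this hunk, and silently dropping unknown keys while throwing on bad values is one possible reading of "only known keys are accepted".

```typescript
// Illustrative sketch only; names and error behaviour are assumptions.
const KNOWN_KEYS = new Map<string, readonly string[]>([
  ['qwen', ['mode', 'language', 'speaker', 'voiceDescription', 'styleInstruction']],
  ['openai', ['model', 'voice', 'instructions']],
  ['edge', ['voice']],
  ['mistral', ['model', 'voice']],
  ['xiaomi', ['model', 'voice', 'style']],
]);

const MAX_VALUE_LENGTH = 2000;

type TtsPatch = Record<string, Record<string, string>>;

/** Keep only known provider/field pairs and check each value. */
function sanitizePatch(input: unknown): TtsPatch {
  const out: TtsPatch = {};
  if (typeof input !== 'object' || input === null) return out;

  for (const [provider, fields] of Object.entries(input as Record<string, unknown>)) {
    const allowed = KNOWN_KEYS.get(provider);
    if (!allowed || typeof fields !== 'object' || fields === null) continue;

    for (const [key, value] of Object.entries(fields as Record<string, unknown>)) {
      if (!allowed.includes(key)) continue; // unknown field: dropped
      if (typeof value !== 'string' || value.length > MAX_VALUE_LENGTH) {
        throw new Error(`Invalid value for ${provider}.${key}`);
      }
      const target = out[provider] ?? (out[provider] = {});
      target[key] = value;
    }
  }
  return out;
}
```

A patch such as `{ "mistral": { "voice": "alloy_voice", "bogus": "x" } }` would come out of this sketch as `{ "mistral": { "voice": "alloy_voice" } }`.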
@@ -373,6 +378,7 @@ Returns whether optional provider keys are configured. Key values are never retu
{
"openaiKeySet": true,
"replicateKeySet": false,
"mistralKeySet": true,
"xiaomiKeySet": true
}
```
@@ -387,6 +393,7 @@ Writes optional provider keys to `.env` and hot-reloads the in-memory config.
{
"openaiKey": "sk-...",
"replicateToken": "r8_...",
"mistralApiKey": "sk-mistral-...",
"mimoApiKey": "sk-mimo-..."
}
```
@@ -398,9 +405,10 @@ Any subset of fields may be provided. Sending an empty string clears that key.
```json
{
"ok": true,
"message": "OPENAI_API_KEY saved, MIMO_API_KEY saved",
"message": "OPENAI_API_KEY saved, MISTRAL_API_KEY saved, MIMO_API_KEY saved",
"openaiKeySet": true,
"replicateKeySet": false,
"mistralKeySet": true,
"xiaomiKeySet": true
}
```
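
The update semantics (any subset of fields, values trimmed, empty string clears, one status fragment per touched key) can be sketched as follows. This mirrors the shape of the route handler in `server/routes/api-keys.ts`, but it is a simplified stand-in: `applyKeyUpdate` and `KeyUpdate` are hypothetical names, and a `Map` replaces the real `.env` writer, so deleting the entry on an empty value is an assumption about how clearing is persisted.

```typescript
// Illustrative sketch only; a Map stands in for the `.env` file.
interface KeyUpdate {
  openaiKey?: string;
  replicateToken?: string;
  mistralApiKey?: string;
  mimoApiKey?: string;
}

// Request field -> environment variable, in the order the message lists them.
const FIELDS: ReadonlyArray<[keyof KeyUpdate, string]> = [
  ['openaiKey', 'OPENAI_API_KEY'],
  ['replicateToken', 'REPLICATE_API_TOKEN'],
  ['mistralApiKey', 'MISTRAL_API_KEY'],
  ['mimoApiKey', 'MIMO_API_KEY'],
];

/** Apply the provided fields to `env` and return the summary message. */
function applyKeyUpdate(env: Map<string, string>, body: KeyUpdate): string {
  const results: string[] = [];
  for (const [field, envName] of FIELDS) {
    const raw = body[field];
    if (raw === undefined) continue; // field omitted: leave the key untouched
    const val = raw.trim();
    if (val) {
      env.set(envName, val);
    } else {
      env.delete(envName); // empty string clears the key
    }
    results.push(val ? `${envName} saved` : `${envName} cleared`);
  }
  return results.join(', ') || 'No changes';
}
```

For example, calling it with `{ openaiKey: "sk-a", mimoApiKey: "" }` yields the message `OPENAI_API_KEY saved, MIMO_API_KEY cleared`, and an empty body yields `No changes`.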
1 change: 1 addition & 0 deletions docs/ARCHITECTURE.md
@@ -409,6 +409,7 @@ Applied in order in `app.ts`:
| File | Purpose |
|------|---------|
| `services/openai-tts.ts` | OpenAI TTS API client (gpt-4o-mini-tts, tts-1, tts-1-hd) |
| `services/mistral-tts.ts` | Mistral Voxtral TTS API client (`/v1/audio/speech`, base64 audio decode) |
| `services/replicate-tts.ts` | Replicate API client for hosted TTS models (Qwen3-TTS). WAV→MP3 via ffmpeg |
| `services/edge-tts.ts` | Microsoft Edge Read-Aloud TTS via WebSocket protocol. Free, zero-config. Includes Sec-MS-GEC token generation |
| `services/tts-cache.ts` | LRU in-memory TTS cache with TTL expiry (100 MB budget) |
7 changes: 6 additions & 1 deletion docs/CONFIGURATION.md
@@ -64,7 +64,7 @@ Prompts for optional API keys:
- `OPENAI_API_KEY`, enables OpenAI TTS + Whisper transcription
- `REPLICATE_API_TOKEN`, enables Qwen TTS via Replicate (warns if `ffmpeg` is missing)

Edge TTS always works without any keys. Xiaomi MiMo can be enabled later by setting `MIMO_API_KEY` manually or saving it from Settings, Audio.
Edge TTS always works without any keys. Mistral Voxtral and Xiaomi MiMo can be enabled later by setting `MISTRAL_API_KEY` / `MIMO_API_KEY` manually or saving them from Settings, Audio.

#### 6. Advanced Settings (Optional)

@@ -151,11 +151,13 @@ AGENT_NAME=Friday
|----------|-------------|
| `OPENAI_API_KEY` | Enables OpenAI TTS (multiple voices) and Whisper audio transcription |
| `REPLICATE_API_TOKEN` | Enables Replicate-hosted TTS models (e.g. Qwen TTS). Requires `ffmpeg` for WAV→MP3 |
| `MISTRAL_API_KEY` | Enables Mistral Voxtral TTS when the Mistral provider is selected in Settings, Audio |
| `MIMO_API_KEY` | Enables Xiaomi MiMo TTS when the Xiaomi provider is selected in Settings, Audio |

```bash
OPENAI_API_KEY=sk-...
REPLICATE_API_TOKEN=r8_...
MISTRAL_API_KEY=sk-mistral-...
MIMO_API_KEY=sk-mimo-...
```

@@ -164,6 +166,8 @@ TTS provider fallback chain (when no explicit provider is requested):
2. **Replicate** — if `REPLICATE_API_TOKEN` is set
3. **Edge TTS** — always available, no API key needed (default for new installs)

Mistral Voxtral is available as an explicit provider option when `MISTRAL_API_KEY` is set. It is not part of the automatic fallback chain.

Xiaomi MiMo is available as an explicit provider option when `MIMO_API_KEY` is set. It is not part of the automatic fallback chain.

### Speech-to-Text (STT)
@@ -434,6 +438,7 @@ NERVE_SESSION_TTL=2592000000
# API Keys
OPENAI_API_KEY=sk-...
REPLICATE_API_TOKEN=r8_...
MISTRAL_API_KEY=sk-mistral-...
MIMO_API_KEY=sk-mimo-...

# Speech / Language
1 change: 1 addition & 0 deletions docs/SECURITY.md
@@ -379,6 +379,7 @@ HSTS is always sent (`max-age=31536000; includeSubDomains`), even over HTTP. Bro
| `GATEWAY_TOKEN` | `.env` file (chmod 600) | Used server-side for trusted official-gateway connections. `/api/connect-defaults` returns `token: null`. Never logged. |
| `OPENAI_API_KEY` | `.env` file | Used server-side only. Never sent to clients. |
| `REPLICATE_API_TOKEN` | `.env` file | Used server-side only. Never sent to clients. |
| `MISTRAL_API_KEY` | `.env` file | Used server-side only. Never sent to clients. |
| Gateway URL + optional manual token | `localStorage` (`oc-config`) | Used for reconnects. Trusted official-gateway flows usually keep the token empty; manually entered custom-gateway tokens persist until cleared. |

The setup wizard applies `chmod 600` to `.env` and backup files, restricting read access to the file owner.
1 change: 1 addition & 0 deletions docs/TROUBLESHOOTING.md
@@ -257,6 +257,7 @@ After approval, reconnect from the browser (refresh the page or click reconnect)
2. **TTS provider configured?** Check Settings → Audio → TTS Provider
3. **API key present?**
- OpenAI: requires `OPENAI_API_KEY`
- Mistral Voxtral: requires `MISTRAL_API_KEY`
- Replicate: requires `REPLICATE_API_TOKEN`
- Xiaomi MiMo: requires `MIMO_API_KEY`
- Edge: no key needed (free)
16 changes: 12 additions & 4 deletions server/lib/config.ts
@@ -51,6 +51,7 @@ export const config = {

openaiApiKey: process.env.OPENAI_API_KEY || '',
replicateApiToken: process.env.REPLICATE_API_TOKEN || '',
mistralApiKey: process.env.MISTRAL_API_KEY || '',
mimoApiKey: process.env.MIMO_API_KEY || '',

// Speech-to-text
@@ -174,10 +175,14 @@ export const WS_ALLOWED_HOSTS = new Set([

/** Resolve the TTS provider label for the startup banner. */
function ttsProviderLabel(): string {
if (config.openaiApiKey && config.replicateApiToken) return 'OpenAI + Replicate + Edge';
if (config.openaiApiKey) return 'OpenAI + Edge';
if (config.replicateApiToken) return 'Replicate + Edge';
return 'Edge (free)';
const providers = [
config.openaiApiKey ? 'OpenAI' : null,
config.mistralApiKey ? 'Mistral' : null,
config.replicateApiToken ? 'Replicate' : null,
'Edge',
].filter(Boolean);

return providers.join(' + ') + (providers.length === 1 ? ' (free)' : '');
}

/** Resolve the STT provider label for the startup banner. */
@@ -265,6 +270,9 @@ export function validateConfig(): void {
if (!config.replicateApiToken) {
console.warn('[config] ⚠ REPLICATE_API_TOKEN not set — Qwen TTS unavailable');
}
if (!config.mistralApiKey) {
console.warn('[config] ⚠ MISTRAL_API_KEY not set — Mistral Voxtral TTS unavailable');
}
if (!process.env.NERVE_LANGUAGE && process.env.LANGUAGE) {
console.warn('[config] ⚠ LANGUAGE is deprecated — use NERVE_LANGUAGE instead');
}
2 changes: 2 additions & 0 deletions server/lib/constants.ts
@@ -9,12 +9,14 @@

export const OPENAI_BASE_URL = process.env.OPENAI_BASE_URL || 'https://api.openai.com/v1';
export const REPLICATE_BASE_URL = process.env.REPLICATE_BASE_URL || 'https://api.replicate.com/v1';
export const MISTRAL_BASE_URL = process.env.MISTRAL_BASE_URL || 'https://api.mistral.ai/v1';

// ─── API endpoints (derived from base URLs) ──────────────────────────────────

export const OPENAI_TTS_URL = `${OPENAI_BASE_URL}/audio/speech`;
export const OPENAI_WHISPER_URL = `${OPENAI_BASE_URL}/audio/transcriptions`;
export const REPLICATE_QWEN_TTS_URL = `${REPLICATE_BASE_URL}/models/qwen/qwen3-tts/predictions`;
export const MISTRAL_TTS_URL = `${MISTRAL_BASE_URL}/audio/speech`;

// ─── Default connection ──────────────────────────────────────────────────────

10 changes: 7 additions & 3 deletions server/lib/tts-config.test.ts
@@ -57,27 +57,31 @@ async function loadTtsModule(opts: {
}

describe('getTTSConfig', () => {
it('returns Xiaomi defaults when config file is missing', async () => {
it('returns Mistral and Xiaomi defaults when config file is missing', async () => {
const mod = await loadTtsModule({
language: 'en',
edgeVoiceGender: 'female',
storedVoice: 'en-US-JennyNeural',
});

const cfg = mod.getTTSConfig();
expect(cfg.mistral.model).toBe('voxtral-mini-tts-2603');
expect(cfg.mistral.voice).toBe('82c99ee6-f932-423f-a4a3-d403c8914b8d');
expect(cfg.xiaomi.model).toBe('mimo-v2-tts');
expect(cfg.xiaomi.voice).toBe('mimo_default');
expect(cfg.xiaomi.style).toBe('');
});

it('deep-merges Xiaomi patches without dropping defaults', async () => {
it('deep-merges Mistral and Xiaomi patches without dropping defaults', async () => {
const mod = await loadTtsModule({
language: 'en',
edgeVoiceGender: 'female',
storedVoice: 'en-US-JennyNeural',
});

const cfg = mod.updateTTSConfig({ xiaomi: { style: 'Happy' } });
const cfg = mod.updateTTSConfig({ mistral: { voice: 'alloy_voice' }, xiaomi: { style: 'Happy' } });
expect(cfg.mistral.model).toBe('voxtral-mini-tts-2603');
expect(cfg.mistral.voice).toBe('alloy_voice');
expect(cfg.xiaomi.style).toBe('Happy');
expect(cfg.xiaomi.model).toBe('mimo-v2-tts');
expect(cfg.xiaomi.voice).toBe('mimo_default');
13 changes: 12 additions & 1 deletion server/lib/tts-config.ts
@@ -1,7 +1,7 @@
/**
* TTS voice configuration — reads/writes a JSON config file.
*
* All voice-related settings (OpenAI, Qwen/Replicate, Edge) live here
* All voice-related settings (OpenAI, Mistral, Qwen/Replicate, Edge) live here
* instead of env vars or hardcoded values. On first run, default settings
* are written to `<PROJECT_ROOT>/tts-config.json`. Subsequent reads merge
* the on-disk config with defaults so new fields are always present.
@@ -46,6 +46,13 @@ export interface TTSVoiceConfig {
/** Voice name (e.g. en-US-AriaNeural, en-GB-SoniaNeural) */
voice: string;
};
/** Mistral Voxtral TTS settings */
mistral: {
/** Mistral model name */
model: string;
/** Preset or custom Mistral voice ID */
voice: string;
};
/** Xiaomi MiMo TTS settings */
xiaomi: {
/** Xiaomi model name */
@@ -74,6 +81,10 @@ const DEFAULTS: TTSVoiceConfig = {
edge: {
voice: 'en-US-AriaNeural',
},
mistral: {
model: 'voxtral-mini-tts-2603',
voice: '82c99ee6-f932-423f-a4a3-d403c8914b8d',
},
xiaomi: {
model: 'mimo-v2-tts',
voice: 'mimo_default',
31 changes: 30 additions & 1 deletion server/routes/api-keys.test.ts
@@ -11,10 +11,11 @@ describe('api-keys routes', () => {
vi.restoreAllMocks();
});

function mockDeps(overrides: { mimoKey?: string } = {}) {
function mockDeps(overrides: { mistralKey?: string; mimoKey?: string } = {}) {
const mockConfig: Record<string, unknown> = {
openaiApiKey: '',
replicateApiToken: '',
mistralApiKey: overrides.mistralKey || '',
mimoApiKey: overrides.mimoKey || '',
};

@@ -38,6 +39,17 @@ describe('api-keys routes', () => {
return app;
}

it('reports mistralKeySet from config', async () => {
mockDeps({ mistralKey: 'sk-mistral' });
const app = await buildApp();

const res = await app.request('/api/keys');
expect(res.status).toBe(200);

const json = await res.json() as Record<string, unknown>;
expect(json.mistralKeySet).toBe(true);
});

it('reports xiaomiKeySet from config', async () => {
mockDeps({ mimoKey: 'sk-mimo' });
const app = await buildApp();
@@ -49,6 +61,23 @@ describe('api-keys routes', () => {
expect(json.xiaomiKeySet).toBe(true);
});

it('writes MISTRAL_API_KEY from mistralApiKey input', async () => {
mockDeps();
const app = await buildApp();

const res = await app.request('/api/keys', {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ mistralApiKey: 'sk-mistral' }),
});

expect(res.status).toBe(200);

const json = await res.json() as Record<string, unknown>;
expect(json.ok).toBe(true);
expect(json.mistralKeySet).toBe(true);
});

it('writes MIMO_API_KEY from mimoApiKey input', async () => {
mockDeps();
const app = await buildApp();
9 changes: 9 additions & 0 deletions server/routes/api-keys.ts
@@ -18,6 +18,7 @@ app.get('/api/keys', rateLimitGeneral, (c) => {
return c.json({
openaiKeySet: !!config.openaiApiKey,
replicateKeySet: !!config.replicateApiToken,
mistralKeySet: !!config.mistralApiKey,
xiaomiKeySet: !!config.mimoApiKey,
});
});
@@ -43,6 +44,13 @@ app.put('/api/keys', rateLimitGeneral, async (c) => {
results.push(val ? 'REPLICATE_API_TOKEN saved' : 'REPLICATE_API_TOKEN cleared');
}

if (body.mistralApiKey !== undefined) {
const val = body.mistralApiKey.trim();
await writeEnvKey('MISTRAL_API_KEY', val);
(config as Record<string, unknown>).mistralApiKey = val;
results.push(val ? 'MISTRAL_API_KEY saved' : 'MISTRAL_API_KEY cleared');
}

if (body.mimoApiKey !== undefined) {
const val = body.mimoApiKey.trim();
await writeEnvKey('MIMO_API_KEY', val);
@@ -55,6 +63,7 @@ app.put('/api/keys', rateLimitGeneral, async (c) => {
message: results.join(', ') || 'No changes',
openaiKeySet: !!config.openaiApiKey,
replicateKeySet: !!config.replicateApiToken,
mistralKeySet: !!config.mistralApiKey,
xiaomiKeySet: !!config.mimoApiKey,
});
} catch {