Plexus stores all configuration in the database and manages it via the Admin UI (recommended) or Management API.
Environment variables control server-level settings. Everything else (providers, models, keys, quotas) is stored in the database.
| Variable | Description | Required |
|---|---|---|
ADMIN_KEY |
Password for admin dashboard and management API. Server refuses to start if unset. | Yes |
DATABASE_URL |
Connection string. Supports sqlite:// and postgres:// URIs. |
No |
ENCRYPTION_KEY |
32-byte key for encrypting sensitive data at rest. Generated via: openssl rand -hex 32 |
No |
DATA_DIR |
Directory for SQLite database. | No |
LOG_LEVEL |
Verbosity: error, warn, info, debug, silly |
No |
PORT |
HTTP server port. | No |
HOST |
Address to bind to. | No |
# SQLite (database auto-created in ./data/)
ADMIN_KEY="my-secret" bun run dev
# PostgreSQL
ADMIN_KEY="my-secret" DATABASE_URL="postgres://user:pass@localhost:5432/plexus" bun run dev
# Docker
docker run -e ADMIN_KEY="my-secret" -v ./data:/app/data -p 4000:4000 plexus:latestThe Admin UI (accessible at http://localhost:4000 after starting) is the easiest way to configure Plexus. It provides forms for all configuration options with real-time validation.
- Providers: Add/edit upstream AI providers (API keys, base URLs, model lists)
- Models: Create model aliases with routing logic and pricing
- Keys: Manage client API keys with optional quota assignment
- Quotas: Define usage limits (tokens, requests, or spending) per time window
- MCP Servers: Configure MCP proxy endpoints
- OAuth: Login to OAuth-backed providers (Anthropic, GitHub Copilot, Codex, etc.)
- Settings: Vision fallthrough, global defaults, cooldown configuration
For programmatic configuration, use the Management API (/v0/management/*). All endpoints require the x-admin-key header.
| Endpoint | Description |
|---|---|
GET /v0/management/providers |
List all providers |
PUT /v0/management/providers/{slug} |
Create/update provider |
DELETE /v0/management/providers/{slug} |
Remove provider |
GET /v0/management/aliases |
List all model aliases |
PUT /v0/management/aliases/{slug} |
Create/update alias |
DELETE /v0/management/aliases/{slug} |
Remove alias |
GET /v0/management/keys |
List all API keys |
PUT /v0/management/keys/{name} |
Create/update key |
DELETE /v0/management/keys/{name} |
Remove key |
GET /v0/management/user-quotas |
List quota definitions |
PUT /v0/management/user-quotas/{name} |
Create/update quota |
DELETE /v0/management/user-quotas/{name} |
Remove quota |
GET /v0/management/config/export |
Export full config as JSON |
PUT /v0/management/config |
Import config (replace all) |
See the API Reference for complete endpoint documentation.
A provider represents an upstream AI service that Plexus routes requests to. Each provider has authentication credentials, a base URL, and a list of available models.
| Setting | Description | Required |
|---|---|---|
| Slug | Unique identifier (e.g., openai_direct, anthropic-prod) |
Yes |
| Display Name | Friendly name for logs and UI | No |
| API Base URL | Provider's endpoint. Common values: | Yes |
https://api.openai.com/v1 |
||
https://api.anthropic.com/v1 |
||
https://generativelanguage.googleapis.com/v1beta |
||
https://openrouter.ai/api/v1 |
||
oauth:// (for OAuth-backed providers) |
||
| API Key | Authentication token | Yes |
| Enabled | Whether this provider is active for routing | No (default: true) |
| Headers | Custom HTTP headers sent with every request | No |
| Extra Body | Additional fields merged into every request | No |
| Upstream Timeout | Per-provider request timeout override in milliseconds. If unset, the global timeout is used. | No |
| Disable Cooldown | Exclude from automatic cooldown on errors | No |
| Stall Detection Overrides | Optional per-provider overrides for TTFB/throughput stall detection. Empty = inherit global setting for that field. | No |
| Adapters | Request/response rewrite hooks applied to every model under this provider (see Provider Adapters) | No |
Some providers support multiple API formats (OpenAI chat, Anthropic messages, embeddings). Configure them with a map of protocol → URL:
| Protocol | Use Case |
|---|---|
chat |
OpenAI-compatible chat completions |
messages |
Anthropic Claude Messages API |
embeddings |
OpenAI-compatible embeddings |
image |
Image generation (DALL-E, etc.) |
transcriptions |
Speech-to-text (Whisper) |
speech |
Text-to-speech |
When combined with priority: api_match on a model alias, Plexus prefers providers that natively support the incoming API format.
Plexus supports OAuth-backed providers via the pi-ai library. These require authentication through the Admin UI.
Supported OAuth providers:
- Anthropic Claude
- GitHub Copilot
- OpenAI Codex
- Gemini CLI
- Antigravity
- OpenAI o1-pro
Configuration:
- Set API Base URL to
oauth:// - Set API Key to
oauth - Set OAuth Account (e.g.,
work,personal) - Set OAuth Provider if the provider key differs from pi-ai's expected ID
Once configured, log in via the Admin UI to authorize Plexus. Tokens are stored encrypted (when ENCRYPTION_KEY is set) and auto-refreshed.
Adapters rewrite request payloads outbound to a provider and raw response payloads inbound, fixing provider-specific field-name incompatibilities without modifying the core transformer pipeline.
Adapters can be set at provider level (applied to every model under the provider) or at model level (appended after provider-level adapters for a specific model). Both accept a single name or a list.
| Adapter | Description |
|---|---|
reasoning_content |
Renames reasoning / thinking.content → reasoning_content on outbound assistant messages for providers that use Fireworks/DeepSeek field naming (e.g. Fireworks DeepSeek-R1). Fixes "Extra inputs are not permitted, field: messages[N].reasoning" errors. |
suppress_developer_role |
Rewrites the developer role to system on outbound messages for providers that do not support the newer OpenAI developer role. |
model_override |
Conditionally rewrites the provider model name based on request payload fields. Used for providers that expose reasoning variants as separate model names rather than respecting reasoning-related fields in the request body. See Model Override Adapter below. |
Example — provider-level:
PUT /v0/management/providers/fireworks
{
"api_base_url": "https://api.fireworks.ai/inference/v1",
"api_key": "fw_...",
"adapter": "reasoning_content"
}Example — model-level override:
{
"models": {
"accounts/fireworks/models/deepseek-r1": {
"adapter": ["reasoning_content", "suppress_developer_role"]
}
}
}Adapters are applied in order on outbound (preDispatch) and in reverse on inbound (postDispatch). Pass-through optimisation is automatically disabled when any adapter is active.
The model_override adapter conditionally rewrites the provider model name based on the values or presence of fields in the request payload. This is useful for providers that expose reasoning variants as separate model names (e.g. model-name with reasoning, model-name-fast without) rather than respecting reasoning-related fields in the request body.
How it works:
When the resolved provider model matches a rule's model field AND any of the rule's conditions are satisfied (OR semantics), the model name is rewritten to rewriteTo. Conditions use dotted paths into the request payload.
Configuration:
The model_override adapter is configured at model level only (not provider level). It accepts a rules array in its options:
{
"models": {
"zai-org/GLM-5.1-FP8": {
"adapter": [
{
"name": "model_override",
"options": {
"rules": [
{
"model": "zai-org/GLM-5.1-FP8",
"rewriteTo": "glm-5.1-fast",
"conditions": [
{ "field": "enable_thinking", "value": false },
{ "field": "reasoning.enabled", "value": false },
{ "field": "reasoning.effort", "value": "none" },
{ "field": "budget_tokens", "value": 0 }
]
}
]
}
}
]
}
}
}Rule fields:
| Field | Description |
|---|---|
model |
The provider model name to match against (must match the resolved target model) |
rewriteTo |
The model name to send to the provider instead |
conditions |
Array of conditions; any match triggers the rewrite |
Condition fields:
| Field | Description |
|---|---|
field |
Dotted path into the request payload (e.g. reasoning.enabled, chat_template_kwargs.enable_thinking) |
value |
Value to match (strict equality). If omitted, the condition matches when the field is present (any value) |
Notes:
- The adapter operates on the transformed provider payload, so fields must survive API transformation to be matchable. For chat-to-chat requests, all fields from the original request body are preserved (including non-standard fields like
enable_thinking,thinking_budget, etc.). - Multiple rules are evaluated in order; only the first matching rule applies.
- The rewrite is transparent to the client — billing and usage tracking still reference the original canonical model name.
Quota checkers monitor upstream provider rate limits and prevent routing to exhausted providers.
| Checker Type | Description | Options |
|---|---|---|
synthetic |
Usage from Synthetic API | apiKey (defaults to provider's key) |
naga |
Naga AI balance | |
nanogpt |
NanoGPT usage | |
openai-codex |
Codex quota (OAuth) | Reads token from database |
claude-code |
Claude Code quota (OAuth) | Reads token from database |
zai |
ZAI balance | |
moonshot |
Moonshot balance | |
novita |
Novita balance | |
minimax |
Minimax balance | Requires groupid, hertzSession |
Settings:
enabled: Enable/disable pollingintervalMinutes: Polling frequency (minimum 1)maxUtilizationPercent: Treat provider as exhausted when any window reaches this % (default 99)
Quota data is available via the Management API — see API Reference: Quota Management.
Each provider can optionally set timeoutMs to override the global upstream timeout for requests routed to that provider.
- Unset / omitted: inherit the global timeout
- Set to a number: use that provider-specific timeout instead
- Valid range: any positive integer millisecond value in provider config; the Admin UI exposes this as 1–3600 seconds
Use provider overrides when one backend is predictably slower or faster than the rest. For example, you might keep a 300s global timeout but set a 30s timeout on a fast inference endpoint so stuck requests fail over much sooner.
Each provider can optionally override any of the global stall detection settings with these fields:
stallTtfbMs— per-provider TTFB timeout overridestallTtfbBytes— per-provider TTFB byte threshold overridestallMinBps— per-provider minimum throughput overridestallWindowMs— per-provider sliding-window width overridestallGracePeriodMs— per-provider grace period override
Important inheritance rules:
- If a field is omitted, the provider inherits the global setting for that field.
- For the nullable threshold fields (
stallTtfbMs,stallMinBps), setting the value tonulldisables that stall dimension for the provider. - For the non-null tuning fields (
stallTtfbBytes,stallWindowMs,stallGracePeriodMs),null/empty in practice means “use the inherited global value”.
This lets you keep one global policy while tightening or relaxing stall protection for known outliers.
A model alias is a virtual model name that clients use in requests. Each alias maps to one or more provider targets with routing logic.
| Setting | Description | Required |
|---|---|---|
| Slug | Name clients send (e.g., fast-model) |
Yes |
| Type | chat (default), embeddings, transcriptions, speech, image |
No |
| Additional Aliases | Alternative names that also route here | No |
| Priority | Routing order: selector (default) or api_match |
No |
| Target Groups | Ordered groups of targets, each with its own selector | Yes |
| Metadata | External catalog for model info | No |
Aliases contain one or more target groups. The dispatcher exhausts all healthy targets in group 1 before trying group 2, and so on. Each group has:
- Name — Label for the group (e.g.
"subscription","payg","backup") - Selector — Strategy for ordering targets within this group
- Targets — List of provider/model pairs in this group
This allows you to organise targets by preference. For example:
| Group | Name | Selector | Purpose |
|---|---|---|---|
| 1 | subscription |
e2e_performance |
Use paid subscriptions first (sunk cost) |
| 2 | payg |
cost |
Fall back to cheapest pay-as-you-go option |
| 3 | backup |
in_order |
Last-resort backup target |
If no groups are explicitly configured, all targets live in a single default group.
| Strategy | Behavior |
|---|---|
random (default) |
Distributes requests randomly across healthy targets in the group |
in_order |
Tries targets in order, skips unhealthy ones |
cost |
Routes to cheapest provider (requires pricing) |
performance |
Routes to highest post-TTFT throughput (output tokens / streaming time) |
latency |
Routes to lowest time-to-first-token |
usage |
Routes to provider with least recent usage (last 24 hours) |
e2e_performance |
Routes to highest end-to-end throughput (output tokens / total request time) |
Use performanceExplorationRate (default 0.05) to occasionally explore other targets and prevent locking onto one provider. Applies to performance, latency, and e2e_performance selectors. latencyExplorationRate and e2ePerformanceExplorationRate can be set separately for their respective selectors (each defaults to performanceExplorationRate if not specified). Unlike performance, the e2e_performance selector explores across all candidates including the current best, ensuring end-to-end metrics stay fresh for every provider.
Inline exploration occurs on the live request path: a small fraction of incoming requests are intentionally routed to non-optimal targets so their performance data stays fresh. The trade-off is that those specific requests may be slower than necessary.
To keep performance data fresh without ever diverting live requests, enable backgroundExploration instead. When enabled, the inline exploration above is suppressed for the three perf-based selectors, and Plexus instead fires small representative probe requests in the background against stale targets.
backgroundExploration:
enabled: true # default: false
stalenessThresholdSeconds: 600 # default: 600 (10 minutes), minimum: 1
workerConcurrency: 2 # default: 2, range: 1–16How it works:
- Each live request to an alias whose selector is
latency,performance, ore2e_performancechecks the targets in its active group. Any target whose last probe is older thanstalenessThresholdSecondsis enqueued for a background probe. - The live request itself is never redirected or delayed — it is served by the selector's best choice.
- A worker pool (
workerConcurrency) drains the queue. Per-target in-flight guards and the global cooldown manager prevent duplicate or unhealthy probes. - Each probe is a single canonical chat request (moderate input, two tool definitions,
max_tokens: 1000, streaming). It exercises the same transformer + provider path as live traffic, so TTFT / TPS / E2E TPS are measured identically. - Probes route via
direct/<provider>/<target_model>so they bypass alias resolution and failover, hitting exactly one target. - Probes appear in usage records with
apiKey = "probe"andattribution = "background"(or"manual"for probes triggered from the management test endpoint). They are real requests and do consume provider quota / cost; budgets and rate caps are out of scope in v1. - On cold start, each target's
lastProbedAtis initialised to the process start time — the first probe for a target fires at normal cadence oncestalenessThresholdSecondselapses, not eagerly on the first request.
The inline performanceExplorationRate / latencyExplorationRate / e2ePerformanceExplorationRate settings are preserved and continue to take effect when backgroundExploration.enabled is false (the default). All four settings can be edited from the Config page in the admin UI.
selector(default): Selector picks a provider within each group first, then matches API format.api_match: Filter targets to those whose provider supports the incoming API format first, then apply the group's selector. Best for tools requiring specific API features (e.g., Claude Code with Anthropic messages).
Each target specifies:
- Provider: Must match an existing provider slug
- Model: Upstream model name
- Enabled: Whether this target is active
Link an alias to an external model catalog to return enriched metadata in GET /v1/models:
| Source | URL | Format |
|---|---|---|
openrouter |
openrouter.ai | provider/model |
models.dev |
models.dev | providerid.modelid |
catwalk |
catwalk.charm.sh | providerid.modelid |
Metadata loads at startup. Failures are non-fatal — Plexus operates without enriched data if a source is unavailable.
Bypass aliases entirely using the format direct/<provider>/<model>:
curl ... -d '{"model": "direct/openai_direct/gpt-4o-mini", ...}'- Provider and model must exist in configuration
- Bypasses selector logic and alias settings
API keys authenticate clients to inference endpoints (/v1/*).
| Setting | Description | Required |
|---|---|---|
| Name | Unique identifier | Yes |
| Secret | Bearer token (clients send in Authorization header) |
Yes |
| Comment | Description or owner | No |
| Quota | Name of a quota definition to enforce | No |
Clients can provide credentials via:
Authorization: Bearer <secret>Authorization: <secret>(prefix added automatically)x-api-key: <secret>?key=<secret>query parameter
The /v1/models endpoint is public (no auth required).
Append :label to track usage without creating separate keys:
Authorization: Bearer sk-plexus-key:copilot
Authorization: Bearer sk-plexus-key:mobile:v2.5The part before the first colon authenticates; the rest is stored as attribution in usage logs. Query via:
SELECT attribution, COUNT(*), SUM(tokens_input + tokens_output)
FROM request_usage
WHERE api_key = 'key-name'
GROUP BY attribution;User quotas enforce per-key usage limits. Unlike provider quota checkers (which monitor upstream limits), these control client consumption.
| Type | Reset Behavior |
|---|---|
rolling |
Continuous window (e.g., "last hour") |
daily |
Resets at UTC midnight |
weekly |
Resets at UTC midnight Sunday |
monthly |
Resets at 00:00 UTC on the 1st |
| Type | What It Counts |
|---|---|
requests |
Number of API calls |
tokens |
Input + output + reasoning + cached tokens |
cost |
Dollar spending (requires pricing on models) |
Supported durations: 30s, 5m, 10m, 30m, 1h, 2h, 2h30m, 6h, 12h, 1d
Tokens/Requests (leaky bucket):
- After each request, usage is recorded.
- Before each request, usage "leaks" based on elapsed time:
leaked = elapsed × (limit / duration) - Remaining capacity determines if the request is allowed.
Cost (cumulative):
- Spending accumulates as requests complete.
- Resets when the window expires.
- No leak/refill within the window.
Reference a quota by name in the key's quota field. Keys without a quota have unlimited access.
When a provider returns errors, Plexus uses an escalating cooldown system to temporarily remove it from the routing pool.
| Consecutive Failures | Duration |
|---|---|
| 1st | 2 minutes |
| 2nd | 4 minutes |
| 3rd | 8 minutes |
| 4th | 16 minutes |
| 5th | 32 minutes |
| 6th | 64 minutes |
| 7th | 128 minutes |
| 8th | 256 minutes |
| 9th+ | 300 minutes (cap) |
- Successful requests reset failure count to 0.
413 Payload Too Largeerrors do NOT trigger cooldowns (client error).- Each provider+model combination tracks failures independently.
- Cooldowns persist in the database across restarts.
| Setting | Description | Default |
|---|---|---|
initialMinutes |
First failure duration | 2 |
maxMinutes |
Cap for exponential backoff | 300 |
Set disable_cooldown: true on a provider to exclude it from the cooldown system. Recommended for:
- Local model servers (Ollama, LM Studio)
- Providers with their own rate-limit handling
- Testing scenarios
Do not disable for cloud providers with unreliable endpoints.
GET /v0/management/cooldowns— list active cooldownsDELETE /v0/management/cooldowns— clear allDELETE /v0/management/cooldowns/:provider?model=:model— clear specific
See API Reference: Cooldown Management.
Plexus can abort upstream requests that run too long instead of waiting forever.
Timeouts matter for both reliability and cost:
- they stop abandoned or hung upstream requests from continuing to burn quota,
- they allow the dispatcher to fail over to another provider when appropriate,
- and they put a hard cap on runaway or extremely slow streams.
- Global default timeout:
300seconds - Per-provider override: optional via
timeoutMs - Effective timeout:
provider.timeoutMs ?? (global timeout.defaultSeconds × 1000)
If the timeout fires before the request completes:
- Plexus aborts the upstream fetch,
- the request is recorded with
responseStatus = "timeout", - and failover may continue to the next provider when the dispatcher is still in a retryable/failover-safe stage.
Global timeout settings live in system settings and are exposed through the Admin UI Config → Timeout Settings and the management API.
| Field | Type | Default | Range | Meaning |
|---|---|---|---|---|
defaultSeconds |
integer | 300 |
1–3600 |
Default maximum duration for any upstream request unless the selected provider overrides it |
| Field | Type | Default | Meaning |
|---|---|---|---|
timeoutMs |
integer (milliseconds) | inherit global | Override the global timeout for requests routed to this provider |
GET /v0/management/config/timeout— returns the effective global timeout configPATCH /v0/management/config/timeout— partial update of timeout settings
Example:
PATCH /v0/management/config/timeout
{
"defaultSeconds": 120
}- Start with the default
300sif you are unsure. - Lower it for providers that should either respond quickly or fail fast.
- Raise it only for providers that legitimately need long-running responses.
- Prefer a provider-specific override over increasing the global timeout for everyone.
Stall detection protects against providers that are technically connected but behaving too slowly to be useful.
Unlike a plain timeout, stall detection can distinguish between:
- a provider that never starts producing meaningful bytes (TTFB stall), and
- a provider that starts streaming but then slows below an acceptable throughput floor (throughput stall).
By default, stall detection is effectively off because both threshold dimensions are disabled:
ttfbSeconds = nullminBytesPerSecond = null
The supporting tuning values still have defaults even when detection is disabled:
ttfbBytes = 100windowSeconds = 10gracePeriodSeconds = 30
TTFB (time-to-first-bytes) detection checks whether the provider produces enough bytes quickly enough.
It uses two values together:
ttfbSeconds— how long Plexus waitsttfbBytes— how many bytes must arrive within that time to count as meaningful output
If the threshold is not met in time, Plexus treats the provider as stalled and aborts that attempt.
Important: when this happens before any response bytes reach the client, the dispatcher can transparently fail over to another provider.
Throughput detection applies after the provider has started responding.
It uses:
minBytesPerSecond— minimum acceptable streaming ratewindowSeconds— sliding window width for measuring throughputgracePeriodSeconds— delay after TTFB success before throughput enforcement starts
The grace period is especially important for reasoning-heavy models that may pause naturally after their first chunk.
If throughput drops below the configured floor, Plexus aborts the stream and records the request as stall.
Important: once bytes have already been sent to the client, Plexus cannot transparently fail over the same response stream. The client must retry.
Global stall settings live in the Admin UI under Config → Stall Detection and in the management API.
| Field | Type | Default | Range | Meaning |
|---|---|---|---|---|
ttfbSeconds |
integer or null |
null |
5–120 or null |
Max time to wait for the first meaningful bytes. null disables TTFB stall detection. |
ttfbBytes |
integer | 100 |
50–10000 |
Byte threshold that must arrive within ttfbSeconds to count as “started”. |
minBytesPerSecond |
integer or null |
null |
50–5000 or null |
Minimum acceptable streaming throughput. null disables throughput stall detection. |
windowSeconds |
integer | 10 |
3–30 |
Sliding window width used to calculate throughput. |
gracePeriodSeconds |
integer | 30 |
0–120 |
Delay after TTFB success before throughput enforcement starts. |
- Stall detection is enabled for a request if either
ttfbSecondsorminBytesPerSecondis active after global + provider override resolution. - Provider overrides take precedence over global settings.
- Per-provider overrides can enable stall detection even if the global stall config is disabled.
- Leaving provider fields empty keeps the global value for that field.
GET /v0/management/config/stall— returns the current global stall detection configPATCH /v0/management/config/stall— partial update of stall settings
Example:
PATCH /v0/management/config/stall
{
"ttfbSeconds": 15,
"ttfbBytes": 100,
"minBytesPerSecond": 500,
"windowSeconds": 10,
"gracePeriodSeconds": 30
}Disable one dimension while keeping the other:
PATCH /v0/management/config/stall
{
"ttfbSeconds": null,
"minBytesPerSecond": 400
}- Fail over slow starters only: set
ttfbSeconds, leaveminBytesPerSecondasnull - Protect long streams too: set both
ttfbSecondsandminBytesPerSecond - Reasoning-heavy models: keep a longer
gracePeriodSeconds - Bursty but healthy streams: increase
windowSecondsbefore raisingminBytesPerSecond
Separate from timeout/stall settings, Plexus now also cancels upstream provider requests when the downstream client disconnects during streaming. This reduces wasted quota for abandoned requests even when neither timeouts nor stall detection are enabled.
Plexus proxies Model Context Protocol servers. Only HTTP streaming transport is supported.
| Setting | Description | Required |
|---|---|---|
| Server Name | Identifier used in URLs | Yes |
| Upstream URL | Full MCP server endpoint | Yes |
| Enabled | Active for routing | No |
| Headers | Static headers forwarded to upstream | No |
| Method | Path | Description |
|---|---|---|
POST |
/mcp/:name |
JSON-RPC messages |
GET |
/mcp/:name |
SSE streaming |
DELETE |
/mcp/:name |
End session |
All MCP endpoints require a Plexus API key. Client auth headers are NOT forwarded — only configured static headers are added.
Plexus exposes standard OAuth 2.0 endpoints for MCP clients:
GET /.well-known/oauth-authorization-serverGET /.well-known/oauth-protected-resourceGET /.well-known/openid-configurationPOST /register
Vision fallthrough allows image inputs to be preprocessed by a vision-capable model before routing to the actual target.
Configuration:
- Set a global Descriptor Model in Settings (Admin UI)
- Enable Use Image Fallthrough on individual model aliases
Images are sent to the descriptor model first; the text analysis is prepended to the original prompt.
Configure pricing to enable cost selector strategy, cost-based quotas, and usage reporting.
| Source | Description |
|---|---|
simple |
Fixed per-million token rates |
openrouter |
Live rates from OpenRouter API |
defined |
Tiered rates based on input token volume |
per_request |
Flat fee per API call |
Configure via Admin UI or API:
input: dollars per million input tokens (e.g.,3.00)output: dollars per million output tokens (e.g.,15.00)cached: cache read rate (optional)cache_write: cache write rate (optional)
Fetches live rates from OpenRouter. Configure via Admin UI or API:
- Set source to
openrouter - Set model
slug(e.g.,anthropic/claude-3.5-sonnet) - Optional
discountfor percentage off all rates
Useful for providers with volume discounts. Configure tiers via Admin UI or API:
| Lower Bound | Upper Bound | Input Rate | Output Rate |
|---|---|---|---|
| 0 | 200,000 | $3.00/M | $15.00/M |
| 200,001 | ∞ (infinity) | $1.50/M | $7.50/M |
Flat fee regardless of token count. Configure via Admin UI or API:
- Set source to
per_request - Set
amount(e.g.,0.04)
Full cost stored in costInput; output/cached fields are zero.
Some providers (especially free-tier models) don't return usage data. Enable token estimation to automatically calculate token counts using a character-based heuristic.
Enable via:
- Provider setting:
estimateTokens: true - Admin UI: Advanced → Estimate Tokens
Estimated counts are flagged with tokensEstimated = 1 in usage records. Typical accuracy is within ±15% of actual values.
Plexus can encrypt sensitive data using AES-256-GCM.
| Data | Fields |
|---|---|
| API Keys | secret |
| OAuth Credentials | accessToken, refreshToken |
| Providers | apiKey, headers, quotaCheckerOptions |
| MCP Servers | headers |
# Generate a key
openssl rand -hex 32
# Set environment variable
export ENCRYPTION_KEY="your-64-character-hex-key"Key format accepts:
- 64-character hex string (32 bytes, used directly)
- Arbitrary passphrase (derived via scrypt)
- Existing plaintext data encrypts on first startup with key set.
- New data encrypts on write, decrypts on read.
- API key authentication uses SHA-256 hash lookups.
# Docker
docker exec -e ENCRYPTION_KEY="old" -e NEW_ENCRYPTION_KEY="new" plexus ./plexus rekey
# Binary
ENCRYPTION_KEY="old" NEW_ENCRYPTION_KEY="new" ./plexus rekeyAfter re-keying, update ENCRYPTION_KEY before restarting.
- Lost keys = unreachable data. Back up keys securely.
- Encrypted values prefixed with
enc:v1:in database. - Without
ENCRYPTION_KEY, all data stored plaintext.
Plexus automatically retries failed requests across alternative targets in multi-target model aliases.
All non-2xx status codes except 400 and 422 trigger failover. Custom retryable codes and errors can be configured.
enabled: Toggle failover on/offretryableStatusCodes: List of status codes that trigger retryretryableErrors: List of network errors that trigger retry