Discord assistant bot with provider-independent AI inference: supports local models or remote APIs behind a single trait.
1. Candle (Hugging Face)
- Role: Minimalist ML framework for Rust; full inference stack (CPU, CUDA, Metal, WASM).
- Cross-platform: Linux, Windows, macOS (CPU everywhere; CUDA on Linux/Windows; Metal and Accelerate on macOS).
- Pros: Pure Rust core, many built-in models (LLaMA, Mistral, Phi, Gemma, quantized GGUF, etc.), safetensors/GGML, serverless-friendly, no Python.
- Cons: CUDA/Metal require feature flags and native toolchains.
- Relevance: Best choice for in-process local inference; use `candle-core` + `candle-transformers` (or the examples) for chat/completion. Optional features: `cuda`, `metal`, `mkl`, `accelerate`.
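A sketch of how these optional features might be forwarded in a crate's Cargo.toml (version numbers are assumptions; check crates.io for the current releases and exact feature names):

```toml
[dependencies]
# Versions are placeholders; pin to the current candle release.
candle-core = { version = "0.8", default-features = false }
candle-transformers = { version = "0.8", default-features = false }

[features]
# Forward GPU/BLAS acceleration down to Candle.
cuda = ["candle-core/cuda", "candle-transformers/cuda"]
metal = ["candle-core/metal", "candle-transformers/metal"]
mkl = ["candle-core/mkl", "candle-transformers/mkl"]
accelerate = ["candle-core/accelerate", "candle-transformers/accelerate"]
```

This keeps the default build pure CPU, so `cargo build` works everywhere and GPU toolchains are opt-in.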
2. llama-cpp-rs (crates.io: llama_cpp_rs)
- Role: Rust bindings to llama.cpp (C++).
- Cross-platform: Follows llama.cpp (Linux, Windows, macOS; CUDA, Metal, OpenCL, OpenBLAS, BLIS via features).
- Pros: Mature GGML/GGUF ecosystem, widely used; good for GGUF models and existing llama.cpp deployments.
- Cons: FFI/build complexity, C++ dependency; crate may lag upstream (e.g. GGUF support noted as TODO in README). Prefer Candle for pure-Rust or if you want to avoid C++.
- Relevance: Alternative local inference backend when you already use llama.cpp or need a specific GGUF workflow.
- Note: A crate named “burnt-sienna” was not found on crates.io or GitHub. The Candle README references Kalosm as a “multi-modal meta-framework in Rust for interfacing with local pre-trained models” (controlled generation, samplers, vector DBs, audio, etc.). If “burnt-sienna” was meant as a codename or alternate name, Kalosm is the closest public meta-framework on top of local models in Rust.
- Unified trait: Define an `InferenceBackend` trait (or similar) in a core crate with one method (e.g. `complete(messages, options)`) returning a shared `CompletionResponse`.
- Implementations:
- Local: Candle-based backend (load GGUF/safetensors, run in-process); optional llama-cpp-rs backend (spawn or link llama.cpp).
- Remote: HTTP client to OpenAI-compatible APIs (OpenAI, Azure, local servers like candle-vllm, Ollama, etc.) using the same request/response types.
- Config-driven backend selection: Choose the backend at startup via config (e.g. `inference.backend = "candle" | "llama_cpp" | "openai"`), so the Discord layer never depends on a specific provider.
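The unified trait described above can be sketched as follows. This is illustrative, not the crate's real API: the type names follow this document, the fields are assumptions, and the call is shown as blocking for brevity (the real trait would be async, e.g. via the `async-trait` crate, since Serenity and reqwest are async):

```rust
// Hypothetical sketch of the shared abstraction in peimon-core /
// peimon-inference. Field names and the blocking signature are
// simplifications for illustration.
#[derive(Debug, Clone)]
pub struct Message {
    pub role: String, // "system" | "user" | "assistant"
    pub content: String,
}

#[derive(Debug, Default)]
pub struct CompletionOptions {
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
}

#[derive(Debug)]
pub struct CompletionResponse {
    pub content: String,
}

#[derive(Debug)]
pub struct InferenceError(pub String);

// One trait, many providers: the Discord layer only ever holds a
// `Box<dyn InferenceBackend>`, never a concrete backend type.
pub trait InferenceBackend: Send + Sync {
    fn complete(
        &self,
        messages: &[Message],
        options: &CompletionOptions,
    ) -> Result<CompletionResponse, InferenceError>;
}

// Toy backend that echoes the last user message; a Candle or
// OpenAI-compatible HTTP backend would implement the same trait.
pub struct EchoBackend;

impl InferenceBackend for EchoBackend {
    fn complete(
        &self,
        messages: &[Message],
        _options: &CompletionOptions,
    ) -> Result<CompletionResponse, InferenceError> {
        let last = messages
            .last()
            .ok_or_else(|| InferenceError("empty prompt".into()))?;
        Ok(CompletionResponse {
            content: format!("echo: {}", last.content),
        })
    }
}
```

Because the trait is object-safe, swapping Candle for an HTTP backend is a one-line change at construction time.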
- peimon-core: Shared types (`Message`, `CompletionRequest`, `CompletionResponse`) and any shared traits/errors.
- peimon-inference: `InferenceBackend` trait and backends (OpenAI-compatible HTTP; later Candle, optionally llama-cpp-rs).
- peimon-discord: Serenity client, event handler, and command/response flow; calls into `peimon-inference` only via the trait.
- peimon-bot: Binary that loads config, constructs the chosen backend, and runs the Discord client.
See ARCHITECTURE.md for crate layout and crate list.
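The config-driven selection that `peimon-bot` performs at startup might parse the backend name like this (a minimal sketch; the enum and function names are illustrative, only the string values come from the config keys described in this document):

```rust
// Hypothetical mapping from the `inference.backend` config value to a
// backend kind; peimon-bot would then construct the matching
// InferenceBackend implementation.
#[derive(Debug, PartialEq, Eq)]
pub enum BackendKind {
    Candle,
    LlamaCpp,
    OpenAi,
}

pub fn parse_backend(name: &str) -> Option<BackendKind> {
    match name {
        "candle" => Some(BackendKind::Candle),
        "llama_cpp" => Some(BackendKind::LlamaCpp),
        "openai" => Some(BackendKind::OpenAi),
        // Unknown names are a startup error, not a silent fallback.
        _ => None,
    }
}
```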
| Crate | Purpose | Key dependencies |
|---|---|---|
| peimon-core | Types, traits, errors | serde, serde_json, thiserror, tracing |
| peimon-discord | Discord API & events | serenity (0.12), tokio |
| peimon-inference | Modular AI backends | tokio, optional reqwest; (future) candle / llama-cpp-rs |
| peimon-bot | Main binary | All of the above, tracing-subscriber |
- tokio: Async runtime used by Serenity and by inference (async HTTP, spawning blocking local inference). Use features: `macros`, `rt-multi-thread`, `sync`; for the bot binary add `full`, or add `fs`, `net` as needed.
- No separate "networking" crate; HTTP for remote APIs goes through reqwest (optional in `peimon-inference`).
- serenity 0.12: Gateway, HTTP, cache, model, framework, utils, `rustls_backend` (cross-platform TLS). Use with tokio (e.g. `macros`, `rt-multi-thread`).
- Optional: poise for slash commands and a command framework on top of Serenity.
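As a sketch, the Discord-side dependencies might be declared like this (versions and the exact feature set are assumptions; consult the serenity and tokio docs before copying):

```toml
[dependencies]
serenity = { version = "0.12", default-features = false, features = [
    "client", "gateway", "http", "cache", "model", "rustls_backend",
] }
tokio = { version = "1", features = ["macros", "rt-multi-thread", "sync"] }
# Optional, for slash commands on top of Serenity:
# poise = "0.6"
```

Using `rustls_backend` instead of native TLS keeps the build identical across Linux, Windows, and macOS.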
- Local (in-process): candle (candle-core + candle-transformers or examples) — recommended; optionally llama-cpp-rs for llama.cpp.
- Remote (HTTP): reqwest for OpenAI-compatible endpoints; shared types from `peimon-core`.
- Optional ecosystem: Kalosm for higher-level local model features; candle-vllm / atoma-infer for OpenAI-compatible serving.
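For reference, the remote backend would POST a chat-completions request in the OpenAI-compatible shape, which OpenAI, Azure, Ollama, and candle-vllm all accept (model name and prompt here are placeholders):

```json
{
  "model": "placeholder-model",
  "messages": [
    { "role": "system", "content": "You are a helpful Discord assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7
}
```

The response's `choices[0].message.content` field maps directly onto the shared `CompletionResponse` type, so the same request/response structs serve every compatible server.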
```bash
cd peimon-v6-rust
cargo build
DISCORD_TOKEN=<your_token> cargo run -p peimon-bot
```

Run the full test suite across all workspace crates:
```bash
# From the repo root (peimon-v6-rust)
cargo test --workspace --all-targets
```

Or use the script:
```bash
./scripts/test-all.sh
```

To run tests for a single crate:
```bash
cargo test -p peimon-core
cargo test -p peimon-inference
```

Configuration keys:
- `inference.backend`: `openai` | `candle` | `llama_cpp` (or similar).
- `inference.openai.base_url`, `inference.openai.api_key` for the remote API.
- `inference.candle.model_path` (or similar) for the local Candle backend.
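Under these assumptions, a config file might look like the following (all key names and values are illustrative):

```toml
[inference]
backend = "openai" # "openai" | "candle" | "llama_cpp"

[inference.openai]
base_url = "https://api.openai.com/v1"
api_key = "sk-..." # better supplied via an environment variable

[inference.candle]
model_path = "models/model.gguf"
```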
License: MIT.