Discord assistant bot with provider-independent AI inference: supports local models or remote APIs behind a single trait.
1. Candle (Hugging Face)
- Role: Minimalist ML framework for Rust; full inference stack (CPU, CUDA, Metal, WASM).
- Cross-platform: Linux, Windows, macOS (CPU everywhere; CUDA on Linux/Windows; Metal and Accelerate on macOS).
- Pros: Pure Rust core, many built-in models (LLaMA, Mistral, Phi, Gemma, quantized GGUF, etc.), safetensors/GGML, serverless-friendly, no Python.
- Cons: CUDA/Metal require feature flags and native toolchains.
- Relevance: Best choice for in-process local inference; use `candle-core` + `candle-transformers` (or the examples) for chat/completion. Optional features: `cuda`, `metal`, `mkl`, `accelerate`.
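A sketch of how these optional features might be forwarded in a crate's Cargo.toml (version numbers are assumptions; check crates.io for the current releases and exact feature names):

```toml
[dependencies]
# Versions are placeholders; pin to the current candle release.
candle-core = { version = "0.8", default-features = false }
candle-transformers = { version = "0.8", default-features = false }

[features]
# Forward GPU/BLAS acceleration down to Candle.
cuda = ["candle-core/cuda", "candle-transformers/cuda"]
metal = ["candle-core/metal", "candle-transformers/metal"]
mkl = ["candle-core/mkl", "candle-transformers/mkl"]
accelerate = ["candle-core/accelerate", "candle-transformers/accelerate"]
```

This keeps the default build pure CPU, so `cargo build` works everywhere and GPU toolchains are opt-in.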
2. llama-cpp-rs (crates.io: llama_cpp_rs)
- Role: Rust bindings to llama.cpp (C++).
- Cross-platform: Follows llama.cpp (Linux, Windows, macOS; CUDA, Metal, OpenCL, OpenBLAS, BLIS via features).
- Pros: Mature GGML/GGUF ecosystem, widely used; good for GGUF models and existing llama.cpp deployments.
- Cons: FFI/build complexity, C++ dependency; crate may lag upstream (e.g. GGUF support noted as TODO in README). Prefer Candle for pure-Rust or if you want to avoid C++.
- Relevance: Alternative local inference backend when you already use llama.cpp or need a specific GGUF workflow.
- Note: A crate named “burnt-sienna” was not found on crates.io or GitHub. The Candle README references Kalosm as a “multi-modal meta-framework in Rust for interfacing with local pre-trained models” (controlled generation, samplers, vector DBs, audio, etc.). If “burnt-sienna” was meant as a codename or alternate name, Kalosm is the closest public meta-framework on top of local models in Rust.
- Unified trait: Define an `InferenceBackend` trait (or similar) in a core crate with one method (e.g. `complete(messages, options)`) returning a shared `CompletionResponse`.
- Implementations:
- Local: Candle-based backend (load GGUF/safetensors, run in-process); optional llama-cpp-rs backend (spawn or link llama.cpp).
- Remote: HTTP client to OpenAI-compatible APIs (OpenAI, Azure, local servers like candle-vllm, Ollama, etc.) using the same request/response types.
- Config-driven backend selection: Choose the backend at startup via config (e.g. `inference.backend = "candle" | "llama_cpp" | "openai"`), so the Discord layer never depends on a specific provider.
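The unified trait described above can be sketched as follows. This is illustrative, not the crate's real API: the type names follow this document, the fields are assumptions, and the call is shown as blocking for brevity (the real trait would be async, e.g. via the `async-trait` crate, since Serenity and reqwest are async):

```rust
// Hypothetical sketch of the shared abstraction in peimon-core /
// peimon-inference. Field names and the blocking signature are
// simplifications for illustration.
#[derive(Debug, Clone)]
pub struct Message {
    pub role: String, // "system" | "user" | "assistant"
    pub content: String,
}

#[derive(Debug, Default)]
pub struct CompletionOptions {
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
}

#[derive(Debug)]
pub struct CompletionResponse {
    pub content: String,
}

#[derive(Debug)]
pub struct InferenceError(pub String);

// One trait, many providers: the Discord layer only ever holds a
// `Box<dyn InferenceBackend>`, never a concrete backend type.
pub trait InferenceBackend: Send + Sync {
    fn complete(
        &self,
        messages: &[Message],
        options: &CompletionOptions,
    ) -> Result<CompletionResponse, InferenceError>;
}

// Toy backend that echoes the last user message; a Candle or
// OpenAI-compatible HTTP backend would implement the same trait.
pub struct EchoBackend;

impl InferenceBackend for EchoBackend {
    fn complete(
        &self,
        messages: &[Message],
        _options: &CompletionOptions,
    ) -> Result<CompletionResponse, InferenceError> {
        let last = messages
            .last()
            .ok_or_else(|| InferenceError("empty prompt".into()))?;
        Ok(CompletionResponse {
            content: format!("echo: {}", last.content),
        })
    }
}
```

Because the trait is object-safe, swapping Candle for an HTTP backend is a one-line change at construction time.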
- peimon-core: Shared types (`Message`, `CompletionRequest`, `CompletionResponse`) and any shared traits/errors.
- peimon-inference: `InferenceBackend` trait and backends (OpenAI-compatible HTTP; later Candle, optionally llama-cpp-rs).
- peimon-discord: Serenity client, event handler, and command/response flow; calls into `peimon-inference` only via the trait.
- peimon-bot: Binary that loads config, constructs the chosen backend, and runs the Discord client.
See ARCHITECTURE.md for crate layout and crate list.
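The config-driven selection that `peimon-bot` performs at startup might parse the backend name like this (a minimal sketch; the enum and function names are illustrative, only the string values come from the config keys described in this document):

```rust
// Hypothetical mapping from the `inference.backend` config value to a
// backend kind; peimon-bot would then construct the matching
// InferenceBackend implementation.
#[derive(Debug, PartialEq, Eq)]
pub enum BackendKind {
    Candle,
    LlamaCpp,
    OpenAi,
}

pub fn parse_backend(name: &str) -> Option<BackendKind> {
    match name {
        "candle" => Some(BackendKind::Candle),
        "llama_cpp" => Some(BackendKind::LlamaCpp),
        "openai" => Some(BackendKind::OpenAi),
        // Unknown names are a startup error, not a silent fallback.
        _ => None,
    }
}
```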
| Crate | Purpose | Key dependencies |
|---|---|---|
| peimon-core | Types, traits, errors | serde, serde_json, thiserror, tracing |
| peimon-discord | Discord API & events | serenity (0.12), tokio |
| peimon-inference | Modular AI backends | tokio, optional reqwest; (future) candle / llama-cpp-rs |
| peimon-bot | Main binary | All of the above, tracing-subscriber |
- tokio: Async runtime used by Serenity and by inference (async HTTP, spawning blocking local inference). Use features: `macros`, `rt-multi-thread`, `sync`; for the bot binary add `full`, or add `fs`, `net` as needed.
- No separate "networking" crate; HTTP for remote APIs goes through reqwest (optional in `peimon-inference`).
- serenity 0.12: Gateway, HTTP, cache, model, framework, utils, `rustls_backend` (cross-platform TLS). Use with tokio (e.g. `macros`, `rt-multi-thread`).
- Optional: poise for slash commands and a command framework on top of Serenity.
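As a sketch, the Discord-side dependencies might be declared like this (versions and the exact feature set are assumptions; consult the serenity and tokio docs before copying):

```toml
[dependencies]
serenity = { version = "0.12", default-features = false, features = [
    "client", "gateway", "http", "cache", "model", "rustls_backend",
] }
tokio = { version = "1", features = ["macros", "rt-multi-thread", "sync"] }
# Optional, for slash commands on top of Serenity:
# poise = "0.6"
```

Using `rustls_backend` instead of native TLS keeps the build identical across Linux, Windows, and macOS.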
- Local (in-process): candle (candle-core + candle-transformers or examples) — recommended; optionally llama-cpp-rs for llama.cpp.
- Remote (HTTP): reqwest for OpenAI-compatible endpoints; shared types from `peimon-core`.
- Optional ecosystem: Kalosm for higher-level local model features; candle-vllm / atoma-infer for OpenAI-compatible serving.
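For reference, the remote backend would POST a chat-completions request in the OpenAI-compatible shape, which OpenAI, Azure, Ollama, and candle-vllm all accept (model name and prompt here are placeholders):

```json
{
  "model": "placeholder-model",
  "messages": [
    { "role": "system", "content": "You are a helpful Discord assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7
}
```

The response's `choices[0].message.content` field maps directly onto the shared `CompletionResponse` type, so the same request/response structs serve every compatible server.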
```bash
cd peimon-v6-rust
cargo build
DISCORD_TOKEN=<your_token> cargo run -p peimon-bot
```

Run the full test suite across all workspace crates:
```bash
# From the repo root (peimon-v6-rust)
cargo test --workspace --all-targets
```

Or use the script:
```bash
./scripts/test-all.sh
```

To run tests for a single crate:
```bash
cargo test -p peimon-core
cargo test -p peimon-inference
```

Configuration keys:
- `inference.backend`: `openai` | `candle` | `llama_cpp` (or similar).
- `inference.openai.base_url`, `inference.openai.api_key` for the remote API.
- `inference.candle.model_path` (or similar) for the local Candle backend.
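Under these assumptions, a config file might look like the following (all key names and values are illustrative):

```toml
[inference]
backend = "openai" # "openai" | "candle" | "llama_cpp"

[inference.openai]
base_url = "https://api.openai.com/v1"
api_key = "sk-..." # better supplied via an environment variable

[inference.candle]
model_path = "models/model.gguf"
```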
License: MIT.