
# Peimon v6 (Rust)

Discord assistant bot with provider-independent AI inference: it supports local models or remote APIs behind a single trait.

## Research Summary: Rust LLM Frameworks (Cross-Platform)

### 1. Candle (Hugging Face)

- Role: Minimalist ML framework for Rust; full inference stack (CPU, CUDA, Metal, WASM).
- Cross-platform: Linux, Windows, and macOS on CPU; CUDA on Linux/Windows; Metal and Accelerate on macOS.
- Pros: Pure Rust core; many built-in models (LLaMA, Mistral, Phi, Gemma, quantized GGUF, etc.); safetensors/GGML support; serverless-friendly; no Python.
- Cons: CUDA/Metal require feature flags and native toolchains.
- Relevance: Best choice for in-process local inference; use candle-core + candle-transformers (or the examples) for chat/completion. Optional features: cuda, metal, mkl, accelerate.

### 2. llama-cpp-rs (crates.io: llama_cpp_rs)

- Role: Rust bindings to llama.cpp (C++).
- Cross-platform: Follows llama.cpp (Linux, Windows, macOS; CUDA, Metal, OpenCL, OpenBLAS, and BLIS via features).
- Pros: Mature GGML/GGUF ecosystem, widely used; good for GGUF models and existing llama.cpp deployments.
- Cons: FFI/build complexity and a C++ dependency; the crate may lag upstream (e.g. GGUF support noted as TODO in its README). Prefer Candle for pure Rust or to avoid C++.
- Relevance: Alternative local-inference backend when you already use llama.cpp or need a specific GGUF workflow.

### 3. “Burnt-sienna” / Kalosm

- Note: A crate named “burnt-sienna” was not found on crates.io or GitHub. The Candle README references Kalosm as a “multi-modal meta-framework in Rust for interfacing with local pre-trained models” (controlled generation, samplers, vector DBs, audio, etc.). If “burnt-sienna” was meant as a codename or alternate name, Kalosm is the closest public meta-framework on top of local models in Rust.

### 4. Making the Assistant Provider-Independent

- Unified trait: Define an InferenceBackend (or similar) trait in a core crate with one method (e.g. complete(messages, options)) returning a shared CompletionResponse.
- Implementations:
  - Local: a Candle-based backend (load GGUF/safetensors, run in-process); optionally a llama-cpp-rs backend (spawn or link llama.cpp).
  - Remote: an HTTP client for OpenAI-compatible APIs (OpenAI, Azure, local servers such as candle-vllm or Ollama) using the same request/response types.
- Config-driven backend selection: Choose the backend at startup via config (e.g. inference.backend = "candle" | "llama_cpp" | "openai"), so the Discord layer never depends on a specific provider.

## Architecture (High Level)

- peimon-core: Shared types (Message, CompletionRequest, CompletionResponse) and any shared traits/errors.
- peimon-inference: The InferenceBackend trait and its backends (OpenAI-compatible HTTP; later Candle, optionally llama-cpp-rs).
- peimon-discord: Serenity client, event handler, and command/response flow; calls into peimon-inference only via the trait.
- peimon-bot: Binary that loads config, constructs the chosen backend, and runs the Discord client.

See ARCHITECTURE.md for the crate layout and crate list.

## Crates Overview

| Crate | Purpose | Key dependencies |
|---|---|---|
| peimon-core | Types, traits, errors | serde, serde_json, thiserror, tracing |
| peimon-discord | Discord API & events | serenity (0.12), tokio |
| peimon-inference | Modular AI backends | tokio, optional reqwest; (future) candle / llama-cpp-rs |
| peimon-bot | Main binary | All of the above, tracing-subscriber |

## Networking (Tokio)

- tokio: Async runtime used by Serenity and by inference (async HTTP, spawning blocking local inference). Use the macros, rt-multi-thread, and sync features; for the bot binary add full, or add fs and net as needed.
- No separate “networking” crate; HTTP for remote APIs goes through reqwest (optional in peimon-inference).
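The feature sets above would translate into manifest lines roughly like the following; the version numbers are illustrative, not pinned by this README:

```toml
# peimon-inference/Cargo.toml (sketch)
[dependencies]
tokio = { version = "1", features = ["macros", "rt-multi-thread", "sync"] }
reqwest = { version = "0.12", optional = true }

# peimon-bot/Cargo.toml (sketch)
[dependencies]
tokio = { version = "1", features = ["full"] }
```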

## Discord (Serenity)

- serenity 0.12: Gateway, HTTP, cache, model, framework, utils, and rustls_backend (cross-platform TLS) features. Use with tokio (e.g. macros, rt-multi-thread).
- Optional: poise for slash commands and a command framework on top of Serenity.

## AI Inference

- Local (in-process): candle (candle-core + candle-transformers or the examples) is recommended; optionally llama-cpp-rs for llama.cpp.
- Remote (HTTP): reqwest against OpenAI-compatible endpoints; shared types from peimon-core.
- Optional ecosystem: Kalosm for higher-level local-model features; candle-vllm / atoma-infer for OpenAI-compatible serving.

## Build & Run

```sh
cd peimon-v6-rust
cargo build
DISCORD_TOKEN=<your_token> cargo run -p peimon-bot
```

## Testing

Run the full test suite across all workspace crates:

```sh
# From the repo root (peimon-v6-rust)
cargo test --workspace --all-targets
```

Or use the script:

```sh
./scripts/test-all.sh
```

To run tests for a single crate:

```sh
cargo test -p peimon-core
cargo test -p peimon-inference
```

## Configuration (Planned)

- inference.backend: openai | candle | llama_cpp (or similar).
- inference.openai.base_url and inference.openai.api_key for the remote API.
- inference.candle.model_path (or similar) for the local Candle backend.
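A hypothetical config file matching the planned keys above; the format and values are illustrative, not a committed schema:

```toml
[inference]
backend = "openai"   # "openai" | "candle" | "llama_cpp"

[inference.openai]
base_url = "https://api.openai.com/v1"
api_key = "sk-..."   # better sourced from an environment variable

[inference.candle]
model_path = "models/model.gguf"
```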

## License

MIT.
