Lethe AI Sharp is a modular, object‑oriented C# library that connects local or remote Large Language Model (LLM) backends to your applications (desktop tools, game engines, services). It also ships with its own lightweight backend, letting you run a local LLM in the GGUF format directly, without relying on any external server.
It unifies: chat personas, conversation/session management, streaming inference, long‑term memory, RAG (retrieval augmented generation), background agentic tasks, web search tools, TTS, and structured output generation.
It is extensible, documented, and backend-agnostic: you write the same code no matter which backend is in use.
Pure .NET 8 C# implementation. No Python runtime, no conda environments, no pip hell.
Built-in LlamaSharp backend means you can distribute a single executable that runs LLMs locally. No external server required, but external servers are supported too.
- Game NPCs - Create dynamic, memory-enabled characters for your games
- Chatbots - Build context-aware assistants with RAG
- Research Tools - Combine web search with LLM analysis
- Content Generation - Structured output for automation pipelines
// 1. Setup (choose backend style)
LLMEngine.Setup("http://localhost:1234", BackendAPI.OpenAI);
// 2. Connect
await LLMEngine.Connect();
if (LLMEngine.Status != SystemStatus.Ready)
    throw new Exception("Backend not ready");
// 3. One-shot generation
var pb = LLMEngine.GetPromptBuilder();
pb.AddMessage(AuthorRole.SysPrompt, "You're a helpful and friendly bot!");
pb.AddMessage(AuthorRole.User, "Explain gravity in one friendly paragraph.");
var query = pb.PromptToQuery();
var reply = await LLMEngine.SimpleQuery(query);
Console.WriteLine(reply.Text);
// 4. Streaming variant (with cancellation)
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
LLMEngine.OnInferenceStreamed += (_, token) => Console.Write(token);
await LLMEngine.SimpleQueryStreaming(query, cts.Token);
- Kobold API: Powerful text completion API, used by KoboldCpp.
- OpenAI API: Industry standard chat completion API, used by LM Studio, Text Generation WebUI, and many more.
Remote endpoints should work, but the primary focus remains local / LAN latency.
Alternatively, if running an external backend is too much overhead, Lethe AI also comes with an internal backend that loads local models (in the GGUF format) directly from your application. It is built on LLamaSharp, a C# port of llama.cpp.
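Switching to the internal backend should follow the same pattern as the remote quick start above. The sketch below is a hedged illustration: the setup overload and the `BackendAPI.LlamaSharp` enum value are assumptions for illustration, not confirmed API; check the library docs for the real signatures.

```csharp
// Hypothetical sketch: loading a local GGUF model through the internal
// LLamaSharp-based backend. The setup call and enum value are assumptions.
LLMEngine.Setup("models/my-model.Q5_K_M.gguf", BackendAPI.LlamaSharp);
await LLMEngine.Connect();

// Once connected, the rest of the code is identical to the remote case:
var pb = LLMEngine.GetPromptBuilder();
pb.AddMessage(AuthorRole.User, "Hello from the internal backend!");
var reply = await LLMEngine.SimpleQuery(pb.PromptToQuery());
Console.WriteLine(reply.Text);
```

This is the backend-agnostic promise in practice: only the setup line changes, the query code stays the same.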
| Capability | Kobold API | OpenAI-Compatible | LlamaSharp (internal) |
|---|---|---|---|
| Basic text generation | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ |
| Structured output | ✅ GBNF Grammar | ✅ JSON Schema | ✅ GBNF Grammar |
| CoT / “thinking” models | ✅ | ✅ | ✅ |
| Personas & chat sessions | ✅ | ✅ | ✅ |
| RAG / Memory integration | ✅ | ✅ | ✅ |
| Web search integration | ✅ | ✅ | ✅ |
| Text To Speech | ✅ (if loaded) | ❌ | ❌ |
| VLM (image input)* | ✅ (if loaded) | ✅ | ❌ |
* VLM support depends entirely on the underlying server and model capabilities. The Kobold API has notoriously poor image input support.
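To make the structured-output row concrete: a GBNF grammar restricts what tokens the model may emit, which is useful for tool pipelines that must parse the reply. The snippet below is a hedged sketch; the way the grammar is attached to a query (`query.Grammar`) is a hypothetical property invented for illustration, not the library's confirmed API.

```csharp
// GBNF grammar constraining the model to answer exactly "yes" or "no".
const string YesNoGrammar = "root ::= \"yes\" | \"no\"";

var pb = LLMEngine.GetPromptBuilder();
pb.AddMessage(AuthorRole.User, "Is water wet? Answer yes or no.");
var query = pb.PromptToQuery();
query.Grammar = YesNoGrammar; // hypothetical property, for illustration only
var reply = await LLMEngine.SimpleQuery(query);
// reply.Text can now be parsed with a simple string comparison.
```

On OpenAI-compatible backends the same idea is expressed as a JSON Schema instead of a GBNF grammar, as the capability table shows.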
- Persona system (bot & user role objects, custom prompts, instruction formats)
- Session-based chatlog with automated summarization
- LLM message streaming support
- Long‑term memory system + world info triggers
- RAG with vector search (HNSW) + embeddings
- Extensible background “agentic tasks” (search the web, summarization)
- Structured output (GBNF / JSON schema) for tool pipelines
- Web search integration (DuckDuckGo, Brave API)
- Useful LLM-related tools (token counting, GBNF grammar, text manipulation helpers)
- Visual language model support
- Summaries of recent chat sessions into the system prompt
- Keyword-triggered text insertions (also known as "world info" in many frontends)
- Automatic and configurable insertion of relevant chat summaries into the context
- Customizable RAG system using the Hierarchical Navigable Small World (HNSW) implementation
- Customizable tasks can run in the background (while the user is AFK for instance)
- Includes two default tasks that run relevant web searches and mention the results in the following chat session
- Write your own tasks easily to boost your bot's abilities
- Group chat functionalities (one user and multiple AI characters)
- Sentiment analysis
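Writing your own background task could look roughly like the sketch below. Everything here is hypothetical: the `AgenticTask` base class, its members, and the engine calls are placeholder names invented for illustration, not the library's confirmed API.

```csharp
// Hypothetical sketch of a custom background task that summarizes the
// day's conversation while the user is AFK. Base class and member names
// are assumptions for illustration only.
public class DailyDigestTask : AgenticTask // hypothetical base class
{
    public override string Name => "DailyDigest";

    public override async Task RunAsync(CancellationToken ct)
    {
        var pb = LLMEngine.GetPromptBuilder();
        pb.AddMessage(AuthorRole.SysPrompt,
            "Summarize today's conversation in three bullet points.");
        var reply = await LLMEngine.SimpleQuery(pb.PromptToQuery());

        // Hand the summary to your memory layer (API depends on your setup).
        Console.WriteLine(reply.Text);
    }
}
```

The two built-in tasks (web search and summarization) presumably follow the same shape: do work in the background, then surface the results in the next chat session.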
To demonstrate how powerful Lethe AI can be, check out Lethe AI Chat. This is a full-featured AI chat program for Windows that uses most of the features in the library. It comes with its own integrated editors, extended agentic tasks, and extensive settings, and it rivals most of the dedicated AI chat programs currently available.
Right now, the best way to use the library is to add this repo as a submodule or project reference in your C# solution. NuGet package coming soon.
git submodule add https://github.com/SerialKicked/Lethe-AI-Sharp.git

Download the classifier models below and place them into data/classifiers/ (set their build action to "Copy if newer"):
| File | Purpose | Required? |
|---|---|---|
| gte-large.Q6_K.gguf | Embeddings for RAG & Memory similarity | Yes, for anything memory- or RAG-related |
| emotion-bert-classifier.gguf | Sentiment / emotion (experimental) | No |
New users: Start with the Quick Start Guide to get running in 5 minutes!
For comprehensive documentation, check the Docs/ folder:
- LLM System Documentation - Core LLMEngine functionality, personas, and chat management
- Instruction Format Guide - Configuring message formatting for different models
- Personas - Create and customize personas
- Memory System - Understand the various memory systems and how they interact
- Examples - Working code samples and tutorials
Lethe AI Sharp relies on the following libraries and tools to work.
- LlamaSharp - Powers the internal GGUF backend and the backend-agnostic embedding system
- General Text Embedding - Large - Embedding model used as the default (works best in English)
- HNSW.NET - Used for everything related to RAG / Vector Search
- Newtonsoft Json - Practically all classes can be imported from and exported to JSON
- OpenAI .NET API Library - Used for OpenAI API backend compatibility