Feature Request: Proposing User-Customizable RAG Integration in llama.cpp: A Path to Enhanced Contextual Retrieval #12129

Open
4 tasks done
gnusupport opened this issue Mar 1, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@gnusupport

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

After three months of research and experimentation, I've reached a point where implementing embeddings stored in a pgvector column in a PostgreSQL database has proven to be both effective and rewarding. The ability to search by embeddings alone is a game-changer for me. I find it indispensable, even without the processing power of an LLM.
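
For illustration, once the embeddings live in a pgvector column, similarity search is a single query. A minimal sketch, assuming a hypothetical table documents(id, body, embedding vector(768)), a database named mydb, and a precomputed query embedding:

    # Hypothetical schema: documents(id, body, embedding vector(768)).
    # QUERY_EMBEDDING is a pgvector literal such as '[0.12,-0.03,...]'.
    psql -d mydb -c "
      SELECT id, body
      FROM documents
      ORDER BY embedding <=> '$QUERY_EMBEDDING'::vector  -- <=> is cosine distance
      LIMIT 5;"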

From my understanding, RAG involves the LLM retrieving results from a database (files, database column entries, or links) and then generating answers in the context of those retrieved documents. That retrieve-then-generate process is what constitutes RAG, and I believe I have nearly achieved it in my setup.

What excites me now is the possibility of integrating this RAG functionality directly into the llama.cpp server. I envision it as an external function:

  • Users would have the option to enable the RAG feature on the command line or in the web UI.
  • Users could customize the RAG template through UI options or command line inputs.
  • The external function could be defined via a command line parameter, such as:
    llama-server --rag-function fetch-rag.sh
    
  • The external function fetch-rag.sh would receive the prompt via its standard input.
  • It would then return a list of RAG documents through standard output (a sketch of such a script follows this list).
  • The exact mechanism for fetching from the external source is not the point; the concept is. Other implementations are possible; what matters is that the user can decide how retrieval works.
  • With the RAG option enabled, the LLM running on the llama-server would receive a new context based on the prompt, built from the retrieved list of documents.
  • Subsequently, the llama-server would proceed with the final inference.
  • Users would receive a list of relevant documents and RAG results.
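
As a sketch of the contract proposed above (the --rag-function flag does not exist today, and the endpoint path, port, database, and table names are all assumptions), fetch-rag.sh could be as simple as:

    #!/bin/sh
    # Hypothetical fetch-rag.sh for the proposed --rag-function hook.
    # Reads the user prompt on stdin, prints matching documents on stdout.
    PROMPT=$(cat)

    # Embed the prompt via the server's OpenAI-compatible embeddings endpoint
    # (exact path and required flags vary across llama.cpp versions).
    EMBEDDING=$(curl -s http://localhost:8080/v1/embeddings \
      -H 'Content-Type: application/json' \
      -d "{\"input\": $(printf '%s' "$PROMPT" | jq -Rs .)}" \
      | jq -c '.data[0].embedding')

    # Nearest-neighbour lookup in PostgreSQL/pgvector; one document per line.
    psql -d mydb -At -c "
      SELECT body FROM documents
      ORDER BY embedding <=> '$EMBEDDING'::vector
      LIMIT 5;"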

In my specific use case, I rely on database entries and customizable URIs, such as <a href="hyperscope:123">My document</a>, which open documents in an external program. Therefore, the RAG input cannot be limited to just files—it must be entirely customizable by the user.

I am eager to contribute US $100 to develop this option, enabling llama.cpp to support user-customizable RAG input. This would ensure that the final output is enriched with context derived from the documents and information gathered before the actual inference.

Motivation

I am motivated to enhance llama.cpp with user-customizable RAG functionality to leverage the full potential of context-aware retrieval and generation, thereby significantly improving the relevance and accuracy of its outputs.

Possible Implementation

See the external-function interface sketched under Feature Description above.

@gnusupport gnusupport added the enhancement New feature or request label Mar 1, 2025
@arnfaldur

This seems fairly out of scope for this project. A RAG can be built using the /embeddings and /completions endpoints of the server, for example, but integrating a database or adding database interfaces for vector lookups seems like feature creep.
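
For illustration, a client-side loop needs nothing beyond those endpoints (the paths shown are the OpenAI-compatible ones and may differ by server version; fetch-rag.sh is the hypothetical retrieval script sketched above):

    # Client-side RAG against a running llama-server: retrieve context,
    # prepend it to the prompt, and ask the completions endpoint.
    PROMPT="How do I enable pgvector?"
    DOCS=$(printf '%s' "$PROMPT" | ./fetch-rag.sh)
    curl -s http://localhost:8080/v1/completions \
      -H 'Content-Type: application/json' \
      -d "$(jq -n --arg ctx "$DOCS" --arg q "$PROMPT" \
            '{prompt: ("Context:\n" + $ctx + "\n\nQuestion: " + $q), max_tokens: 256}')"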

There are many other databases that do this better, and there are other tools that integrate LLM inference backends (like llama.cpp) with them to implement RAG.

If you need help getting this done, I'd be willing to help, but I don't think this is the right place.
