The first step, common to both the Docker and the source code setup approaches, is to clone the repository and enter it:

```bash
git clone https://github.com/AstraBert/ragcoon.git
cd ragcoon
```
Once there, you can choose one of the two following approaches:
Required: Docker and docker compose
- Add the `groq_api_key` in the `.env.example` file and rename the file to `.env` (you can get this key from your Groq account):

```bash
mv .env.example .env
```
- Launch the Docker application:

```bash
# If you are on Linux/macOS
bash start_services.sh
# If you are on Windows
.\start_services.ps1
```

Or, if you prefer, start the services one by one:

```bash
docker compose up qdrant -d
docker compose up frontend -d
docker compose up backend -d
```
You will see the frontend application running on http://localhost:8001/ and you will be able to use it. Depending on your connection and your hardware, the setup might take some time (up to 30 minutes), but this is only for the first time you run it!
Required: Docker, docker compose and conda
- Add the `groq_api_key` in the `.env.example` file, then rename it to `.env` and move it into the `scripts` folder (you can get this key from your Groq account):

```bash
mv .env.example scripts/.env
```
- Set up RAGcoon using the dedicated script:

```bash
# For macOS/Linux users
bash setup.sh
# For Windows users
.\setup.ps1
```

- Or you can do it manually, if you prefer:

```bash
docker compose up qdrant -d
conda env create -f environment.yml
conda activate ragcoon
```
- Now launch the frontend application:

```bash
gunicorn frontend:me --bind 0.0.0.0:8001
```

- Then, from another terminal window, go into the `scripts` folder and launch the backend:

```bash
cd scripts
uvicorn main:app --host 0.0.0.0 --port 8000
```
You will see the application running on http://localhost:8001 and you will be able to use it.
The main workflow is handled by a Query Agent, built upon the ReAct architecture. As its base LLM, the agent exploits QwQ-32B, the latest reasoning model by Qwen, provisioned by Groq.
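For illustration, a ReAct agent backed by a Groq-served model can be wired up with LlamaIndex roughly as follows; the retrieval tool, the prompt, and the model id are hedged placeholders, not RAGcoon's actual source:

```python
# Hypothetical sketch of a ReAct agent on top of a Groq-served model with LlamaIndex;
# the retrieval tool and the prompt are placeholders, not RAGcoon's actual code.
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.groq import Groq


def retrieve_context(query: str) -> str:
    """Placeholder tool: would route the query to one of RAGcoon's query engines."""
    return "...retrieved context..."


# "qwen-qwq-32b" is the model id Groq exposes for QwQ-32B (verify against Groq's model list).
llm = Groq(model="qwen-qwq-32b", api_key="your_groq_api_key")
tools = [FunctionTool.from_defaults(fn=retrieve_context)]

# The ReAct loop lets the model interleave reasoning steps with tool calls, which is
# what allows the agent to pick a retrieval strategy and retry when a check fails.
agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)
print(agent.chat("Is my startup idea viable?"))
```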
- The question coming from the frontend (developed with Mesop, running on http://localhost:8001) is sent in a POST request to the FastAPI-managed API endpoint on http://localhost:8000/chat (a minimal request example follows this list).
- When the Agent is prompted with the user's question, it tries to retrieve relevant context, routing the query to one of three query engines (sketched after this list):
  - If the query is simple and specific, it goes for direct hybrid retrieval, exploiting both a dense (`Alibaba-NLP/gte-modernbert-base`) and a sparse (`Qdrant/bm25`) retriever
  - If the query is general and vague, it first creates a hypothetical document, which is embedded and used for retrieval
  - If the query is complex and involves searching for nested information, the query is decomposed into several sub-queries, retrieval is performed for each of them, and a summarization step combines the results at the end
- The agent evaluates the context using `llama-3.3-70b-versatile`, provisioned through Groq. If the context is deemed relevant, the Agent proceeds; otherwise it goes back to retrieval, trying a different method.
- The agent produces a candidate answer
- The agent evaluates the faithfulness and relevancy of the candidate response, in light of the retrieved context, using LlamaIndex evaluation methods (see the evaluation sketch after this list)
- If the response is faithful and relevant, the agent returns it; otherwise it goes back to generating a new one.
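For reference, once the services are up you can exercise the backend endpoint directly. The JSON field name below is an assumption (check the request model in `scripts/main.py` for the real schema), so treat this as a sketch rather than the exact contract:

```python
# Hypothetical direct call to the FastAPI backend; the JSON field name ("prompt")
# is an assumption -- check the request model in scripts/main.py for the real schema.
import requests

resp = requests.post(
    "http://localhost:8000/chat",
    json={"prompt": "Is my startup idea viable?"},
    timeout=300,  # the retrieval/evaluation loops can take a while
)
resp.raise_for_status()
print(resp.json())
```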
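To make the three retrieval routes more concrete, here is a hedged LlamaIndex sketch of how a hybrid (dense + sparse) Qdrant index, a HyDE-wrapped engine, and a sub-question engine can be built. The collection name, top-k values, and tool description are illustrative placeholders, not necessarily the ones RAGcoon uses:

```python
# Illustrative sketch of the three query engines; the collection name, top-k values
# and tool description are placeholders, not necessarily RAGcoon's configuration.
from qdrant_client import QdrantClient
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import SubQuestionQueryEngine, TransformQueryEngine
from llama_index.core.tools import QueryEngineTool
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.groq import Groq
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Global defaults: Groq-served LLM for synthesis, gte-modernbert-base for dense vectors.
Settings.llm = Settings.llm = Groq(model="qwen-qwq-32b", api_key="your_groq_api_key")
Settings.embed_model = HuggingFaceEmbedding(model_name="Alibaba-NLP/gte-modernbert-base")

client = QdrantClient("http://localhost:6333")  # default Qdrant port
vector_store = QdrantVectorStore(
    client=client,
    collection_name="startup_docs",        # placeholder collection name
    enable_hybrid=True,                     # combine dense and sparse retrieval
    fastembed_sparse_model="Qdrant/bm25",   # sparse (BM25) retriever
)
index = VectorStoreIndex.from_vector_store(vector_store)

# 1. Simple, specific queries: direct hybrid (dense + sparse) retrieval.
hybrid_engine = index.as_query_engine(vector_store_query_mode="hybrid", similarity_top_k=5)

# 2. General, vague queries: HyDE writes a hypothetical document, which is embedded
#    and used for retrieval in place of the raw query.
hyde_engine = TransformQueryEngine(hybrid_engine, HyDEQueryTransform(include_original=True))

# 3. Complex, nested queries: decompose into sub-questions, retrieve for each one,
#    then summarize the partial answers (sub-question generation uses Settings.llm).
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[
        QueryEngineTool.from_defaults(
            query_engine=hybrid_engine,
            name="startup_docs",
            description="Hybrid retrieval over the indexed startup documents",
        )
    ]
)
```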
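The final self-check can be reproduced with LlamaIndex's built-in evaluators. This is a minimal sketch assuming the judge model is exposed through the Groq LLM integration; it is not a copy of RAGcoon's exact evaluation code:

```python
# Minimal sketch of the faithfulness/relevancy self-check with LlamaIndex evaluators.
# The judge model and the retry logic shown here are assumptions, not RAGcoon's exact code.
from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator
from llama_index.llms.groq import Groq

judge_llm = Groq(model="llama-3.3-70b-versatile", api_key="your_groq_api_key")
faithfulness = FaithfulnessEvaluator(llm=judge_llm)
relevancy = RelevancyEvaluator(llm=judge_llm)

query = "Is my startup idea viable?"
response = hybrid_engine.query(query)  # e.g. the hybrid engine from the previous sketch

# Each evaluator returns a result whose `passing` flag can drive the retry loop:
# if either check fails, the agent goes back and generates a new candidate answer.
is_faithful = faithfulness.evaluate_response(response=response).passing
is_relevant = relevancy.evaluate_response(query=query, response=response).passing
print(f"faithful={is_faithful}, relevant={is_relevant}")
```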
Contributions are always welcome! Follow the contribution guidelines reported here.
The software is provided under the MIT license.