Spring Boot 3 RAG service using LangChain4j, Ollama (chat + embeddings), and H2 for vector storage.
Prerequisites:

- Java 17+
- Maven
- Ollama running locally (default `http://localhost:11434`)
- Pull models: `ollama pull llama2` (or `mistral`) and `ollama pull nomic-embed-text`
Run the app:

```bash
mvn spring-boot:run
# or
mvn clean package && java -jar target/spring-rag-app-0.0.1-SNAPSHOT.jar
```

The H2 file database is at `./data/ragdb` (console at `/h2-console`, user `sa`, password `password`).
Edit `src/main/resources/application.yml`:

- Ollama: `rag.ollama.base-url`, `rag.ollama.model` (chat), `rag.ollama.embedding-model`
- Chunking: `rag.chunking.max-tokens`, `rag.chunking.overlap`
- Retrieval: `rag.top-k`
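For illustration, a minimal `application.yml` sketch using these keys (the concrete values here are assumptions; pick whatever models and chunk sizes suit your setup):

```yaml
rag:
  ollama:
    base-url: http://localhost:11434   # local Ollama server
    model: llama2                      # chat model (llama2 or mistral)
    embedding-model: nomic-embed-text  # embedding model pulled earlier
  chunking:
    max-tokens: 300                    # assumed chunk size
    overlap: 30                        # assumed overlap between adjacent chunks
  top-k: 3                             # chunks retrieved per query
```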
API endpoints:

- `POST /api/documents` (multipart `file`): ingest PDF/TXT, auto-embeds chunks
- `GET /api/documents`: list docs
- `GET /api/documents/{id}`: doc metadata
- `GET /api/documents/{id}/chunks`: chunks
- `POST /api/rag/query` (JSON `{question, topK}`): RAG answer with context
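For reference, the query endpoint's request and response bodies correspond to shapes like the following (hypothetical record names inferred from the JSON examples below; the service's own DTOs may differ):

```java
import java.util.List;

// Hypothetical payload shapes for POST /api/rag/query, inferred from the
// request/response JSON shown in the examples below.
record RagQueryRequest(String question, Integer topK) {}
record RagQueryResponse(String question, String answer, List<String> context) {}
```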
Ingest a document:

```bash
curl -X POST http://localhost:8080/api/documents \
  -F "file=@/path/to/sample.txt"
```

Response:

```json
{
  "documentId": "4b7e93c0-1c9f-4e9f-9b02-2e06d3e6f6c2",
  "chunksIndexed": 3
}
```

List documents:

```bash
curl http://localhost:8080/api/documents
```

Response:

```json
[
  {
    "id": "4b7e93c0-1c9f-4e9f-9b02-2e06d3e6f6c2",
    "title": "sample",
    "originalFileName": "sample.txt",
    "contentType": "text/plain",
    "sizeBytes": 1200,
    "createdAt": "2025-12-11T21:00:00Z",
    "updatedAt": "2025-12-11T21:00:00Z",
    "chunkCount": 3
  }
]
```

Get chunks for a document:

```bash
curl http://localhost:8080/api/documents/4b7e93c0-1c9f-4e9f-9b02-2e06d3e6f6c2/chunks
```

Response:

```json
[
  {
    "id": "7d1f7a3e-5bf7-4b71-8a7d-65b42d5f8e93",
    "chunkIndex": 0,
    "text": "Chunk text ..."
  },
  {
    "id": "b3e8949f-1e1d-4e62-8bb1-3b5c9a5f2b1e",
    "chunkIndex": 1,
    "text": "Chunk text ..."
  }
]
```

RAG query:

```bash
curl -X POST http://localhost:8080/api/rag/query \
  -H "Content-Type: application/json" \
  -d '{"question":"What does the sample document say?","topK":3}'
```

Response:

```json
{
  "question": "What does the sample document say?",
  "answer": "It explains the content of the sample document ...",
  "context": [
    "Chunk text ...",
    "Another chunk ..."
  ]
}
```

HTTP requests file: use `src/test/resources/http/rag-api.http` and run the requests against `http://localhost:8080`:
- Ingest sample text
- List docs, capture `id`
- Get chunks
- POST RAG query
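To call the API from Java instead of curl or the `.http` file, here is a small client sketch using the JDK's built-in `java.net.http.HttpClient` (the endpoint and payload follow the query example above; the question text is just a placeholder):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RagQueryClient {
    public static void main(String[] args) throws Exception {
        // JSON body for POST /api/rag/query, matching the curl example above
        String body = """
                {"question": "What does the sample document say?", "topK": 3}
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/api/rag/query"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // Send the request and print the status code plus the raw JSON answer
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```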
Troubleshooting:

- Ensure the Ollama server is up and the embedding model exists (`ollama list`).
- If you hit H2 lock issues, stop the app and delete `data/` locally (the DB will re-initialize). The folder is gitignored.
- SQL logging is set to warn; enable debug output with `--debug` if needed.