A Retrieval-Augmented Generation (RAG) system that enables lab members to search and query scientific papers through Slack, synthesizing information from multiple papers with accurate citations.
Team: MSDS Capstone Team - 1
Duration: ~10 Weeks
Sponsor: Vitek Lab
Status: 🚧 In Development
This system will process 3,000-10,000 scientific papers and provide intelligent query capabilities through a Slack interface, deployed on NEU's Magi cluster infrastructure.
- Intelligent Paper Search: Query across thousands of research papers
- DOI-based Paper Addition: Add new papers in real-time using DOIs
- Multi-Library Support: Manage different paper collections
- Slack Integration: Native Slack bot for seamless lab communication
- Hallucination Monitoring: Quantifiable metrics for response accuracy
- Citation Tracking: Accurate source attribution for all responses
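As an illustration of the DOI-based paper addition feature listed above, the following minimal sketch shows one way a user-supplied DOI could be cleaned up before lookup. The function name `normalize_doi` and the prefix list are hypothetical, and the pattern follows Crossref's commonly cited guidance for modern DOIs. None of this is confirmed project code.

```python
import re
from typing import Optional

# Assumption: pattern modeled on Crossref's guidance for modern DOIs.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Assumption: prefixes users are likely to paste into Slack.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> Optional[str]:
    """Return a bare lowercase DOI, or None if the input does not look like one."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    doi = doi.strip()
    return doi.lower() if DOI_PATTERN.match(doi) else None
```

Normalizing to a single lowercase form would let the same paper be recognized whether it is added as a URL or as a bare identifier, which helps avoid duplicate entries in a library.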
To be determined
uv pip install numpy pandas scikit-learn jupyter ipykernel
uv pip install -r requirements.txt
python -m ipykernel install --user --name=QSRR --display-name "Python-QSRR"
To be determined
qsrr/
├── data/ # Data processing and storage
├── notebooks/ # Jupyter notebooks for exploration and experiments
├── models/ # Model configurations and weights
├── evaluation/ # Testing and metrics
├── app/ # Slack application code
├── deployment/ # CI/CD and infrastructure
└── docs/ # Documentation
- Open-source models only
- Medium-level reasoning capability
- Local deployment on NEU infrastructure
- Compute: Magi cluster (M2 Ultra Mac Studios)
- Resource Allocation: 10-15% of cluster resources
- Users: 1-3 concurrent (10 total max)
- Volume: 3,000-10,000 scientific papers
- Formats: PDFs, web links, .bib metadata
| Phase | Duration | Focus |
|---|---|---|
| Phase 1: Data & Processing | Weeks 1-2 | PDF extraction, chunking, metadata tagging |
| Phase 2: Modeling | Weeks 3-5 | Embedding, generation, agent architecture |
| Phase 3: Evaluation | Weeks 6-7 | Metrics, testing, optimization |
| Phase 4: Deployment | Weeks 8-10 | Slack app, CI/CD, documentation |
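The chunking and metadata tagging step in Phase 1 could be sketched as below. This is a simple word-window approach with overlap, chosen only for illustration; the function name `chunk_paper`, the default sizes, and the metadata fields are all assumptions, and the project may settle on a different strategy (for example, section-aware or token-based chunking).

```python
from typing import Dict, List


def chunk_paper(paper_id: str, text: str,
                chunk_size: int = 200, overlap: int = 40) -> List[Dict]:
    """Split text into overlapping word windows, each tagged with its source."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append({
            "paper_id": paper_id,          # hypothetical metadata field
            "chunk_index": len(chunks),    # position of the chunk in the paper
            "text": " ".join(words[start:start + chunk_size]),
        })
        # Stop once a window has reached the end of the document.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Carrying `paper_id` and `chunk_index` on every chunk is what would later allow a generated answer to cite the exact paper and passage it drew from.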
- NDCG@k: Ranking quality
- F1 Score: Precision/recall balance
- ROUGE Scores: Summary accuracy
- RAGAS Faithfulness: Hallucination detection
- User Testing: Real-world query validation
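Two of the retrieval metrics above, NDCG@k and F1, are simple enough to sketch in plain Python. The helper names are hypothetical, and the NDCG variant here computes the ideal ordering from the retrieved list's own relevance grades, which is a common simplification rather than a confirmed project choice.

```python
import math
from typing import Iterable, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k graded relevances."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same grades."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def f1_score(retrieved: Iterable[str], relevant: Iterable[str]) -> float:
    """Harmonic mean of precision and recall over sets of document IDs."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    true_positives = len(retrieved_set & relevant_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved_set)
    recall = true_positives / len(relevant_set)
    return 2 * precision * recall / (precision + recall)
```

For example, `ndcg_at_k([3, 2, 1], 3)` scores a perfectly ordered result list, while `f1_score(["a", "b"], ["b", "c"])` compares retrieved paper IDs against a hand-labeled relevant set.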
To be determined
To be determined
- User Guide (coming soon)
- Technical Documentation (coming soon)
- API Reference (coming soon)
Guidelines to be established
To be determined
MSDS Capstone Team - 1
- Atyab Hakeem - [email protected]
- Naga Kushal Ageeru - [email protected]
- Kishan Sathish Babu - [email protected]
- Pranav Kanth Anbarasan - [email protected]
Sponsor: Vitek Lab
For questions about this project, please reach out to any team member listed above.
- Vitek Lab at Northeastern University
- MSDS Program
This project is part of the MSDS Capstone requirement at Northeastern University.