Streamlit app for semantic search over a curated SciPy documentation and source-code corpus, with simple and advanced retrieval views across docs, code, and hybrid ranking.
Live demo: soumojitdalui-scipy-codebase-search-assistant-app-6r2myg.streamlit.app
This project explores retrieval over a large technical codebase by indexing both documentation chunks and source-code chunks, then exposing them through a searchable interface.
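The chunking step can be sketched as follows. This is only an illustration under assumptions: the `Chunk` fields, the `chunk_text` helper, and the word-window sizes are hypothetical and are not taken from the project's code, which may split docs and source files differently.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str  # hypothetical ID scheme: "<doc_id>:<position>"
    source: str    # "docs" or "code"
    text: str


def chunk_text(text, source, doc_id, max_words=40, overlap=10):
    """Split text into overlapping word windows (sizes are assumed values)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for i, start in enumerate(range(0, len(words), step)):
        window = words[start:start + max_words]
        chunks.append(Chunk(f"{doc_id}:{i}", source, " ".join(window)))
        # Stop once this window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(100))
doc_chunks = chunk_text(sample, source="docs", doc_id="optimize")
for chunk in doc_chunks:
    print(chunk.chunk_id, chunk.source, len(chunk.text.split()))
```

Keeping the `source` label on every chunk is what allows docs-only, code-only, and combined retrieval over a single corpus.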
The app supports:
- Simple mode for a cleaner product-style search experience
- Advanced mode for separate Docs, Code, and Hybrid retrieval views
- hybrid ranking with reciprocal rank fusion
- result inspection at the chunk level
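Reciprocal rank fusion combines several ranked lists by giving each item a score of 1 / (k + rank) per list and summing the scores. The sketch below shows the general technique, not the app's actual code: the function name, the chunk IDs, and the choice of k = 60 (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk IDs; k=60 is an assumed, commonly used constant."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical output of two rankers over the same pool of chunks.
ranking_a = ["minimize_doc", "curve_fit_doc", "minimize_src"]
ranking_b = ["minimize_src", "minimize_doc", "solve_src"]

fused = reciprocal_rank_fusion([ranking_a, ranking_b])
print(fused)
```

Because only ranks are used, the fused lists do not need comparable raw scores, which is convenient when the two retrievers score text on different scales.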
Large libraries like SciPy are difficult to navigate through keyword search alone. This project focuses on a more useful developer-search workflow:
- search for APIs, usage examples, and concepts in natural language
- retrieve grounded context from docs and source code
- compare documentation-only, code-only, and fused retrieval behavior
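The comparison between documentation-only, code-only, and combined retrieval can be illustrated with a toy example. Everything here is an assumption made for illustration: the four-entry corpus is invented, and a hand-rolled bag-of-words cosine similarity stands in for whatever vectorizer the app actually uses (a library vectorizer such as scikit-learn's `TfidfVectorizer` could replace it).

```python
import math
import re
from collections import Counter

# Invented toy corpus; the real corpus lives in the project's data directory.
CORPUS = [
    {"id": "doc-1", "source": "docs",
     "text": "minimize finds the minimum of a scalar function of one or more variables"},
    {"id": "doc-2", "source": "docs",
     "text": "solve a linear system of equations for a square matrix"},
    {"id": "code-1", "source": "code",
     "text": "def minimize(fun, x0, method=None): dispatch to the chosen solver"},
    {"id": "code-2", "source": "code",
     "text": "def solve(a, b): solve the linear equation a x = b"},
]


def tokenize(text):
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(query, source=None, top_k=3):
    """Rank chunks by similarity; source=None searches docs and code together."""
    q = Counter(tokenize(query))
    candidates = [c for c in CORPUS if source is None or c["source"] == source]
    scored = [(cosine(q, Counter(tokenize(c["text"]))), c["id"]) for c in candidates]
    scored = [(s, cid) for s, cid in scored if s > 0]  # drop non-matching chunks
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for _, cid in scored[:top_k]]


query = "minimize scalar function"
print("docs only:", search(query, source="docs"))
print("code only:", search(query, source="code"))
print("combined: ", search(query))
```

Running the same query through each view makes it easy to see when the documentation and the source code surface different chunks.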
Repository contents:
- app.py: Streamlit app
- requirements.txt: Python dependencies
- data: curated docs and code chunk corpus used by the app
- notebooks/rag_over_large_codebase.ipynb: notebook version of the project workflow
- artifacts: saved retrieval outputs
Tech stack:
- Python
- Streamlit
- scikit-learn
- pandas
- NumPy
To run the app locally:
pip install -r requirements.txt
streamlit run app.py
- The app uses a curated corpus derived from SciPy documentation and source code.
- The full vendored SciPy repository is not included in this GitHub-ready project copy.