Feat/pubmed knowledge base integration #83
@luca55466 could you please review before I remove the Draft?
Hey @AchiTsa,
Great work on this PR. Connecting the chatbot to PubMed and grounding responses with peer-reviewed abstracts is a meaningful upgrade: it shifts the whole thing from a rule-based Q&A tool to something that can actually cite evidence. The architecture is clean and fits naturally into the existing RAG pipeline. I ran this locally end-to-end: the fetcher pulls 30 abstracts across all 6 search terms from NCBI without issues, and the ingestion pipeline lands them correctly in ChromaDB as 43
Here's my review:
Logic & Implementation
Suggestions
1. Hardcoded topic in the fetcher
Right now every fetched article gets
# In parse_pubmed_xml, add the term parameter:
def parse_pubmed_xml(xml_content: str, term: str) -> List[Dict[str, Any]]:
# Then in the metadata block, replace:
"topic": "Medical Literature"
# with:
"topic": term
# And update the call site in main():
articles = parse_pubmed_xml(xml_content, term)
This would make retrieval meaningfully more precise at basically zero cost.
2. Null-pointer risk in
The title and PMID extraction assumes those XML elements always exist, but
pmid_el = article_tag.find(".//PMID")
title_el = article_tag.find(".//ArticleTitle")
if pmid_el is None or title_el is None:
    logger.warning("Skipping article with missing PMID or title")
    continue
pmid = pmid_el.text
title = title_el.text
3. Version pins in
Loosening
Verdict: The feature does what it says and the core changes are solid. The one thing worth addressing before merging is the hardcoded topic metadata, which has a real impact on retrieval quality (confirmed locally). Everything else is minor polish. Happy to see this merged once that's tidied up!
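For reference, suggestions 1 and 2 could be combined roughly as in the sketch below. This is not the PR's actual code. The returned dict layout (`id`, `title`, `text`, `metadata`), the `.//PubmedArticle` and `.//AbstractText` lookups, and the `SAMPLE_XML` document are assumptions made for illustration. It also reads element text with `itertext()` rather than `.text`, so that titles containing inline markup are not truncated; that is a small deviation from the snippet in suggestion 2.

```python
import logging
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

logger = logging.getLogger(__name__)

# Made-up sample in the general shape of PubMed efetch XML (not real records).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>111</PMID><Article>
    <ArticleTitle>Example title</ArticleTitle>
    <Abstract><AbstractText>Example abstract.</AbstractText></Abstract>
  </Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article>
    <ArticleTitle>Record without a PMID</ArticleTitle>
  </Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""


def parse_pubmed_xml(xml_content: str, term: str) -> List[Dict[str, Any]]:
    """Parse efetch XML into article dicts tagged with the search term."""
    root = ET.fromstring(xml_content)
    articles: List[Dict[str, Any]] = []
    for article_tag in root.findall(".//PubmedArticle"):
        pmid_el = article_tag.find(".//PMID")
        title_el = article_tag.find(".//ArticleTitle")
        # Suggestion 2: skip records that lack the required elements.
        if pmid_el is None or title_el is None:
            logger.warning("Skipping article with missing PMID or title")
            continue
        # itertext() also collects text nested inside inline tags such as <i>.
        pmid = "".join(pmid_el.itertext()).strip()
        title = "".join(title_el.itertext()).strip()
        abstract = " ".join(
            "".join(el.itertext()).strip()
            for el in article_tag.findall(".//AbstractText")
        )
        articles.append({
            "id": pmid,
            "title": title,
            "text": abstract,
            # Suggestion 1: store the search term instead of a fixed label.
            "metadata": {"topic": term},  # assumed metadata layout
        })
    return articles
```

Calling `parse_pubmed_xml(SAMPLE_XML, "hypertension")` on the sample keeps the first record, tags it with the search term, and logs a warning for the second record instead of raising `AttributeError`.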
Key Changes:
Data Acquisition
Enhanced RAG Pipeline
Documentation
Quality Assurance
Impact
The assistant can now retrieve and cite peer-reviewed medical research, shifting from a general-purpose chatbot to an evidence-based healthcare intelligence tool.
Closes #29