A Streamlit application that automatically screens research records on diagnostics for antimicrobial resistance (AMR) in sepsis, using a keyword-based approach.
- Upload CSV files containing search results with columns such as "Title", "Abstract", and "Authors"
- Automatically screen records using keywords from AMR, diagnostic, and sepsis categories
- Calculate scores based on keyword matches in both title and abstract fields
- Classify records as "Include", "Exclude", or "Maybe" based on configurable thresholds
- Manually review and override classifications
- Export final screening results to CSV
- Clone this repository:
    git clone <repository-url>
    cd sepsis_screening
- Install required dependencies:
    pip install -r requirements.txt
- Run the Streamlit application:
    streamlit run app.py
- Open your web browser and navigate to the URL shown in the terminal (typically http://localhost:8501)
- Upload your CSV file containing research records (you can use the provided sample_data.csv for testing)
- Adjust the threshold settings in the sidebar as needed
- Review and manually classify records if necessary
- Download the results using the provided link
A sample dataset (sample_data.csv) is included in this repository for testing purposes. It contains 12 mock research records related to sepsis, antimicrobial resistance, and diagnostic methods. Each record includes title, abstract, authors, publication year, and journal information.
The application expects a CSV file with the following columns:
- Title: The title of the research paper/record
- Abstract: The abstract text of the research paper/record
- Optional columns like Authors, Year, and Journal will also be displayed if present
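Before uploading, it can help to confirm that a file has the expected columns. The following is a minimal sketch using only the Python standard library; the function name `check_columns` is hypothetical and is not part of the application's code.

```python
import csv
import io

# Column names taken from the list above.
REQUIRED_COLUMNS = {"Title", "Abstract"}
OPTIONAL_COLUMNS = ["Authors", "Year", "Journal"]


def check_columns(csv_text):
    """Return (missing_required, present_optional) for CSV content given as text.

    Hypothetical helper for illustration; the app may validate differently.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = sorted(REQUIRED_COLUMNS - set(header))
    present_optional = [col for col in OPTIONAL_COLUMNS if col in header]
    return missing, present_optional


sample = "Title,Abstract,Year\nRapid AMR test,We evaluate a PCR assay.,2021\n"
missing, optional = check_columns(sample)
print("Missing required columns:", missing)
print("Optional columns found:", optional)
```

An empty list of missing columns means the file has the required columns; any optional columns found are the ones the application would also display.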
The screening is based on three categories of keywords:
- AMR Keywords: Terms related to antimicrobial resistance
- Diagnostic Keywords: Terms related to diagnostic methods and tests
- Sepsis Keywords: Terms related to sepsis and bloodstream infections
You can view the specific keywords in each category by expanding the respective sections in the sidebar.
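As an illustration of keyword matching across the three categories, here is a minimal sketch. The keyword lists, the whole-word matching rule, and the function names are assumptions made for this example; the application's actual keywords are the ones listed in its sidebar, and its matching logic may differ.

```python
import re

# Hypothetical keyword lists, for illustration only.
KEYWORDS = {
    "amr": ["antimicrobial resistance", "resistant", "mrsa"],
    "diagnostic": ["diagnostic", "pcr", "assay"],
    "sepsis": ["sepsis", "septic", "bloodstream infection"],
}


def count_matches(text, keywords):
    """Count how many keywords occur in text as whole words or phrases (case-insensitive)."""
    lowered = text.lower()
    return sum(
        1
        for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", lowered)
    )


def category_counts(title, abstract):
    """Return the number of matched keywords per category for one record."""
    text = f"{title} {abstract}"
    return {cat: count_matches(text, kws) for cat, kws in KEYWORDS.items()}


counts = category_counts(
    "Rapid PCR assay for sepsis",
    "Detects resistant bloodstream infection pathogens.",
)
print(counts)
```

Matching on word boundaries avoids counting a keyword that only appears inside a longer word, at the cost of missing compound forms such as "multiresistant".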
Each record receives a score based on the number of keyword matches found in both the title and abstract. Records are then classified as:
- Include: Records with scores at or above the high threshold
- Exclude: Records with scores below the low threshold
- Maybe: Records with scores at or above the low threshold but below the high threshold
The thresholds can be adjusted in the sidebar to fine-tune the screening process.
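The classification rule described above can be sketched as a small function. The function name `classify` and the threshold values 2 and 4 are hypothetical, and treating a score equal to the low threshold as "Maybe" is inferred from the rules as stated rather than confirmed by the application's code.

```python
def classify(score, low_threshold, high_threshold):
    """Map a keyword score to a screening decision.

    Include: score at or above the high threshold.
    Exclude: score below the low threshold.
    Maybe:   everything in between.
    """
    if score >= high_threshold:
        return "Include"
    if score < low_threshold:
        return "Exclude"
    return "Maybe"


# Hypothetical thresholds; in the app these are set in the sidebar.
LOW, HIGH = 2, 4
for score in range(6):
    print(score, classify(score, LOW, HIGH))
```

Raising the high threshold moves borderline records from "Include" to "Maybe", which increases the manual review workload but reduces the chance of including an irrelevant record automatically.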