CaseCrawler can run fully offline, but it is designed to pull in stronger models and reference datasets when available.
casecrawler reference-datasets
casecrawler datasets capabilities

Registered reference sources include:
- Synthea FHIR/NDJSON/CSV imports
- Asclepius synthetic clinical notes
- SynthClinicalNotes
- augmented clinical notes
- MedSynth dialogue-note pairs
- BeTraC/Synth-DoPaCo doctor-patient transcript to SOAP rows
- clinical-note-to-FHIR references
- radiology report consistency references
- SynthCheX-75K-v2 image/text references
- synthetic chest X-ray pneumonia references
- Technetium-I de-identification and ICD-coding references
- bundled offline fixtures for smoke tests
casecrawler import-reference-dataset asclepius \
  --dataset-id ds-asclepius-ref \
  --limit 100

casecrawler import-reference-dataset \
  --repo-id org/custom-synthetic-notes \
  --dataset-id ds-custom-ref \
  --note-field clinical_note \
  --question-field prompt \
  --answer-field completion \
  --split eval \
  --limit 100

casecrawler import-reference-dataset local-validation-notes \
  --path ./validation/local-notes.jsonl \
  --dataset-id ds-local-ref \
  --note-field clinical_note \
  --question-field prompt \
  --answer-field completion \
  --limit 100

Local imports support JSONL, NDJSON, JSON arrays, and {"rows": [...]} objects.
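To make the four accepted local layouts concrete, here is a minimal Python sketch of a reader that normalizes them into a list of rows. It is not CaseCrawler's own loader: the name `load_rows` and its `limit` parameter are hypothetical and only illustrate the layouts.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_rows(path: str, limit: Optional[int] = None) -> list:
    """Hypothetical helper: read JSONL/NDJSON, a JSON array, or {"rows": [...]}."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    rows: Any
    try:
        # First try the file as one JSON document (array or object).
        doc = json.loads(text)
    except json.JSONDecodeError:
        # Several JSON values in one file: treat as JSONL/NDJSON, one per line.
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        if isinstance(doc, list):
            rows = doc
        elif isinstance(doc, dict) and isinstance(doc.get("rows"), list):
            rows = doc["rows"]
        elif isinstance(doc, dict):
            # A one-line JSONL file parses as a single object: one row.
            rows = [doc]
        else:
            raise ValueError("unsupported JSON layout")
    return rows if limit is None else rows[:limit]
```

Each returned row would then carry the fields named on the command line, such as `clinical_note`, `prompt`, and `completion` in the example above.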
casecrawler import-synthea ./synthea/output/fhir --dataset-id ds-synthea-ref

casecrawler run-synthea \
  --synthea-executable ./synthea/run_synthea \
  --output-dir ./synthea/output/fhir \
  --dataset-id ds-synthea-ref \
  --population 100

Synthea imports accept FHIR JSON bundles, Bulk FHIR NDJSON resource directories, and Synthea CSV directories.
casecrawler imaging-models

Imaging profiles include Prompt2MedImage, MediSyn, CheXGenBench Sana, chest X-ray Stable Diffusion and DreamBooth profiles, symptom X-ray LoRA, and gated RoentGen profiles. Each profile exposes model id, license, gated status, use policy, prompt contract, output contract, and validation requirements.
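The profile fields listed above could be modeled roughly as follows. This is a sketch under assumptions, not CaseCrawler's actual data model: the class name `ImagingProfile`, the field names and types, and the helper `selectable_profiles` are hypothetical, and the model ids in the usage are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImagingProfile:
    """Hypothetical record mirroring the fields each profile is said to expose."""
    model_id: str
    license: str
    gated: bool
    use_policy: str
    prompt_contract: str
    output_contract: str
    validation_requirements: tuple = ()


def selectable_profiles(profiles, have_gated_access=False):
    """Keep open profiles; include gated ones only when access was granted."""
    return [p for p in profiles if have_gated_access or not p.gated]


# Placeholder entries for illustration only; these are not real model ids.
open_profile = ImagingProfile(
    "example/open-cxr", "example-license", False,
    "research only", "text prompt", "PNG image", ("lexical check",),
)
gated_profile = ImagingProfile(
    "example/gated-cxr", "example-license", True,
    "research only", "text prompt", "PNG image",
)
print([p.model_id for p in selectable_profiles([open_profile, gated_profile])])
```

Filtering on the gated flag before loading a model is one way a caller could avoid requesting weights it has no access to.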
Supported backends:
- placeholder: deterministic local PNG for offline tests
- diffusers: local Hugging Face/diffusers image generation
- external: command adapter returning an ImagingAsset
casecrawler clinical-text-models

Clinical text adapters include deterministic templates, LLM provider routing, MedGemma, Meditron, and generic external command profiles. Noisy text variants can use standard, message, ocr, or heavy profiles.

casecrawler timeseries-models

Time-series profiles include deterministic generation plus external-command profiles for TimeDiff, RawMed-style workflows, and MIRA-style generation or validation. External adapters return TimeSeriesChannel[].
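A rough sketch of deterministic time-series generation that returns a list of channels is shown below. The `TimeSeriesChannel` fields here (`name`, `unit`, `sample_rate_hz`, `values`) are assumptions made for illustration, and `generate_channels` is a hypothetical function; the real type and generator may look different.

```python
import random
from dataclasses import dataclass


@dataclass
class TimeSeriesChannel:
    """Hypothetical shape; the real type's fields may differ."""
    name: str
    unit: str
    sample_rate_hz: float
    values: list


def generate_channels(seed: int, n_samples: int = 60) -> list:
    """Produce two synthetic vital-sign channels that depend only on the seed."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    heart_rate = [round(72 + rng.gauss(0, 3), 1) for _ in range(n_samples)]
    spo2 = [round(min(100.0, 97 + rng.gauss(0, 0.5)), 1) for _ in range(n_samples)]
    return [
        TimeSeriesChannel("heart_rate", "bpm", 1.0, heart_rate),
        TimeSeriesChannel("spo2", "%", 1.0, spo2),
    ]
```

Seeding a private `random.Random` instance keeps the output reproducible for a given seed without affecting other code that uses the module-level generator.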
Image/report validation includes deterministic lexical checks and optional BiomedCLIP or MedGemma image-text validators when the relevant extras and model access are available.
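A deterministic lexical check can be as simple as confirming that the expected finding terms appear in the report text. The sketch below assumes a hypothetical `lexical_check` function and result shape. It does not handle negation, so a phrase such as "no pneumonia" would still count as a match.

```python
import re


def lexical_check(report: str, expected_terms: list) -> dict:
    """Hypothetical check: report which expected terms are absent from the text."""
    normalized = report.lower()
    missing = [
        term
        for term in expected_terms
        # Whole-word, case-insensitive match; re.escape guards special characters.
        if not re.search(r"\b" + re.escape(term.lower()) + r"\b", normalized)
    ]
    return {"passed": not missing, "missing": missing}


result = lexical_check(
    "Right lower lobe consolidation consistent with pneumonia.",
    ["pneumonia", "consolidation"],
)
print(result)
```

A model-based validator could then be run only on cases that pass this cheaper check.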