CaseCrawler is an open-source synthetic healthcare data platform for research, training-pipeline prototyping, benchmark construction, and release-package experimentation. It is not a medical device, and generated outputs require validation before any downstream clinical or operational use.
- Deterministic structured EHR records.
- Clinical notes with controlled variation.
- Schema, privacy, and clinical-consistency validation.
- Basic JSONL, FHIR NDJSON, Parquet, and fine-tuning exports.
- Grounding metadata from public references and local fixtures.
- Public reference fixture import for benchmark comparisons.
- Benchmark profile scoring.
- Human review queues and review summaries.
- Train, validation, and test split packages.
- Dataset cards, model cards, manifests, checksums, and provenance.
- Objective coverage audits.
- Strict placeholder rejection for release-gated multimodal packages.
- Export transparency summaries that describe record origins and limitations.
- Clinician-reviewed golden cases.
- Specialty-specific coverage targets.
- Condition-specific clinical content packs.
- Contributor-maintained benchmark and validation fixtures.
- Real image backend contracts and policy metadata.
- Time-series model adapters.
- Image/report alignment validators.
- Optional external judge support.
- Public evaluation suites for generated release packages.
- No-key deterministic generation path for local development.
- Multimodal record model covering structured EHR, notes, labs, vitals, medications, allergies, orders, time series, and imaging metadata.
- Fine-tuning export profiles for SFT, chat, tool use, note-fact extraction, clinical observations, medication reconciliation, multimodal tasks, time-series tasks, DPO/RL, FHIR NDJSON, and Parquet.
- Release-readiness gates with quality reports, audit artifacts, and human review workflow support.
- Pull-request CI tiers for backend, UI, and optional backend coverage.
- Convert hard-coded clinical profiles into contributor-friendly clinical content packs.
- Expand clinical coverage validation for condition-specific required artifacts.
- Grow public benchmark-suite fixtures and golden regression cases.
- Improve UI visibility into release readiness, human review status, and benchmark failures.
- Add more examples for open-source contributors building new conditions, model adapters, and export profiles.
- Real patient data ingestion for training datasets.
- Clinical decision support claims.
- Automated diagnosis, treatment recommendation, or patient-care workflows.
- Shipping generated data without provenance, validation, and review metadata.
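
The last item above implies a gate that refuses to ship a package lacking provenance, validation, or review metadata. A minimal sketch of such a check follows. It is illustrative only: the function name `check_release_metadata`, the `GateResult` type, and the three metadata keys are hypothetical and are not taken from CaseCrawler's actual code or package layout.

```python
from dataclasses import dataclass, field

# Hypothetical metadata sections; CaseCrawler's real package layout may differ.
REQUIRED_METADATA = ("provenance", "validation", "review")


@dataclass
class GateResult:
    passed: bool
    missing: list[str] = field(default_factory=list)


def check_release_metadata(package: dict) -> GateResult:
    """Fail the gate if any required metadata section is absent or empty."""
    missing = [key for key in REQUIRED_METADATA if not package.get(key)]
    return GateResult(passed=not missing, missing=missing)


if __name__ == "__main__":
    # A draft package with provenance only: validation is empty, review is absent.
    draft = {"provenance": {"generator": "deterministic"}, "validation": {}}
    result = check_release_metadata(draft)
    print(result.passed, result.missing)
```

Treating an empty section the same as a missing one is a deliberate choice in this sketch: a package that carries a `validation` key with no content should not pass the gate any more than one that omits it.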