| Version | Supported |
|---|---|
| 0.1.x | ✅ |

If you discover a security vulnerability in phi-redactor, please report it responsibly:
- Do NOT create a public GitHub issue
- Email: security@phi-redactor.dev
- Include: description, reproduction steps, and impact assessment
- Expected response time: 48 hours
- PHI at rest: All original PHI is Fernet-encrypted (AES-128-CBC with HMAC-SHA256 authentication) in the SQLite vault
- PHI in transit: Never logged, never cached in plaintext, always redacted before forwarding
- Key management: Encryption keys stored in separate `.key` files with restricted permissions
- Hash-based deduplication: Original values identified by SHA-256 hash, never by plaintext
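
The hash-based deduplication noted above can be illustrated with a minimal sketch. The names `phi_digest` and `DedupIndex` and the bracketed token format are hypothetical and are not taken from phi-redactor's code; the in-memory dict stands in for the encrypted vault.

```python
import hashlib


def phi_digest(value: str) -> str:
    """Return the SHA-256 hex digest used as the lookup key for an original value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


class DedupIndex:
    """Hypothetical index: maps SHA-256 digests to tokens, never plaintext to tokens."""

    def __init__(self) -> None:
        self._by_digest = {}

    def token_for(self, original: str, category: str) -> str:
        digest = phi_digest(original)
        if digest not in self._by_digest:
            # First time this value is seen: mint a new placeholder token.
            self._by_digest[digest] = f"[{category}_{len(self._by_digest) + 1}]"
        # Repeated values resolve to the same token via their digest.
        return self._by_digest[digest]
```

With this shape, the same original value always resolves to the same token within an index, while the index keys themselves reveal only digests.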
phi-redactor ensures real PHI never reaches the LLM provider. All 18 HIPAA PHI identifier categories are detected and replaced with synthetic tokens before the request leaves your network. The token-to-original mapping is stored in a Fernet-encrypted local vault — the LLM provider has no access to it and cannot reverse the synthetic tokens to original PHI.
This is semantic pseudonymization with encrypted local token mapping. It is not Safe Harbor de-identification (which requires removal rather than replacement), but it provides a stronger privacy guarantee in the LLM use case: the cloud provider receives data about a fictional patient, not a redacted real one.
See Compliance Posture in the README for the full cryptographic breakdown and legal posture.
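
As a rough illustration of the replace-then-restore flow described above, the sketch below handles a single category (Social Security numbers) with a regular expression. The `Pseudonymizer` class, its method names, and the `000-00-NNNN` synthetic format are assumptions made for this example; phi-redactor's actual detection and its Fernet-encrypted vault are not shown, and a plain dict stands in for the local mapping.

```python
import re

# Simplified pattern for one PHI category; real detection covers all 18.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class Pseudonymizer:
    """Hypothetical sketch: swap real values for synthetic ones, keep the map local."""

    def __init__(self) -> None:
        self.original_to_synthetic = {}
        self.synthetic_to_original = {}

    def _swap(self, original: str) -> str:
        if original not in self.original_to_synthetic:
            n = len(self.original_to_synthetic) + 1
            synthetic = f"000-00-{n:04d}"  # area number 000 is never issued
            self.original_to_synthetic[original] = synthetic
            self.synthetic_to_original[synthetic] = original
        return self.original_to_synthetic[original]

    def redact(self, text: str) -> str:
        """Replace every detected value before the request leaves the network."""
        return SSN_RE.sub(lambda m: self._swap(m.group(0)), text)

    def restore(self, text: str) -> str:
        """Map synthetic values in a provider response back to the originals."""
        for synthetic, original in self.synthetic_to_original.items():
            text = text.replace(synthetic, original)
        return text
```

Only the output of `redact` would be forwarded upstream; the mapping needed by `restore` never leaves the local process.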
The 18 PHI identifier categories covered:
- Names
- Geographic data (smaller than state)
- Dates (except year) related to an individual
- Phone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers
- Device identifiers and serial numbers
- Web URLs
- IP addresses
- Biometric identifiers
- Full-face photographs and comparable images
- Any other unique identifying number, characteristic, or code
- Tamper-evident hash chain (SHA-256 linked entries)
- Every redaction event logged with category, confidence, method, and action
- Chain integrity verification available via compliance reports
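
A SHA-256 linked audit chain of the kind listed above could look roughly like this. The entry layout, the all-zero genesis value, and the function names are assumptions for illustration only; the event fields follow the ones named in this section (category, confidence, method, action).

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash preceding the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain: list, event: dict) -> None:
    """Append an event, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"event": event, "prev_hash": prev_hash, "hash": entry_hash(prev_hash, event)}
    )


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous one, changing an early entry invalidates every later link, which is what makes tampering detectable during verification.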
- Sessions expire after configurable idle/max lifetime
- Session data isolated by unique session ID
- Vault entries cascade-deleted when sessions expire
- No cross-session data leakage
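
One way to get cascade deletion on session expiry in SQLite is a foreign key with `ON DELETE CASCADE`. The schema, column names, and `purge_expired` helper below are hypothetical and not phi-redactor's actual vault layout; timestamps are plain seconds for simplicity.

```python
import sqlite3

# Hypothetical schema: each vault entry belongs to exactly one session.
SCHEMA = """
CREATE TABLE sessions (
    id TEXT PRIMARY KEY,
    created_at REAL NOT NULL,
    last_seen REAL NOT NULL
);
CREATE TABLE vault_entries (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
    ciphertext BLOB NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # SQLite does not enforce foreign keys unless this is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def purge_expired(conn, now: float, idle_timeout: float, max_lifetime: float) -> None:
    """Delete sessions past either limit; their vault entries are removed by cascade."""
    conn.execute(
        "DELETE FROM sessions WHERE (? - last_seen) > ? OR (? - created_at) > ?",
        (now, idle_timeout, now, max_lifetime),
    )
    conn.commit()
```

Scoping every vault row to a `session_id` also keeps lookups from one session from ever matching rows that belong to another.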
- Proxy operates locally by default (127.0.0.1)
- TLS termination expected at reverse proxy layer for production
- API keys passed through to upstream providers, never stored
- CORS configurable for production deployment
- Run behind a TLS-terminating reverse proxy (nginx, Caddy)
- Restrict vault file permissions (`chmod 600`)
- Use dedicated encryption key paths, not defaults
- Enable audit trail and monitor compliance reports
- Set appropriate session lifetimes for your use case
- Review and rotate encryption keys periodically
- Back up vault database with encrypted backups only
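
The file-permission recommendation can be checked from a deployment script. This is a small sketch assuming a POSIX system; the helper names are hypothetical and the paths you pass in are your own vault and key file locations.

```python
import os
import stat


def restrict_to_owner(path: str) -> None:
    """Equivalent of `chmod 600`: owner may read and write, nobody else."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def is_owner_only(path: str) -> bool:
    """Return True when group and other have no permission bits set (POSIX only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A startup check could call `is_owner_only` on the vault and key files and refuse to run, or log a warning, when it returns False.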