feat: HIPAA-native PHI redaction proxy for AI/LLM interactions #1
Merged
DilawarShafiq merged 11 commits into main on Feb 26, 2026
Comprehensive feature specification for phi-redactor, an open-source, HIPAA-native PHI redaction proxy for AI/LLM interactions. Covers 9 user stories (P1-P3), 20 functional requirements, 10 success criteria, and a quality validation checklist. Key differentiator: semantic masking with clinically coherent synthetic replacements instead of opaque [REDACTED] tokens.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 0 research: Presidio+spaCy detection engine, FastAPI reverse proxy, LLM API contracts (OpenAI/Anthropic), HIPAA Safe Harbor compliance, Fernet encryption for the vault, and streaming re-hydration via a sliding window buffer.

Phase 1 design: 7-component architecture (Detection, Masking, Vault, Proxy, Audit, CLI, Plugins), 8-entity data model, OpenAPI 3.1 contract with 14 endpoints, quickstart guide, and 6 documented key design decisions.

Artifacts: plan.md, research.md, data-model.md, quickstart.md, contracts/openapi.yaml, 4 research PHRs, agent context update.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Organized by 9 user stories with a dependency graph:
- Phase 1-2: Setup + Foundation (26 tasks)
- Phase 3-5: P1 MVP: proxy, detection, masking (31 tasks)
- Phase 6-8: P2: vault, compliance, multi-provider (18 tasks)
- Phase 9-11: P3: FHIR, plugins, dashboard (16 tasks)
- Phase 12: Polish (9 tasks)

40+ tasks are parallelizable. MVP in 40 tasks; launch in 57 tasks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 1 - Setup:
- Project structure with all subpackages
- pyproject.toml with dependencies and PyPI metadata
- Dev tooling (ruff, mypy, pytest config)
- Test fixtures and .env.example

Phase 2 - Foundation:
- models.py: 4 enums (18 HIPAA categories), 4 Pydantic models
- config.py: PhiRedactorConfig with env/file/defaults, PHI-safe logging
- detection/: PhiDetectionEngine wrapping Presidio + spaCy, HIPAA registry
- masking/: SemanticMasker with Faker-based synthetic replacements
- vault/: PhiVault with Fernet encryption, SQLite, session token maps
- audit/: AuditTrail with hash-chain JSONL, integrity verification
- 62 unit tests across detection, masking, vault, and audit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
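The hash-chain JSONL idea behind AuditTrail can be illustrated with a minimal sketch. The record layout (`event`, `prev_hash`, `hash`), the all-zero genesis hash, and the helper names `append_event` / `verify_chain` are hypothetical and not taken from phi-redactor's AuditTrail. Events are assumed to carry only metadata, never PHI values.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical sentinel "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) keeps the hash reproducible.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(lines: list[str], event: dict) -> None:
    """Append one JSONL record whose hash also covers the previous record's hash."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    record = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    lines.append(json.dumps(record, sort_keys=True))


def verify_chain(lines: list[str]) -> bool:
    """Recompute every link; editing, reordering, or removing a non-final record fails."""
    prev_hash = GENESIS
    for line in lines:
        record = json.loads(line)
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

Because each hash covers its predecessor, a verifier needs only the log itself. Truncating the final record still leaves a valid chain unless the latest hash is also anchored elsewhere.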
…T039

Phase 3 (US1 MVP):
- proxy/app.py: FastAPI factory with lifespan, CORS, and all routers
- proxy/session.py: SessionManager with idle/max timeout cleanup
- proxy/adapters/: BaseProviderAdapter ABC + OpenAIAdapter (SSE parsing)
- proxy/routes/openai.py: /openai/v1/chat/completions + /embeddings
- proxy/routes/management.py: /health, /stats, /sessions, /audit
- proxy/routes/library.py: /redact and /rehydrate direct API
- proxy/streaming.py: StreamRehydrator with sliding window buffer
- cli/main.py: Click group with lazy-loading subcommands
- cli/serve.py: phi-redactor serve with startup banner
- cli/redact.py: phi-redactor redact for batch file processing

Fail-safe: detection errors block requests (503) rather than leaking PHI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
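Here is a rough sketch of sliding-window re-hydration for streamed responses, assuming a per-session `reverse_map` from surrogate to original. The class is hypothetical and is not the project's `StreamRehydrator`. It holds back only the buffer tail that could still be the start of a surrogate split across SSE chunks, and emits everything before that tail.

```python
class SketchStreamRehydrator:
    """Hypothetical: swap surrogates back to originals across chunk boundaries."""

    def __init__(self, reverse_map: dict[str, str]) -> None:
        self.reverse_map = reverse_map
        # Longest surrogates first, so "Jane Doe" wins over a shorter "Jane".
        self.surrogates = sorted(reverse_map, key=len, reverse=True)
        self.pending = ""  # held-back tail, always shorter than the longest surrogate

    def _scan(self, text: str, final: bool) -> tuple[str, str]:
        out: list[str] = []
        i = 0
        while i < len(text):
            rest = text[i:]
            # Mid-stream, stop if the tail is a proper prefix of some surrogate.
            if not final and any(len(s) > len(rest) and s.startswith(rest) for s in self.surrogates):
                break
            for s in self.surrogates:
                if rest.startswith(s):
                    out.append(self.reverse_map[s])
                    i += len(s)
                    break
            else:
                out.append(text[i])
                i += 1
        return "".join(out), text[i:]

    def feed(self, chunk: str) -> str:
        emitted, self.pending = self._scan(self.pending + chunk, final=False)
        return emitted

    def flush(self) -> str:
        # End of stream: nothing can complete a partial match, so emit it verbatim.
        emitted, _ = self._scan(self.pending, final=True)
        self.pending = ""
        return emitted
```

With `{"Jane Doe": "Maria Garcia"}`, feeding `"Patient Ja"`, `"ne D"`, `"oe is stable."` emits `"Patient "`, then nothing, then `"Maria Garcia is stable."`. The held-back tail is bounded by the longest surrogate, which also bounds the added latency.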
…ider proxy, and compliance reports

Critical fixes:
- Fix vault-masker integration: SemanticMasker now correctly calls lookup_by_original(), store_mapping(), and the new get_reverse_map() on PhiVault
- Remove the silent try/except that masked the broken vault calls
- Set FastAPI response_model=None for union return types in proxy routes
- Replace example SSN 123-45-6789 with the valid pattern 456-78-9012 in all tests

New features:
- 8 custom HIPAA recognizers (MRN, health plan, account, license, vehicle, device, biometric, fax), filling out coverage of all 18 Safe Harbor identifier categories
- Anthropic (Claude) provider adapter and proxy routes at /anthropic/v1/messages
- HIPAA Safe Harbor compliance report generator with category coverage analysis, confidence distribution, hash-chain integrity verification, and compliance checks
- CLI report command: phi-redactor report [--full] [--output file.json]
- Compliance API endpoints: GET /api/v1/compliance/report and /compliance/summary
- Vault helper methods: get_reverse_map, get_session_mappings, get_session_count

Tests (156 passing):
- Integration tests for OpenAI proxy round-trip with mocked upstream
- Integration tests for multi-turn session consistency and rehydration
- Unit tests for SSE streaming rehydrator buffering and token splitting
- Unit tests for all 8 custom HIPAA recognizers
- Detection benchmark covering all 18 HIPAA categories

Project essentials:
- README.md with architecture, quickstart, API reference, and badges
- Apache-2.0 LICENSE file
- GitHub Actions CI workflow (pytest + ruff + mypy across Python 3.11-3.13)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…yption, isolation, masking quality, compliance, and Anthropic proxy

Adds unit tests for IdentityClusterer and DateShifter, at-rest vault encryption tests, and integration tests for vault session isolation, masking pipeline quality, compliance report generation, and the Anthropic proxy route with a mocked upstream.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
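A common approach to date shifting uses one offset per session, so that intervals between clinical events survive masking. The sketch below derives that offset from an HMAC of the session id. The HMAC derivation, the `MAX_SHIFT_DAYS` bound, and the function names are assumptions for illustration, not the project's `DateShifter`.

```python
import hashlib
import hmac
from datetime import date, timedelta

MAX_SHIFT_DAYS = 365  # hypothetical bound, not taken from the PR


def session_offset(secret: bytes, session_id: str) -> timedelta:
    """Derive a stable, non-zero day offset for one session."""
    digest = hmac.new(secret, session_id.encode("utf-8"), hashlib.sha256).digest()
    # Map 4 bytes into [-MAX_SHIFT_DAYS, MAX_SHIFT_DAYS - 1]; replace 0 so dates always move.
    days = int.from_bytes(digest[:4], "big") % (2 * MAX_SHIFT_DAYS) - MAX_SHIFT_DAYS
    return timedelta(days=days or MAX_SHIFT_DAYS)


def shift_date(d: date, secret: bytes, session_id: str) -> date:
    return d + session_offset(secret, session_id)


def unshift_date(d: date, secret: bytes, session_id: str) -> date:
    # Re-hydration applies the same session offset in reverse.
    return d - session_offset(secret, session_id)
```

Every date in a session moves by the same amount. "Admitted Jan 10, discharged Mar 1" therefore keeps its 51-day length of stay, while the real calendar dates never leave the proxy.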
…Harbor report rendering

- New cli/sessions.py: list/inspect/close/cleanup session commands backed by PhiVault
- New cli/config.py: config show/providers commands for runtime config inspection
- vault/store.py: add export_anonymized, purge_session, get_vault_stats methods
- audit/reports.py: add generate_safe_harbor method + render_markdown/render_html functions
- cli/report.py: add --format (json/md/html) and --safe-harbor options
- proxy/routes/management.py: POST /api/v1/reports/safe-harbor endpoint
- cli/main.py: register sessions and config in lazy_commands

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…1 marker — Phase 11-12

Phase 11 (Dashboard):
- Add FastAPI dashboard router with live-stats API endpoint
- Create self-contained HTML dashboard UI with auto-refresh
- Mount dashboard routes in the proxy application factory

Phase 12 (Polish):
- Add PEP 561 py.typed marker for type checker support
- Add SECURITY.md with vulnerability reporting policy and architecture docs
- Add unit tests for dashboard module

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… Phase 9-10

Phase 9: FHIR resource reference recognizer (Patient/12345, FHIR URLs, OIDs) and HL7v2 segment recognizer (PID, NK1, GT1, IN1 segments) with entity mapping to the OTHER_UNIQUE_ID category.

Phase 10: Extensible plugin system with a PluginLoader supporting module imports, directory scanning, and setuptools entry points. Includes an example plugin, CLI plugin management commands, and full test coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
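Here is a rough sketch of the matching logic behind the two Phase 9 recognizers, using only `re`. The FHIR `id` (`[A-Za-z0-9\-\.]{1,64}`) and `oid` patterns follow the FHIR datatype definitions. The resource-type subset, the PID field-to-label mapping, and the function names are hypothetical. The project's recognizers plug into Presidio rather than returning bare tuples.

```python
import re

# FHIR "id" datatype body; the resource-type alternatives are a hypothetical subset.
FHIR_REF = re.compile(r"\b(Patient|Practitioner|RelatedPerson|Person)/([A-Za-z0-9\-.]{1,64})\b")
# FHIR "oid" datatype, e.g. urn:oid:2.16.840.1.113883
FHIR_OID = re.compile(r"urn:oid:[0-2](?:\.(?:0|[1-9][0-9]*))+")

# PID field positions per HL7 v2.x; which fields to flag, and the labels, are illustrative.
PID_FIELDS = {3: "MRN", 5: "NAME", 7: "DATE", 19: "SSN"}


def find_fhir_identifiers(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, category) spans for FHIR references and OIDs."""
    spans = [(m.start(), m.end(), "OTHER_UNIQUE_ID") for m in FHIR_REF.finditer(text)]
    spans += [(m.start(), m.end(), "OTHER_UNIQUE_ID") for m in FHIR_OID.finditer(text)]
    return sorted(spans)


def find_pid_fields(message: str) -> list[tuple[int, str, str]]:
    """Return (field_number, label, value) for non-empty PHI-bearing PID fields."""
    hits = []
    for segment in re.split(r"\r\n|\r|\n", message):  # HL7 v2 segments end in CR
        fields = segment.split("|")
        if fields[0] != "PID":
            continue
        # For PID, fields[n] is PID-n because fields[0] is the segment name.
        for number, label in PID_FIELDS.items():
            if number < len(fields) and fields[number]:
                hits.append((number, label, fields[number]))
    return hits
```

The trailing `\b` in `FHIR_REF` lets the pattern backtrack off sentence punctuation. For example, `Patient/abc-1.` yields `Patient/abc-1` without the period.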
…aint errors

Adds ensure_session() to PhiVault, which creates a session row if one doesn't already exist; store_mapping() now calls it automatically. Also adds identity clustering, date shifting, and synthetic identity factory modules (Phase 5), and fixes boundary detection in the proximity clustering test. 256 tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
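This fix can be sketched with the standard `sqlite3` module. The sketch assumes, as the truncated title suggests, that the errors were foreign-key violations caused by inserting mappings before their parent session row existed. The table layout and function names are hypothetical. The real `PhiVault` stores Fernet ciphertext rather than the placeholder bytes used here.

```python
import sqlite3

# Hypothetical schema: each mapping references its parent session row.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id         TEXT PRIMARY KEY,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS mappings (
    session_id  TEXT NOT NULL REFERENCES sessions(id),
    original    BLOB NOT NULL,
    surrogate   TEXT NOT NULL,
    entity_type TEXT NOT NULL,
    PRIMARY KEY (session_id, surrogate)
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
    conn.executescript(SCHEMA)
    return conn


def ensure_session(conn: sqlite3.Connection, session_id: str) -> None:
    # Idempotent: creates the parent row only when it is missing.
    conn.execute("INSERT OR IGNORE INTO sessions (id) VALUES (?)", (session_id,))


def store_mapping(conn: sqlite3.Connection, session_id: str, original: bytes,
                  surrogate: str, entity_type: str) -> None:
    ensure_session(conn, session_id)  # parent row exists before the child insert
    conn.execute(
        "INSERT INTO mappings (session_id, original, surrogate, entity_type) VALUES (?, ?, ?, ?)",
        (session_id, original, surrogate, entity_type),
    )
    conn.commit()
```

`INSERT OR IGNORE` makes `ensure_session()` safe to call on every write, so callers never need to check whether a session row exists first.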
Summary
Complete implementation of phi-redactor — an open-source, HIPAA-native PHI redaction proxy that sits between your application and LLM providers (OpenAI, Anthropic), automatically detecting and masking all 18 Safe Harbor identifiers with clinically coherent synthetic replacements.
What's included
- Proxy routes for OpenAI (/openai/v1/chat/completions) and Anthropic (/anthropic/v1/messages)
- Dashboard at /dashboard/ with auto-refreshing stats
- CLI: phi-redactor serve|redact|report|sessions|config|plugins commands

Architecture
Test Coverage
256 tests passing across unit and integration suites covering detection, masking, vault encryption, session isolation, streaming, proxy round-trips, compliance, clustering, date shifting, plugins, FHIR/HL7, and dashboard.
Test plan
🤖 Generated with Claude Code