I'm interested in why modern sequence architectures are designed the way they are — the inductive biases baked in by training on language, where they don't transfer, and whether cleaner designs can close the gap on other signal types.
My main project is PRISM: a hybrid linear-recurrent backbone that interleaves Mamba-2-style SSD blocks (per-channel selective state, no mean-over-D_h collapse) with Gated Delta Rule blocks (matrix-valued associative memory) at a 3:1 ratio of SSD to Gated Delta Rule blocks. The same backbone, with identical hyperparameters and no modality-specific tuning, is applied to 12-lead ECG (PTB-XL, the primary benchmark), spoken commands (Speech Commands), and sequential images (sequential CIFAR-10). The primary metric is macro one-vs-rest AUROC; the target is to match xresnet1d101 (~0.928) within the bootstrap confidence interval. A paper draft targeting ICML 2026 ES-FoMo IV or NeurIPS 2026 ENLSP-VI is in progress.
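For readers unfamiliar with the metric, here is a minimal sketch of macro one-vs-rest AUROC with a percentile bootstrap interval. It is dependency-free Python for illustration, not the PRISM evaluation code: the function names, the toy data, the resample count, and the choice of a percentile interval are all assumptions.

```python
import random

def auroc(scores, labels):
    """Binary AUROC: probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined when one class is absent
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auroc(score_rows, label_rows):
    """Unweighted mean of per-label AUROCs; labels with undefined AUROC are skipped."""
    per_label = []
    for j in range(len(label_rows[0])):
        value = auroc([r[j] for r in score_rows], [r[j] for r in label_rows])
        if value is not None:
            per_label.append(value)
    return sum(per_label) / len(per_label) if per_label else None

def bootstrap_ci(score_rows, label_rows, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap over records (resampling rows with replacement)."""
    rng = random.Random(seed)
    n = len(score_rows)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = macro_auroc([score_rows[i] for i in idx], [label_rows[i] for i in idx])
        if value is not None:
            stats.append(value)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi

# Toy multi-label data (hypothetical): 8 records, 2 labels.
labels = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]
scores = [[0.9, 0.2], [0.8, 0.1], [0.3, 0.7], [0.4, 0.9],
          [0.7, 0.6], [0.2, 0.3], [0.35, 0.4], [0.1, 0.8]]

print("macro AUROC:", macro_auroc(scores, labels))
print("95% CI:", bootstrap_ci(scores, labels))
```

The pairwise AUROC above is quadratic in the number of records and only suitable for small examples; a rank-based implementation would be the usual choice at PTB-XL scale.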
The reference implementation includes a from-scratch SSD scan and a chunked gated delta rule (UT transform / triangular solve), both with numerical-equivalence tests against torch.associative_scan and the FLA Triton kernels. The suite has 111 tests in total, covering equivalence, float64 gradcheck, streaming state-passing, and property-based checks.
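To make the "matrix-valued associative memory" concrete, the sketch below shows the kind of step-by-step recurrence a chunked implementation can be checked against. It assumes the common gated delta rule formulation S_t = α_t · S_{t-1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ with readout o_t = S_t q_t. The function names and the plain-list representation are illustrative and not taken from the PRISM code.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One sequential step of the assumed gated delta rule.

    S is a d_v x d_k matrix stored as a list of rows, k has length d_k,
    v has length d_v. alpha is the decay gate, beta the write strength.
    """
    new_S = []
    for row, v_i in zip(S, v):
        # (S k)_i: the value component currently stored under key k.
        pred = sum(s * k_j for s, k_j in zip(row, k))
        # Decay, erase the old association along k, then write the new one.
        new_S.append([alpha * (s - beta * pred * k_j) + beta * v_i * k_j
                      for s, k_j in zip(row, k)])
    return new_S

def read(S, q):
    """Readout o = S q."""
    return [sum(s * q_j for s, q_j in zip(row, q)) for row in S]

# Hypothetical toy memory with d_v = 2 and d_k = 3.
S = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
k1, k2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]  # orthonormal keys

S = gated_delta_step(S, k1, [2.0, -1.0], alpha=1.0, beta=1.0)    # write under k1
S_over = gated_delta_step(S, k1, [5.0, 5.0], alpha=1.0, beta=1.0)  # overwrite k1
S_decay = gated_delta_step(S, k2, [7.0, 8.0], alpha=0.5, beta=1.0)  # decay + new key

print(read(S, k1), read(S_over, k1), read(S_decay, k1), read(S_decay, k2))
```

With a unit-norm key and β = 1, a second write under the same key replaces the stored value rather than adding to it, which is the property that distinguishes the delta rule from plain linear attention.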
Alongside the architecture work: Go/Rust microservices, observability tooling, and published PyPI packages — because the gap between "model that works in a notebook" and "model that ships" is a real engineering problem worth being good at.
Current focus:
- State-space models (Mamba-2 SSD, diagonal SSMs) and associative memory (delta rule, fast weights)
- Cross-modal portability — testing how far a single hybrid backbone generalizes without per-modality tuning
- Hardware-aware algorithm design: parallel scan, chunked recurrence, memory hierarchy
- Clinical time-series modeling: multi-label ECG classification, macro AUROC as primary metric
- Production ML infrastructure: training observability, checkpoint management, reproducible pipelines
PyTorch SSM Mamba-2 SSD Gated Delta Rule Research · Active — paper in progress
12-layer hybrid backbone: SSD blocks (Mamba-2-style, per-channel selective state) interleaved with Gated Delta Rule blocks at 3:1. From-scratch reference implementations of both — Hillis-Steele scan + UT-transform chunked delta rule — with numerical equivalence tests against torch.associative_scan and FLA Triton kernels. Same backbone applied to PTB-XL ECG (primary, multi-label macro AUROC), Speech Commands, and sequential CIFAR-10 with no per-modality hyperparameter changes. S4D-Complex is preserved as an ablation row; the default is SSD. Paper target: ICML 2026 ES-FoMo IV or NeurIPS 2026 ENLSP-VI.
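As an illustration of the scan and the style of equivalence test mentioned above, here is a minimal Hillis-Steele inclusive scan for the scalar linear recurrence h_t = a_t · h_{t-1} + b_t (with h_{-1} = 0), checked against a sequential loop. It is a plain-Python sketch under those assumptions; the function names are hypothetical, and the actual reference implementation presumably operates on tensors.

```python
import random

def combine(left, right):
    """Compose two affine maps h -> a*h + b, applying `left` first, then `right`."""
    a_l, b_l = left
    a_r, b_r = right
    return (a_l * a_r, a_r * b_l + b_r)

def hillis_steele_scan(a, b):
    """Inclusive scan in ceil(log2 n) passes; each pass combines elements `d` apart."""
    elems = list(zip(a, b))
    n = len(elems)
    d = 1
    while d < n:
        # The comprehension reads only the previous pass, mirroring the
        # double-buffered update of the parallel algorithm.
        elems = [combine(elems[i - d], elems[i]) if i >= d else elems[i]
                 for i in range(n)]
        d *= 2
    # With h_{-1} = 0, the state is the offset term of the composed map.
    return [h for _, h in elems]

def sequential_scan(a, b):
    """Step-by-step recurrence used as the ground truth."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out

rng = random.Random(0)
a = [rng.uniform(0.5, 1.0) for _ in range(37)]   # decay terms
b = [rng.uniform(-1.0, 1.0) for _ in range(37)]  # input terms
max_err = max(abs(x - y) for x, y in zip(hillis_steele_scan(a, b), sequential_scan(a, b)))
print("max abs difference:", max_err)
```

Hillis-Steele does O(n log n) total work in exchange for a depth of O(log n), which is why it is usually combined with chunking on real hardware.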
Python FastAPI React Plotly PyPI · Published
LLM training loss spike flight recorder. FastAPI backend, React+Plotly frontend, Welford online z-score spike detection, Apache Arrow IPC transport. Published at pypi.org/project/trainscope/ — a small tool I wished existed while running PRISM training runs.
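The Welford z-score idea can be sketched in a few lines. This is an illustrative version and not trainscope's actual API: the class name, the threshold, and the warm-up length are assumptions.

```python
import math

class WelfordSpikeDetector:
    """Flags a loss value whose z-score against the running statistics is too high."""

    def __init__(self, z_threshold=4.0, warmup=10):
        self.z_threshold = z_threshold  # hypothetical default
        self.warmup = warmup            # samples to observe before flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                   # running sum of squared deviations

    def update(self, x):
        """Score x against previous samples, then fold it in. Returns (z, is_spike)."""
        z = 0.0
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0:
                z = (x - self.mean) / std
        is_spike = self.n >= self.warmup and z > self.z_threshold

        # Welford's single-pass update of mean and M2.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return z, is_spike

# Hypothetical loss curve: slow decay with small deterministic jitter, then a spike.
losses = [2.0 - 0.01 * i + 0.02 * ((i * 7) % 5 - 2) for i in range(50)]
detector = WelfordSpikeDetector()
flags = [detector.update(x)[1] for x in losses]
spike_z, spike_flag = detector.update(10.0)
print(any(flags), spike_flag)
```

Only upward deviations are flagged here, and spikes are still folded into the running statistics; a production detector might use a windowed or exponentially weighted variant so that early-training losses do not dominate the estimate.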
PyTorch Research · Study repo
Predecessor to PRISM. Single O(n) primitive combining local convolution, linear attention, gated fusion, and key-value memory. Kept public as a reference for the design choices that led to PRISM — including an honest README rewrite that removed fabricated metrics.
Go Rust TypeScript Docker · Production-grade
Full-stack monitoring and control platform for distributed services: Go backend (auth, metrics, alerting, SLO tracking, incident management), Rust agents for low-overhead host-side data collection, React/TypeScript frontend. ~70k lines across the stack. Built solo end-to-end — the exercise was shipping a production system, not just designing one.



