kaelvalen/README.md
Mehmet Arda Hakbilen (Kael Valen)

Research engineer — efficient sequence architectures and the infrastructure they run on



What I'm Working On

I'm interested in why modern sequence architectures are designed the way they are — the inductive biases baked in by training on language, where they don't transfer, and whether cleaner designs can close the gap on other signal types.

My main project is PRISM: a hybrid linear-recurrent backbone interleaving Mamba-2-style SSD blocks (per-channel selective state, no mean-over-D_h collapse) with Gated Delta Rule blocks (matrix-valued associative memory) at a 3:1 ratio. The same backbone — identical hyperparameters, no modality-specific tuning — is applied to 12-lead ECG (PTB-XL, primary), spoken commands, and sequential images. Primary metric is macro one-vs-rest AUROC; the target is matching xresnet1d101 (~0.928) within bootstrap CI. A paper draft targeting ICML 2026 ES-FoMo IV / NeurIPS 2026 ENLSP-VI is in progress.
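As a rough illustration of the evaluation described above, the sketch below computes macro one-vs-rest AUROC for multi-label targets and a percentile-bootstrap confidence interval over records. The function names, the number of resamples, and the choice of a percentile bootstrap are assumptions for illustration; the actual PRISM evaluation code may differ.

```python
import numpy as np

def binary_auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic, using average ranks for ties."""
    y_true = np.asarray(y_true).astype(bool).ravel()
    y_score = np.asarray(y_score, dtype=float).ravel()
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # undefined when only one class is present
    _, inverse, counts = np.unique(y_score, return_inverse=True, return_counts=True)
    cum = np.cumsum(counts)
    avg_rank = cum - (counts - 1) / 2.0  # mean 1-based rank of each tie group
    ranks = avg_rank[inverse]
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def macro_auroc(Y, S):
    """Unweighted mean of per-label AUROC; labels with an undefined AUROC are skipped."""
    Y, S = np.asarray(Y), np.asarray(S)
    per_label = [binary_auroc(Y[:, c], S[:, c]) for c in range(Y.shape[1])]
    return float(np.nanmean(per_label))

def bootstrap_macro_auroc(Y, S, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap CI (records resampled with replacement)."""
    Y, S = np.asarray(Y), np.asarray(S)
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats[b] = macro_auroc(Y[idx], S[idx])
    lo, hi = np.nanpercentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return macro_auroc(Y, S), float(lo), float(hi)
```

Under this sketch, "matching within bootstrap CI" would mean the reference score of about 0.928 falls inside the interval returned by `bootstrap_macro_auroc`.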

The reference implementation includes from-scratch SSD scan and chunked gated delta rule (UT transform / triangular solve), both with numerical-equivalence tests against torch.associative_scan and the FLA Triton kernels. 111 tests total: equivalence, float64 gradcheck, streaming state-passing, property-based.
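For readers unfamiliar with the gated delta rule, here is a minimal sequential reference in NumPy: the step-by-step recurrence that a chunked implementation is expected to reproduce numerically. It assumes the commonly used form S_t = α_t S_{t-1} (I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ with readout o_t = S_t q_t. The function name, argument layout, and `S0` parameter are illustrative and not taken from the PRISM codebase.

```python
import numpy as np

def gated_delta_rule_reference(q, k, v, alpha, beta, S0=None):
    """Sequential gated delta rule (single head).

    q, k: (T, d_k); v: (T, d_v); alpha, beta: (T,) scalar gates per step.
    Returns the outputs (T, d_v) and the final state (d_v, d_k).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k)) if S0 is None else np.array(S0, dtype=float)
    out = np.empty((T, d_v))
    for t in range(T):
        S = alpha[t] * S                      # gated decay of the memory
        pred = S @ k[t]                       # value currently stored under key k_t
        S = S + beta[t] * np.outer(v[t] - pred, k[t])  # delta-rule correction
        out[t] = S @ q[t]                     # read with the query
    return out, S
```

Because the function returns the final state and accepts an initial one, it can also serve as the oracle for a streaming state-passing check: running two halves of a sequence back to back should match a single full pass.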

Alongside the architecture work: Go/Rust microservices, observability tooling, and published PyPI packages — because the gap between "model that works in a notebook" and "model that ships" is a real engineering problem worth being good at.

Current focus:

  • State-space models (Mamba-2 SSD, diagonal SSMs) and associative memory (delta rule, fast weights)
  • Cross-modal portability — testing how far a single hybrid backbone generalizes without per-modality tuning
  • Hardware-aware algorithm design: parallel scan, chunked recurrence, memory hierarchy
  • Clinical time-series modeling: multi-label ECG classification, macro AUROC as primary metric
  • Production ML infrastructure: training observability, checkpoint management, reproducible pipelines

Projects

PRISM: PyTorch, SSM, Mamba-2 SSD, Gated Delta Rule, Research · Active (paper in progress)

12-layer hybrid backbone: SSD blocks (Mamba-2-style, per-channel selective state) interleaved with Gated Delta Rule blocks at 3:1. From-scratch reference implementations of both — Hillis-Steele scan + UT-transform chunked delta rule — with numerical equivalence tests against torch.associative_scan and FLA Triton kernels. Same backbone applied to PTB-XL ECG (primary, multi-label macro AUROC), Speech Commands, and sequential CIFAR-10 with no per-modality hyperparameter changes. S4D-Complex is preserved as an ablation row; the default is SSD. Paper target: ICML 2026 ES-FoMo IV or NeurIPS 2026 ENLSP-VI.
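To make the Hillis-Steele scan concrete, the sketch below applies it to the diagonal linear recurrence h_t = a_t · h_{t-1} + b_t (with h_{-1} = 0) and includes a sequential loop for comparison. This is a NumPy illustration under simplifying assumptions (elementwise per-channel coefficients, no chunking). It is not the PRISM implementation, and both function names are hypothetical.

```python
import numpy as np

def hillis_steele_linear_scan(a, b):
    """Inclusive scan for h_t = a_t * h_{t-1} + b_t along axis 0.

    Each element carries a pair (a, b) representing the map h -> a*h + b.
    Composing an earlier pair (a1, b1) with a later pair (a2, b2) gives
    (a1*a2, a2*b1 + b2), which is associative. Runs in ceil(log2(T)) passes.
    """
    a = np.array(a, dtype=float)
    b = np.array(b, dtype=float)
    T = a.shape[0]
    d = 1
    while d < T:
        # Combine position i-d (earlier) with position i (later) for all i >= d.
        new_b = a[d:] * b[:-d] + b[d:]
        new_a = a[d:] * a[:-d]
        a = np.concatenate([a[:d], new_a])
        b = np.concatenate([b[:d], new_b])
        d *= 2
    return b  # b[t] now equals h_t

def sequential_linear_scan(a, b):
    """Step-by-step reference used to check the parallel formulation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    h = np.zeros_like(b)
    prev = np.zeros_like(b[0])
    for t in range(b.shape[0]):
        prev = a[t] * prev + b[t]
        h[t] = prev
    return h
```

Hillis-Steele does O(T log T) total work but has only logarithmic depth, which is why it suits a readable reference. A work-efficient variant would trade that simplicity for fewer operations.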

trainscope: Python, FastAPI, React, Plotly, PyPI · Published

LLM training loss spike flight recorder. FastAPI backend, React+Plotly frontend, Welford online z-score spike detection, Apache Arrow IPC transport. Published at pypi.org/project/trainscope/ — a small tool I wished existed while running PRISM training runs.
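The Welford-based detection mentioned above can be sketched as follows: a running mean and variance are maintained in one pass, and each new loss value is scored against the statistics accumulated so far. The class name, threshold, warmup length, and the choice to leave flagged values out of the running statistics are all assumptions for illustration; this is not trainscope's actual API.

```python
import math

class WelfordSpikeDetector:
    """Online z-score spike detection using Welford's running mean/variance."""

    def __init__(self, z_threshold=4.0, warmup=20):
        self.z_threshold = z_threshold
        self.warmup = max(2, warmup)  # need at least 2 samples for a variance
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against past statistics, then fold it in. Returns (is_spike, z)."""
        z = 0.0
        is_spike = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0.0:
                z = (x - self.mean) / std
                is_spike = z > self.z_threshold  # one-sided: only upward jumps
        if not is_spike:
            # Welford update; flagged spikes are skipped so they do not inflate the variance.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return is_spike, z
```

A typical use would be calling `update(loss)` once per training step and recording the step index whenever the first element of the result is true.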

PyTorch Research · Study repo

Predecessor to PRISM. Single O(n) primitive combining local convolution, linear attention, gated fusion, and key-value memory. Kept public as a reference for the design choices that led to PRISM — including an honest README rewrite that removed fabricated metrics.

Go Rust TypeScript Docker · Production-grade

Full-stack monitoring and control platform for distributed services: Go backend (auth, metrics, alerting, SLO tracking, incident management), Rust agents for low-overhead host-side data collection, React/TypeScript frontend. ~70k lines across the stack. Built solo end-to-end — the exercise was shipping a production system, not just designing one.

