
Modernize: remove legacy AD, add Enzyme/ForwardDiff, require matrix observables_noise#152

Open
jlperla wants to merge 46 commits into main from modernize-remove-ad


@jlperla
Collaborator

@jlperla jlperla commented Mar 27, 2026

Summary

Major modernization of the package internals, tests, benchmarks, and documentation:

  • Replace Zygote/ChainRules AD with Enzyme.jl (reverse + forward mode) and ForwardDiff.jl for all problem types
  • Add QuadraticStateSpaceProblem and PrunedQuadraticStateSpaceProblem with full Enzyme AD support
  • Require observables_noise as AbstractMatrix (breaking change) — users must pass Diagonal(d) instead of a plain vector, eliminating ambiguity about variance vs standard deviation semantics
  • Add non-diagonal covariance matrix tests via vech parameterization for both DirectIteration and KalmanFilter, with Enzyme forward/reverse and ForwardDiff
  • Fix benchmark OOM by adding GC teardown between BenchmarkTools samples (Enzyme corrupts GC metadata, so GC is disabled during AD but briefly re-enabled between samples)
  • Fix all @test_broken reverse tests — replaced with manual autodiff + FD gradient comparison for model parameters, avoiding the write-first scratch buffer FD mismatch
  • Complete documentation rewrite following SciML standards with 15 doc pages covering tutorials, API reference, and advanced topics (Enzyme AD, ForwardDiff AD, StaticArrays, internals)

Breaking changes

  • observables_noise must now be an AbstractMatrix (e.g., Diagonal([0.01, 0.01]) or Symmetric(H * H')). Passing a Vector throws an error with a helpful message.
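
A minimal sketch of what the new contract means for callers. It assumes that observables_noise is read as a covariance matrix, as the Symmetric(H * H') example suggests. `check_observables_noise` is a hypothetical stand-in; it is not the package's validator, and the real error message may differ.

```julia
using LinearAlgebra

# Hypothetical validator mirroring the breaking change (not the package's code):
# accept any AbstractMatrix covariance, reject plain vectors.
function check_observables_noise(R)
    R isa AbstractMatrix && return R
    throw(ArgumentError(
        "observables_noise must be an AbstractMatrix covariance, e.g. " *
        "Diagonal(d) or Symmetric(H * H'); got $(typeof(R))"))
end

d = [0.01, 0.01]                                      # per-observable variances
R_diag = check_observables_noise(Diagonal(d))         # diagonal covariance

H = [0.1 0.0; 0.05 0.1]                               # illustrative noise factor
R_full = check_observables_noise(Symmetric(H * H'))   # correlated noise

# A plain vector is rejected rather than silently wrapped in Diagonal.
rejected = try
    check_observables_noise(d)
    false
catch err
    err isa ArgumentError
end
```

Making the matrix explicit means the caller decides, at the call site, whether the numbers are variances or squared scales.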

Key changes

Source

  • src/utilities.jl: vector observables_noise now errors instead of auto-wrapping with Diagonal
  • src/solve.jl: default_alg KalmanFilter dispatch restricted to RType <: AbstractMatrix
  • src/caches.jl: removed dead zero_sol!!/zero_cache!! functions (never called; solver fully overwrites all arrays)
  • New: src/algorithms/quadratic.jl, src/algorithms/generic.jl, src/workspace.jl, src/utilities_bangbang.jl

Tests

  • All tests updated: Vector -> Diagonal(...) for observables_noise
  • New non-diagonal R tests (vech parameterization) for DI Enzyme, DI ForwardDiff, Kalman Enzyme
  • All 6 former @test_broken reverse tests now pass as manual gradient checks
  • New test files: Kalman Enzyme/ForwardDiff, gradient comparison, cache reuse, StaticArrays, sensitivity interface

Benchmarks

  • GC teardown on all 32 Enzyme @benchmarkable calls prevents OOM during PkgBenchmark.judge()
  • New benchmark suites: quadratic, static_arrays, ensemble, ForwardDiff, gradient comparison
  • No performance regressions (verified via judge)

Documentation

  • 15 doc pages: getting started, 4 tutorials, 5 reference pages, 4 advanced guides
  • Fixed: problem_types table, KalmanFilter auto-selection conditions, zeroing claims, joint/marginal likelihood definitions, syms/obs_syms standardized to tuples, consistent noise parameterization

Test plan

  • Pkg.test() — all tests pass, zero broken, zero failures
  • docs/make.jl — all @example blocks compile, no errors
  • PkgBenchmark.judge() — all 10 suites complete, no regressions
  • CI passes on GitHub Actions

🤖 Generated with Claude Code

jlperla and others added 30 commits March 18, 2026 15:32
Strip all AD dependencies as a first step toward replacing with Enzyme.
All rrule definitions and AD-only utilities are commented out (not deleted)
using block comments so they can be restored as reference.

Source changes:
- Comment out ChainRulesCore import and rrule blocks in linear.jl,
  quadratic.jl, and AD-only helpers in utilities.jl
- Remove ChainRulesCore from Project.toml deps/compat
- Widen compat: RecursiveArrayTools 2.34/3/4, Julia >= 1.10
- Remove stale imports (Symmetric, ZeroMeanDiagNormal, rmul!)

Test changes:
- Remove ChainRulesCore, ChainRulesTestUtils, FiniteDiff, Zygote from
  test/Project.toml
- Comment out all gradient/test_rrule calls in test files
- Replace Zygote.Buffer with plain Vector in solve_manual test helper
- Comment out linear_gradients.jl include in runtests.jl

Benchmark changes:
- Archive old Zygote-based benchmarks in benchmark/old_zygote/
- Add primal-only benchmark script (benchmark/primal_benchmarks.jl)
- Remove Zygote from benchmark/Project.toml

All forward solvers and SciML interfaces are unchanged. Full test suite
passes including Aqua stale deps and ExplicitImports checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add GenericStateSpaceProblem for callback-based nonlinear SSMs via
user-provided f!!/g!! with model parameters in p. Plugs into the
existing DirectIteration solver loop through _transition!!/_observation!!
dispatch with 0-based time indexing for callbacks.

- Remove QuadraticStateSpaceProblem (expressible via Generic callbacks)
- Remove AbstractPerturbationProblem (LinearSSP now subtypes AbstractSSP)
- Add NoiseSpec sentinel for generic noise dimension dispatch
- Remove UnPack dependency (replace @unpack with destructuring)
- Refactor solver into generic loop with !! pattern utilities,
  preallocated caches, and vector-of-vectors storage
- Add precompilation workloads for GenericStateSpaceProblem
- Migrate quadratic tests to generic callbacks (regression values preserved:
  RBC ≈ -690.81, FVGQ ≈ -1.47e7)
- All SciML interfaces work: remake, EnsembleProblem, plot, DataFrame

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add static/mutable consistency tests using !! callbacks matching
  differentiable_economics pattern (same f_lss!!/g_lss!! works for both
  Vector and SVector)
- Add function barrier in _solve_with_cache! to resolve NoiseSpec union
  type before entering hot loop
- Update benchmark with mutable vs static comparison
- solve! with pre-allocated cache: GenericSSP static matches LinearSSP
  static at ~2μs, 0 allocs in hot loop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
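
The "!!" callback style above can be sketched with two tiny helpers. The helper names are hypothetical; they are not the package's mul!!/copyto!!. In the package the immutable case is a StaticArrays SVector. Here an immutable range stands in for it, so the sketch needs only the standard library.

```julia
using LinearAlgebra

# Hypothetical "!!" helpers (illustrative names): mutate the destination
# when it is mutable, otherwise return a freshly computed value.
mul_bb!!(y, A, x) = ismutable(y) ? mul!(y, A, x) : A * x
copyto_bb!!(y, x) = ismutable(y) ? copyto!(y, x) : x

# A callback written once in this style serves both kinds of state,
# provided the caller always rebinds the returned value.
function transition_bb!!(u_next, u, A)
    u_next = mul_bb!!(u_next, A, u)
    return u_next
end

A = [0.9 0.1; 0.0 0.5]
u = [1.0, 2.0]

buf = zeros(2)
out_mut = transition_bb!!(buf, u, A)          # mutable path: writes into buf
out_imm = transition_bb!!(0.0:1.0:1.0, u, A)  # immutable path: returns A * u
```

The key design point is that callers never rely on in-place mutation. They always use the return value, which is what lets one callback serve both Vector and SVector state.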
All benchmarks now use init/solve! pattern to measure solver loop
performance without cache allocation overhead. Results show:
- KalmanFilter and static paths: fully non-allocating (0 bytes)
- GenericSSP static == LinearSSP static (1.98μs, 0 GC)
- GenericSSP mutable == LinearSSP mutable (14.7μs)
- Remaining allocs in mutable DirectIteration are from solution
  object construction (ConstantInterpolation), not the solver loop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add benchmark section using identical problem dimensions (N=5/30,
M=2/10, K=2/10, T=10/100) and data generation as
differentiable_economics/benchmark/lss.jl for direct comparison.

Results: solve! with pre-allocated cache shows zero loop overhead.
The single allocation per solve! is the SciML solution wrapper
(StateSpaceSolution + ConstantInterpolation), not the solver loop.
Mutable paths slightly faster than differentiable_economics
(no H*v observation noise step).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…de SciML deps

- Add obs_syms field to LinearStateSpaceProblem and StateSpaceProblem for
  named observation access (sol[:output], sol[:consumption])
- Add SymbolicIndexingInterface v0.3 as direct dep; use SymbolCache for
  state symbols (replacing deprecated syms kwarg on ODEFunction)
- Override Base.getindex(::StateSpaceSolution, ::Symbol) to dispatch on
  obs_syms first, then state symbols via variable_index
- Upgrade compat bounds to latest SciML ecosystem:
  SciMLBase 2, DiffEqBase 6, RecursiveArrayTools 3+,
  SymbolicIndexingInterface 0.3, StaticArrays 1
- Fix remake compatibility with SciMLBase v2.150 (accept f as kwarg so
  struct round-trip via _remake_internal works)
- Remove @inferred on constructors (SciMLBase v2 SymbolCache makes
  ODEFunction construction type-unstable; solve remains type-stable)
- Fix deprecated sol[::Int] indexing (RecursiveArrayTools v3)
- Add comprehensive tests: state/obs indexing, syms-only, obs_syms-only,
  obs_syms with no observations, symbolic indexing survives remake,
  backward compat without syms
- Precompile symbolic indexing paths
- Zero performance impact on solve loop (benchmarked)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
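
The lookup order described above (obs_syms first, then state symbols) can be illustrated with a toy type. `ToySolution` and its fields are hypothetical. The real StateSpaceSolution uses SymbolicIndexingInterface's SymbolCache for state names, which this sketch does not reproduce.

```julia
# Hypothetical toy solution type illustrating symbol lookup order:
# observation names (obs_syms) are checked first, then state names (syms).
struct ToySolution
    u::Vector{Vector{Float64}}   # state trajectory
    z::Vector{Vector{Float64}}   # observation trajectory
    syms::Tuple                  # state names, e.g. (:k, :z)
    obs_syms::Tuple              # observation names, e.g. (:output,)
end

function Base.getindex(sol::ToySolution, s::Symbol)
    i = findfirst(==(s), sol.obs_syms)
    i === nothing || return [zt[i] for zt in sol.z]   # observation series
    j = findfirst(==(s), sol.syms)
    j === nothing || return [ut[j] for ut in sol.u]   # state series
    throw(ArgumentError("unknown symbol :$s"))
end

sol = ToySolution([[1.0, 0.1], [1.1, 0.2]], [[2.0], [2.3]], (:k, :z), (:output,))
```

Usage: `sol[:output]` returns the observation series, and `sol[:k]` returns the first state component over time.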
- Extract `_kalman_loglik!` and `_direct_iteration_loglik!` as Enzyme-compatible
  hot paths (no try/catch, no solution construction, no cache zeroing)
- `_direct_iteration_loglik!` takes H (obs noise matrix), computes R=H*H' via
  muladd!!, factors once with Cholesky, uses ldiv!! per timestep
- Add `alloc_direct_loglik_cache` with R, R_chol, innovation buffers
- Add EnzymeTestUtils forward+reverse tests (mutable + static, model Const + Duplicated)
- Add Enzyme AD benchmarks (hub-and-spoke pattern, DE-style wrapper functions)
- Adopt Julia 1.12 workspace pattern (test/ and benchmark/ as workspace projects)
- Remove old Zygote benchmarks and differentiable_economics references
- Regression tests match DE hardcoded values (loglik, filtered states)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
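
The hot-path structure described in the two commits above can be sketched as follows: factor R = H * H' once, hoist the constant M * log(2π) + logdet(R) out of the loop, and do one triangular solve per timestep. `innovations_loglik` is a hypothetical name, and the sketch takes the innovations as given rather than computing them from a model.

```julia
using LinearAlgebra

# Minimal sketch (hypothetical name): Gaussian log-likelihood of a sequence
# of innovations with covariance R = H * H', factoring R once.
function innovations_loglik(innovations::AbstractVector{<:AbstractVector}, H::AbstractMatrix)
    R = Symmetric(H * H')
    F = cholesky(R)                  # throws PosDefException if R is not positive definite
    M = size(R, 1)
    c = M * log(2π) + logdet(F)      # hoisted out of the loop
    ll = zero(float(eltype(H)))
    for v in innovations
        w = F.L \ v                  # whitened innovation: dot(w, w) == v' * inv(R) * v
        ll -= (c + dot(w, w)) / 2
    end
    return ll
end
```

Because R = L L', the quadratic form v' R⁻¹ v equals the squared norm of L⁻¹ v. That is why one triangular solve replaces an explicit inverse.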
- Hoist M_obs * log(2π) out of Kalman loop (was recomputed every iteration)
- Hoist ismutable check in DI loglik (match Kalman pattern)
- Collapse identical raw_*_mutable!/raw_*_static! into single raw_*! functions
- Use get_observable() in DI loglik for matrix/vec-of-vecs consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove all AbstractMatrix dispatches for observables and noise (vec-of-vecs only)
- Remove MvNormal/Distribution construction from loglik computation
- Replace maybe_logpdf (MvNormal-based) with direct Cholesky-based loglik in
  _solve_direct_iteration! primal path
- Remove Distributions from Project.toml deps (was only used for MvNormal/logpdf)
- Fix default_alg dispatch: ObsType <: AbstractVector (was AbstractMatrix)
- Convert all test/benchmark data loading from matrix to vector-of-vectors
- Add logdet to LinearAlgebra imports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enzyme's reverse-mode AD for mul!(Y, A, transpose(A)) dispatches to BLAS
syrk, whose adjoint generates a DSYMM call with invalid leading dimension
when A is rectangular (N≠K). This adds mul_aat!! which materializes the
transpose into a pre-allocated buffer to avoid the syrk path.

Upstream: EnzymeAD/Enzyme.jl#2355
Benchmarked: 0 regressions across 20 benchmarks (5% tolerance).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
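
A sketch of the workaround idea. `mul_aat_materialized!` is a hypothetical helper; it is not the package's mul_aat!!, and its signature is illustrative. It copies A' into a preallocated buffer, so the product is an ordinary multiply of two distinct arrays rather than the A * transpose(A) case that, per the commit above, dispatches to BLAS syrk.

```julia
using LinearAlgebra

# Hypothetical helper: compute Y = A * A' via a materialized transpose so
# the two operands of mul! are distinct arrays.
function mul_aat_materialized!(Y::AbstractMatrix, A::AbstractMatrix, At::AbstractMatrix)
    size(At) == reverse(size(A)) ||
        throw(DimensionMismatch("At must be size(A, 2) × size(A, 1)"))
    transpose!(At, A)      # At .= transpose(A), written into preallocated storage
    mul!(Y, A, At)         # Y = A * At
    return Y
end

N, K = 3, 2                # rectangular case (N ≠ K) from the bug report
A = [1.0 0.5; 0.2 0.3; 0.0 0.4]
Y = zeros(N, N)
At = zeros(K, N)           # allocated once, reusable across calls
mul_aat_materialized!(Y, A, At)
```

The extra K×N buffer is the price of avoiding the syrk path. Allocating it once in the cache keeps the hot loop allocation-free.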
Enables `PkgBenchmark.judge()` for automated before/after regression
checks. Remove commented-out run line from benchmarks.jl.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enzyme reverse-mode AD corrupts GC metadata under the repeated tight-loop
invocation that BenchmarkTools creates. Disabling GC for the benchmark suite
avoids the crash without distorting timings (allocation counts still tracked).
GC is re-enabled after suite definition for PkgBenchmark post-processing.

Upstream: EnzymeAD/Enzyme.jl#2355

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace `; C = C` with `; C`, `; noise = noise` with `; noise`, etc.
throughout src/ and test/ files. Also add GC.enable(false) around
benchmark suite to work around Enzyme reverse-mode GC segfault.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Inline _kalman_loglik! body into _solve_with_cache! for KalmanFilter
- Merge _direct_iteration_loglik! patterns into _solve_direct_iteration!
- Unify DirectIteration cache: loglik buffers (R, R_chol, innovation,
  innovation_solved) allocated in alloc_direct_cache when observables_noise
  is provided
- Remove try/catch in Kalman solve path (Enzyme can't differentiate through it)
- Pre-compute observation noise Cholesky once in cache for both loglik and
  simulation noise paths (no redundant cholesky calls)
- Remove stale logdet import
- Update kalman failure test to expect exception instead of -Inf retcode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
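
For reference, a minimal Kalman-filter log-likelihood written the same way: there is no try/catch, so a non-positive-definite innovation covariance surfaces as a `PosDefException` from `cholesky`. That matches the test change above, which now expects an exception instead of a -Inf retcode. The timing convention (observe, then transition) and the function name are assumptions for illustration; this is not the package's implementation.

```julia
using LinearAlgebra

# Minimal sketch. Assumed convention: x_1 ~ N(mu1, P1);
# z_t = C x_t + v_t, v_t ~ N(0, R); x_{t+1} = A x_t + B w_t, w_t ~ N(0, I).
function kalman_loglik(A, B, C, R, mu1, P1, z::AbstractVector{<:AbstractVector})
    Q = B * B'
    x, P = copy(mu1), copy(P1)
    ll = 0.0
    for t in eachindex(z)
        S = Symmetric(C * P * C' + R)   # innovation covariance
        F = cholesky(S)                 # no try/catch: throws on failure
        v = z[t] - C * x                # innovation
        ll -= (length(v) * log(2π) + logdet(F) + dot(v, F \ v)) / 2
        K = (F \ (C * P'))'             # Kalman gain P C' S⁻¹ (S symmetric)
        x = x + K * v                   # filtered mean
        P = P - K * C * P               # filtered covariance
        if t < lastindex(z)             # predict the next state
            x = A * x
            P = A * P * A' + Q
        end
    end
    return ll
end
```

The summed per-step terms equal the log-density of the stacked observations under their joint Gaussian (the prediction error decomposition). That gives an easy independent check on small problems.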
…rse patterns

- test_forward with Const return validates cache mutation tangents (u, P, z)
- test_reverse with Active return validates logpdf gradients via vech for posdef
- All array args Duplicated (Enzyme requires uniform activity in struct fields)
- Add sensitivity_interface.jl: minimal MWE for Enzyme through SciML-like solve
- Add enzyme_test_utils.jl: vech/unvech utilities for posdef parameterization
- Make LinearStateSpaceProblem mutable + add remake!() for Enzyme compatibility
- Remove test groups from runtests.jl (all tests run unconditionally)
- Replace generate_observations with package's own solve()
- Remove all manual finite differences (use EnzymeTestUtils exclusively)
- DI reverse marked @test_broken (marginal cache gradient FD mismatch, 74/75 pass)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
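
The vech parameterization behind the positive-definite tests can be sketched as follows. An unconstrained vector of length n(n+1)/2 fills a lower-triangular L, and R = L L' (plus optional jitter) is built with explicit loops as a plain Matrix, with no Symmetric wrapper and no BLAS triangular-multiply call. All three function names are hypothetical; they are not the helpers in enzyme_test_utils.jl.

```julia
using LinearAlgebra

# Stack the lower triangle column by column (column-major order).
vech_lower(M::AbstractMatrix) = [M[i, j] for j in 1:size(M, 2) for i in j:size(M, 1)]

# Inverse of vech_lower for an n×n lower-triangular matrix.
function unvech_lower(v::AbstractVector, n::Integer)
    length(v) == n * (n + 1) ÷ 2 || throw(DimensionMismatch("need n(n+1)/2 entries"))
    L = zeros(eltype(v), n, n)
    k = 1
    for j in 1:n, i in j:n
        L[i, j] = v[k]
        k += 1
    end
    return L
end

# R = L * L' + jitter * I as a plain Matrix, using explicit loops only.
function posdef_from_vech(v::AbstractVector, n::Integer; jitter = 1e-8)
    L = unvech_lower(v, n)
    R = zeros(eltype(v), n, n)
    for i in 1:n, j in 1:n
        s = zero(eltype(v))
        for k in 1:min(i, j)     # L is lower triangular: L[i, k] == 0 for k > i
            s += L[i, k] * L[j, k]
        end
        R[i, j] = s
    end
    for i in 1:n
        R[i, i] += jitter
    end
    return R
end
```

Because everything is generic over `eltype(v)`, the same map also works when `v` carries dual numbers.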
- Raw: primal solve!() through public API
- Forward: autodiff(Forward) with parameter perturbation, returns computed matrices
- Reverse: autodiff(Reverse, Active) with scalar logpdf, all Duplicated
- Use remake!() to update problem fields before solve!()
- Pre-allocate all shadow copies outside benchmark loop
- Small (N=5) and large (N=30) problem sizes for both Kalman and DirectIteration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New baseline for PkgBenchmark comparisons. Key Kalman timings:
- Raw: 0.86ms large (was 0.94ms, ~9% faster without try/catch)
- Forward: 3.2ms large (was 453ms — 141x faster, old had type instability)
- Reverse: 4.1ms large (was 4.4ms for comparable all-Duplicated config)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Revert LinearStateSpaceProblem to immutable struct (SciML convention)
- Remove custom remake!(), use SciMLBase's remake() in benchmarks
- Export remake from SciMLBase for user convenience
- Fix make_posdef_from_vech to return Matrix (not Symmetric) for Enzyme type stability
- Forward wrappers return solution struct (not nothing)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… pattern)

Architecture now matches SciML ODE integrators:
- workspace.sol holds pre-allocated output arrays (u, P, z)
- workspace.cache holds scratch buffers only (innovation, gains, etc.)
- solve!() writes results into ws.sol and returns StateSpaceSolution

Also:
- Revert LinearStateSpaceProblem to immutable struct
- Remove custom remake!(), use SciMLBase's remake()
- Export remake from SciMLBase
- Fix make_posdef_from_vech to return Matrix (Enzyme type stability)
- Update all test/benchmark wrappers for (sol, cache) pair

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…! pattern

- Use ConcreteStructs @concrete for StateSpaceWorkspace (type-stable field
  reassignment without hand-written type parameters; @concrete generates them)
- logpdf is always Float64 (0.0 when no observables, not nothing) — ensures
  consistent types across init/solve! cycle
- solve!() returns StateSpaceSolution directly (type-stable return)
- Workspace has output (pre-allocated arrays) + cache (scratch) fields
- Update tests: logpdf === nothing → logpdf == 0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add ::Float64 return annotations to scalar wrappers (prevents
  NonInferredActiveReturn)
- Forward wrappers return output arrays tuple (not solution struct)
  for proper tangent validation
- Forward uses Const return + vech for Kalman (Duplicated return
  triggers FD perturbation of sol/cache which hits solver bounds checks)
- sensitivity_interface.jl: MinimalProblem reverted to immutable,
  forward wrapper returns cache instead of nothing
- Improve DI reverse @test_broken comment explaining write-first
  scratch FD mismatch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…expand benchmarks

Tests:
- Reorganize into algorithm-centric files: linear_direct_iteration.jl,
  kalman.jl, direct_iteration.jl with matching _enzyme.jl files
- Remove all Zygote/ChainRules references and commented-out AD blocks
- Add 7 new Enzyme edge-case tests (no obs, no noise, no C, impulse)
  covering all old Zygote AD test configurations
- Merge sciml_interfaces.jl (Linear + Generic), static_arrays.jl,
  cache_reuse.jl into cross-cutting files
- Clean up comment blocks, remove stale TODOs

Benchmarks:
- Add enzyme_linear_simulation.jl (simulation-only forward/reverse)
- Add enzyme_quadratic.jl (generic StateSpaceProblem primal)
- Add static_arrays.jl using init/solve! workspace (not high-level solve)
  with StaticArrays giving 6-19x speedup at small sizes
- Add ensemble.jl (manual multi-trajectory loop with Enzyme AD)
- Fix pre-existing bug: missing sol_out/dsol_out args in @benchmarkable
- Rename enzyme_direct_iteration.jl → enzyme_linear_likelihood.jl
- Delete old Zygote benchmarks (linear.jl, quadratic.jl)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each primal file now tests init()/solve!() for its key configurations:
- linear_direct_iteration.jl: 6 tests (sim, likelihood, no obs, no noise,
  no obs eq, repeated solve!)
- kalman.jl: 4 tests (basic 5x5, off-diagonal D, cov prior, repeated)
- direct_iteration.jl: 5 tests (generic linear, quadratic, no obs,
  no noise, repeated)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ation)

- enzyme_linear_simulation.jl: add no_noise and no_obs_eq raw benchmarks
- enzyme_quadratic.jl: rewrite with standard formulation (no closures),
  all matrices passed through p NamedTuple, bang-bang operators (mul!!,
  muladd!!, copyto!!). Enable forward+reverse Enzyme AD for both
  simulation and likelihood × small/large.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…state)

Quadratic callbacks use mul!!/muladd!!/copyto!! with ismutable branch
for the quadratic term (ntuple for SVector, scalar loop for mutable).
Mutable state u_f stored in Ref{} so it persists across callback
invocations through the immutable p NamedTuple.

Static 2x2: 0.33 μs vs mutable 2x2: 1.12 μs (3.4x speedup, 0 allocs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 6 Enzyme benchmark files were using the wrong pattern: passing prob
as Duplicated with remake(). The correct pattern (matching the working
tests) constructs the prob inside the wrapper from Duplicated arrays.

- Remove prob/dprob from all AD wrapper signatures and autodiff calls
- Inner wrappers construct LinearStateSpaceProblem/StateSpaceProblem
  locally from the Duplicated array arguments
- Use fill_zero!! (bang-bang) for matrix/vector shadow zeroing — works
  for both SMatrix and Matrix uniformly
- Add static AD benchmarks to static_arrays.jl (forward + reverse for
  linear 2x2, both static and mutable)

Static 2x2 AD results: forward 2.0μs (vs 5.4μs heap, 2.6x speedup),
reverse 6.6μs (vs 22.3μs heap, 3.4x speedup).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…blem

New dedicated types for second-order perturbation state-space models,
parallel to LinearStateSpaceProblem:

- QuadraticStateSpaceProblem: unpruned, quad(A_2, x) on state directly
- PrunedQuadraticStateSpaceProblem: pruned, quad(A_2, u_f) on linear-part
  state tracked in solver cache as Vector{typeof(u0)}

Both use @concrete structs, bang-bang operators for StaticArrays support,
and plug into the existing _solve_direct_iteration! loop via dispatch.
The pruned variant allocates u_f in the cache (not in p), following SciML
convention that p is user parameters and cache is solver workspace.

Includes:
- src: problem types, algorithm dispatches, cache allocation
- tests: primal + Enzyme forward/reverse for both variants
- benchmarks: raw/forward/reverse for unpruned+pruned, static comparisons

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
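
A sketch of the distinction between the two new problem types, under an assumed law of motion. The form x_{t+1} = A0 + A1 x_t + quad(A2, ·) + B w_{t+1} is an illustration; the package's exact form (for example, any 1/2 factor on the quadratic term) is not stated here. `quad`, `step_unpruned`, and `step_pruned` are hypothetical names.

```julia
using LinearAlgebra

# quad(A2, x)[i] = x' * A2[i, :, :] * x for an N×N×N coefficient tensor A2.
quad(A2::AbstractArray{<:Real,3}, x::AbstractVector) =
    [dot(x, view(A2, i, :, :), x) for i in axes(A2, 1)]

# Unpruned (assumed form): the quadratic term is applied to the full state x.
step_unpruned(x, w, p) = p.A0 + p.A1 * x + quad(p.A2, x) + p.B * w

# Pruned (assumed form): the quadratic term uses the separately tracked
# linear state u_f, which follows first-order dynamics.
function step_pruned(x, u_f, w, p)
    x_next = p.A0 + p.A1 * x + quad(p.A2, u_f) + p.B * w
    u_f_next = p.A1 * u_f + p.B * w
    return x_next, u_f_next
end
```

Starting u_f at the same initial state, the two variants agree for one step and diverge afterwards. Tracking u_f is also why the pruned variant needs extra cache storage.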
Rewrite all documentation from scratch:
- Replace 3 monolithic example pages with 13 focused pages organized as
  Tutorials / Basics / Advanced following SciML conventions
- Add docstrings to all exported types (LinearStateSpaceProblem,
  QuadraticStateSpaceProblem, PrunedQuadraticStateSpaceProblem,
  StateSpaceProblem, DirectIteration, KalmanFilter, StateSpaceSolution)
- All @example blocks are executable and verified at build time
- Remove all Zygote references; Enzyme.jl is the sole AD backend
- Fix data format: all examples use Vector{Vector} for noise/observables
- Add executable Enzyme AD examples (DirectIteration, KalmanFilter,
  Optimization.jl MLE workflow with explicit gradient)
- Add FAQ, nonlinear callback example, remake docs, DataFrame conversion
- Add DocumenterInterLinks, checkdocs=:exports, Enzyme+Optimization deps
- Delete old docs/src/examples/ directory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ForwardDiff works out of the box for small RBC-sized models (N≤5)
  via type promotion through the public solve() API
- Tests: kalman_forwarddiff.jl, linear_direct_iteration_forwarddiff.jl
  covering mutable and StaticArrays, gradient w.r.t. A/B/C/mu0/u0/H
- Apples-to-apples gradient comparison (gradient_comparison.jl):
  ForwardDiff vs Enzyme BatchDuplicated forward vs Enzyme reverse,
  all computing the same N² gradient components
- Benchmark results: ForwardDiff ≈ Enzyme reverse for small N;
  Enzyme reverse 230-500x faster at N=30;
  Enzyme BatchDuplicated forward slower than ForwardDiff (shadow overhead)
- Parallel ForwardDiff benchmark files for Kalman, DI likelihood, DI simulation
- Documentation: new ForwardDiff AD page with executable examples
  (joint likelihood, Kalman, multi-parameter, Optimization.jl, StaticArrays)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jlperla and others added 5 commits March 25, 2026 12:32
…ut BatchDuplicated

- Enzyme does not mutate Duplicated primals (A, B, C, etc. are read-only
  in the loglik functions), so copy() calls inside bench functions were
  pure allocation overhead distorting timings
- BatchDuplicated forward benchmarks commented out (kept for reference):
  always slower than ForwardDiff due to shadow-copy overhead for all
  arguments (sol, cache, etc.)
- Removed BatchDuplicated shadow pre-allocation from setup functions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enzyme reverse-mode AD corrupts GC metadata, so GC is disabled globally.
However, leaked memory accumulates across BenchmarkTools samples, eventually
triggering OOM (SIGKILL). Adding teardown=(GC.enable(true); GC.gc();
GC.enable(false)) to all Enzyme @benchmarkable calls reclaims memory
between samples — safe because Enzyme is not running during teardown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
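
A sketch of the sampling pattern above, written as a plain loop so it runs with Base alone. `run_samples` is a hypothetical name. In the PR the same re-enable/collect/disable sequence is passed to BenchmarkTools' @benchmarkable as its teardown expression.

```julia
# Hypothetical sketch: GC stays disabled while each sample runs and is
# briefly re-enabled between samples to reclaim leaked memory.
function run_samples(f, nsamples::Integer)
    was_enabled = GC.enable(false)   # returns the previous GC state
    times = Float64[]
    try
        for _ in 1:nsamples
            push!(times, @elapsed f())
            # "teardown": f is not running, so collecting here is safe
            GC.enable(true)
            GC.gc()
            GC.enable(false)
        end
    finally
        GC.enable(was_enabled)       # restore the caller's GC state
    end
    return times
end

times = run_samples(() -> sum(abs2, randn(1_000)), 5)
```

The `finally` block guarantees GC is restored even if a sample throws. Otherwise a failing benchmark could leave GC disabled for the rest of the session.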
…non-diagonal R tests

Breaking change: observables_noise must now be an AbstractMatrix (e.g.,
Diagonal(d) or Symmetric(H * H')). Passing a Vector now throws an error
with a helpful message. This eliminates ambiguity about whether vector
entries are variances or standard deviations.

Source changes:
- utilities.jl: vector method now errors instead of auto-wrapping
- solve.jl: default_alg restricts RType to AbstractMatrix
- caches.jl: remove dead zero_sol!!/zero_cache!! (never called)
- precompilation.jl: use Diagonal for observables_noise
- Docstrings updated in state_space_problems.jl, solutions.jl

Test changes:
- All tests updated: Vector -> Diagonal(...) for observables_noise
- New non-diagonal R tests via vech parameterization (DI Enzyme, DI
  ForwardDiff, Kalman Enzyme) with forward and reverse mode
- enzyme_test_utils.jl: fix make_posdef_from_vech to avoid BLAS trmm!

Documentation (all 15 doc files):
- Fix problem_types table: split into common/linear-only/quadratic-only
- Remove false zeroing claims from internals.md and workspace.md
- Document full KalmanFilter auto-selection conditions in solvers.md
- Remove unnecessary DiffEqBase imports from 5 pages
- Remove misleading StateSpaceWorkspace re-import in enzyme_ad.md
- Add joint vs marginal likelihood explanation
- Add A_2 tensor explanation for quadratic models
- Add Quadratic/Generic model notes to Enzyme/ForwardDiff/StaticArrays pages
- Add workspace remake example for parameter sweeps
- Standardize syms/obs_syms to tuples throughout
- Consistent noise parameterization (Diagonal + Symmetric examples)
- Fix solution field types (logpdf: Real, retcode: always Success)
- Fix sol[2] comment, Val(K) in ForwardDiff StaticArrays
- Add partial differentiation FAQ entry
- Add StaticArrays ForwardDiff performance note
- Remove stale DifferentiableStateSpaceModels.jl link

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The EnzymeTestUtils.test_reverse perturbs ALL Duplicated arguments
including sol/cache workspace buffers. Since the solver overwrites these
before reading, FD sees zero gradient while Enzyme may report nonzero —
a false mismatch on write-first scratch buffers.

Replace all 6 @test_broken with manual autodiff + FD comparison that
verifies gradients for model parameters (A, H, u0, r_v, A_1) only.
This tests what users actually do: call autodiff and read parameter
shadows, ignoring workspace shadows.

Also add fdm_gradient helper to enzyme_test_utils.jl.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jlperla
Collaborator Author

jlperla commented Mar 27, 2026

@ChrisRackauckas do a quick review and maybe apply AI patches as you see fit here.

Also, I haven't looked at the documentation carefully but will review it after this builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previous formatting commit missed these directories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jlperla
Collaborator Author

jlperla commented Mar 27, 2026

@ChrisRackauckas Runic and unit tests (outside of pre which I wouldn't expect to work) are now passing.

I do not have permissions to set up Documenter deploy keys, etc. But if you can add the Documenter secret and deploy keys, then I can check the docs.

jlperla and others added 5 commits March 27, 2026 17:14
Add forward-mode and reverse-mode Enzyme AD benchmarks for the Kalman
filter with StaticArrays (3x3 and 5x5) alongside mutable counterparts.
Uses remake_zero! for Kalman cache shadows containing immutable SMatrix fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix _alloc_noise() to use alloc_like for AbstractMatrix (SVector cache buffers)
- Fix _add_observation_noise!! to use bang-bang pattern for SVector z elements
- Add get_concrete_noise dispatch for StaticMatrix (generates SVector noise)
- Add unit tests: static Kalman, pruned/unpruned quadratic, solve!/solve consistency
- Update StaticArrays docs with Kalman example and Enzyme AD note

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ConditionalLikelihood: prediction error decomposition for fully-observed
state-space models (AR, VAR, nonlinear). Clamps state to observations at
each step and accumulates Gaussian log-likelihood. Works with all problem
types (Linear, Generic, Quadratic, PrunedQuadratic) and StaticArrays.
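
The idea behind the conditional likelihood can be sketched for a fully observed model y_{t+1} = f(y_t) + e_{t+1}, e ~ N(0, Σ). Each prediction starts from the observed y_t, not from a simulated state. `conditional_loglik` is a hypothetical function, not the package's ConditionalLikelihood algorithm, and the Gaussian error with covariance Σ is an assumption of the sketch.

```julia
using LinearAlgebra

# Hypothetical sketch of a prediction-error (conditional) log-likelihood.
function conditional_loglik(f, y::AbstractVector{<:AbstractVector}, Σ::AbstractMatrix)
    F = cholesky(Symmetric(Σ))
    c = size(Σ, 1) * log(2π) + logdet(F)
    ll = 0.0
    for t in firstindex(y):lastindex(y) - 1
        e = y[t + 1] - f(y[t])   # clamp: predict from the observed y_t
        w = F.L \ e              # whitened one-step-ahead error
        ll -= (c + dot(w, w)) / 2
    end
    return ll
end
```

For a scalar AR(1), this reduces to the familiar sum of normal log-densities of y_{t+1} - ρ y_t.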

save_everystep=false: endpoints-only solve storing [u_initial, u_final]
with identical logpdf. Ping-pong 2-element buffers + 1-slot cache reduce
ForwardDiff allocations by 24-41x (mutable) and give up to 7x speedup
(StaticArrays KF). Supported by DirectIteration, ConditionalLikelihood,
KalmanFilter, and all quadratic variants.
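
The ping-pong idea behind save_everystep=false can be sketched as follows. Two preallocated slots are used alternately, so `mul!(out, A, u)` never aliases its input and no per-step storage is needed. Only [u_initial, u_final] is returned. `simulate_endpoints` is a hypothetical name, and the sketch uses a linear transition without noise for simplicity.

```julia
using LinearAlgebra

# Hypothetical endpoints-only iteration with a 2-slot ping-pong buffer.
function simulate_endpoints(A::AbstractMatrix, u0::AbstractVector, T::Integer)
    slots = (copy(u0), similar(u0))
    cur = 1
    for _ in 1:T
        nxt = 3 - cur                     # flip between slot 1 and slot 2
        mul!(slots[nxt], A, slots[cur])   # u_{t+1} = A * u_t
        cur = nxt
    end
    return [copy(u0), copy(slots[cur])]
end
```

Memory stays constant in T, which is where the allocation savings come from when intermediate states are not needed.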

Tests: core correctness, ForwardDiff (via FiniteDifferences.jl), Enzyme
forward (test_forward) and reverse (test_reverse) with EnzymeTestUtils,
StaticArrays, workspace reuse, edge cases. Benchmarks for Enzyme and
ForwardDiff. Documentation with examples and performance tables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace manual autodiff + fdm_gradient with EnzymeTestUtils
test_forward and test_reverse across all Enzyme test files:
- linear_direct_iteration_enzyme.jl: 4 manual reverse → test_reverse
- quadratic_direct_iteration_enzyme.jl: 2 manual reverse → test_reverse
- kalman_enzyme.jl: already used test_reverse, added FiniteDifferences
- conditional_likelihood_enzyme.jl: uses test_forward + test_reverse
- conditional_likelihood_forwarddiff.jl: fdm_gradient → FiniteDifferences.grad

Remove fdm_gradient from enzyme_test_utils.jl (vech helpers retained).
All 1,860+ Enzyme checks pass at default tolerance (rtol=1e-9).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ChrisRackauckas
Member

Documenter deploy key and DOCUMENTER_KEY secret have been set up. Docs deployment should work now — feel free to re-trigger the docs CI job to verify.

jlperla and others added 4 commits March 30, 2026 14:07
…ro shadow

Restructure all Enzyme test wrappers to pass the full `prob` as a single
`Duplicated` argument instead of constructing it from 7+ separate matrix
args. This eliminates the need for `y` (observables) to be `Duplicated` —
observables get zero shadow automatically via `make_zero(prob)`.

Pattern: forward wrappers take (prob, sol, cache) → return output tuple;
reverse wrappers take (prob, sol, cache) → return ::Float64 scalar.
Vech tests keep separate args (remake doesn't work with Enzyme shadows).

GC guards added to all Enzyme test files to prevent Enzyme reverse-mode
GC corruption (#2355). max_range=1e-3 on fdm for prob-as-Duplicated tests
(FD perturbation of observables_noise can otherwise make it non-positive-definite).

All tests pass at default tolerance (rtol=1e-9, atol=1e-9).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace manual for-loop data generation with LinearStateSpaceProblem
solve calls. Fixes @example block scoping errors in Documenter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename duplicate "Workspace API" header in CL tutorial to avoid slug
  conflict with basics/workspace.md
- Revert unnecessary @ref syntax changes in getting_started.md and
  enzyme_ad.md

Docs now build cleanly with zero errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gradient_comparison.jl uses manual Enzyme autodiff which fails on CI
due to Enzyme version mismatch (UndefVarError: byval). Move it inside
the CI != true guard with the other Enzyme tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Labels

None yet

Projects

None yet


2 participants