Performance: Complex{Num} coefficients and QTerm encoding slow products/powers vs QuantumAlgebra.jl #164

@oameye

A head-to-head benchmark (script: benchmark/quantumalgebra_comparison.jl) shows that SecondQuantizedAlgebra (SQA) is faster than QuantumAlgebra (QA) at building and transforming Hamiltonians and at commutators, but slower on predominantly bosonic products and powers (many-mode, Fock (a·a†)ⁿ, multi-spin Hⁿ).

Representative numbers (Julia 1.12, SQA v0.5.1, QA v1.6.0; QA/SQA < 1 means QA faster):

| benchmark | SQA | QA | QA/SQA |
| --- | --- | --- | --- |
| JC build H | 7.5 µs | 30.4 µs | 4.1× |
| nested [H,σ] depth 8 | 5.7 ms | 17.9 ms | 3.1× |
| JC Hⁿ n=4 | 1.08 ms | 2.48 ms | 2.3× |
| Fock (a·a†)¹⁰ | 1.37 ms | 0.25 ms | 0.2× |
| many-mode M=16 | 12.5 ms | 2.56 ms | 0.2× |
| Dicke(3) H⁴ | 23.5 ms | 4.86 ms | 0.2× |

Root cause. SQA stores every prefactor as CNum = Complex{Symbolics.Num} and routes all coefficient arithmetic through Symbolics. Measured per-operation cost: Num*Num ≈ 700 ns and Num(x) construction ≈ 168 ns, versus ≈ 1 ns for native Int/ComplexF64. The overhead comes from SymbolicUtils hash-consing (a global cache insert plus a tree hash on every construction). This is confirmed by profiling and by BasicSymbolic*BasicSymbolic ≈ 701 ns, essentially identical to Num*Num: the Num wrapper is not the problem, and dropping to raw BasicSymbolic would keep ~97% of the cost while adding type instability.

The cost is paid per term during product/power expansion, even for integer coefficients: the Fock benchmark has integer coefficients yet still pays it, because Symbolics wraps numeric literals as hash-consed constants. QA avoids this entirely: its dict value is a native Number and symbolic parameters live in the term key (QuTerm.params::Vector{Param}), so ω·J is a vector merge, not a CAS multiply.
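To make the "vector merge, not a CAS multiply" point concrete, here is a minimal sketch of a QA-style layout. It assumes simplified, hypothetical types: `ToyTerm`, `ToyExpr`, and `toy_mul` are illustrative names, not QuantumAlgebra.jl's API. Parameters and operators are plain `Symbol`s, and no normal ordering is performed.

```julia
# Hypothetical illustration only; not QuantumAlgebra.jl's actual types.
struct ToyTerm
    params::Vector{Symbol}  # sorted names of symbolic parameters, e.g. [:J, :ω]
    ops::Vector{Symbol}     # operator labels in product order (no normal ordering)
end

Base.:(==)(a::ToyTerm, b::ToyTerm) = a.params == b.params && a.ops == b.ops
Base.hash(t::ToyTerm, h::UInt) = hash(t.ops, hash(t.params, h))

# An expression maps each term key to a native numeric coefficient.
const ToyExpr = Dict{ToyTerm,ComplexF64}

function toy_mul(a::ToyExpr, b::ToyExpr)
    out = ToyExpr()
    for (ta, ca) in a, (tb, cb) in b
        # Symbolic parameters are merged in the key; no symbolic algebra runs.
        key = ToyTerm(sort!(vcat(ta.params, tb.params)), vcat(ta.ops, tb.ops))
        # The coefficient update is a plain ComplexF64 multiply-add.
        out[key] = get(out, key, 0.0 + 0.0im) + ca * cb
    end
    return out
end

# (2ω · a†) * (3J · a)
x = ToyExpr(ToyTerm([:ω], [:adag]) => 2.0 + 0.0im)
y = ToyExpr(ToyTerm([:J], [:a]) => 3.0 + 0.0im)
z = toy_mul(x, y)
```

In this sketch the product carries the parameter names `[:J, :ω]` in its key and a native coefficient of 6, so the per-term cost is a small vector merge plus one complex multiply.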

Profiling attribution (share of active CPU, idle threads filtered):

| benchmark | coefficient (SymbolicUtils) | operator machinery | QTerm hashing |
| --- | --- | --- | --- |
| many-mode M=16 (symbolic coeffs) | 70% | 20% | 10% |
| Fock (a·a†)¹⁰ (integer coeffs) | 30% | 50% | 20% |

Conclusion — two independent levers, neither sufficient alone:

  1. Native numeric coefficients: make CNum a single concrete struct holding a native number, escalating to Complex{Num} only when a free symbol is genuinely present (materializing back to Num only at the substitute/average/print boundaries). This stays type-stable (concrete struct + type-preserving arithmetic). It is the biggest lever for symbolic-coefficient workloads (the 70% above), projected to shrink the gap from ~5× to ~1.5× of QA's runtime on many-mode/Dicke/SW. The change is contained to cnum.jl. It does not fix Fock (only 30% of the time is coefficient arithmetic there). To also make genuinely symbolic coefficients as fast as in QA, the symbolic part would need a lightweight monomial-of-named-params form (QA's model), with full Num as the fallback.

  2. Compact operator/term encoding: even if coefficient arithmetic were free, SQA's operator-machinery floor would still exceed QA's total runtime (many-mode ~3.8 ms vs QA 2.56 ms; Fock ~0.95 ms vs QA 0.25 ms). SQA stores terms as QTerm(ops::Vector{QSym}, ne) (a heap vector of field structs, re-hashed on every dict insert); QA uses near-isbits operators in a compact BaseOpProduct with an integer/Levi-Civita exchange. Cheaper or cached QTerm hashing, isbits operators, and fewer per-term allocations would lower this floor.
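A rough sketch of what the two levers could look like, assuming hypothetical names (`Coef`, `is_native`, `CachedTerm`) that do not exist in SQA. To keep it runnable without Symbolics.jl, the symbolic fallback is stood in for by a sorted list of parameter names rather than Complex{Num}, and operators are plain `Symbol`s rather than QSym structs.

```julia
# Hypothetical names throughout; none of this is SQA's or QA's actual API.

# Lever 1: one concrete coefficient type with a native fast path.
# `sym` stands in for the symbolic fallback (Complex{Num} in SQA).
struct Coef
    num::ComplexF64       # native numeric part, always present
    sym::Vector{Symbol}   # empty means a purely numeric coefficient
end
Coef(x::Number) = Coef(ComplexF64(x), Symbol[])

is_native(c::Coef) = isempty(c.sym)

function Base.:*(a::Coef, b::Coef)
    # Fast path: both numeric, so only a native complex multiply happens
    # and the (empty) symbol list is reused without allocating.
    is_native(a) && is_native(b) && return Coef(a.num * b.num, a.sym)
    # Slow path: merge the symbolic parts as a sorted monomial.
    return Coef(a.num * b.num, sort!(vcat(a.sym, b.sym)))
end

# Lever 2: hash the operator list once and reuse it on every dict insert.
struct CachedTerm
    ops::Vector{Symbol}
    h::UInt               # hash of `ops`, computed at construction
end
CachedTerm(ops::Vector{Symbol}) = CachedTerm(ops, hash(ops))

Base.hash(t::CachedTerm, seed::UInt) = hash(t.h, seed)
Base.:(==)(a::CachedTerm, b::CachedTerm) = a.h == b.h && a.ops == b.ops

# Usage: accumulate a term with a purely numeric coefficient.
acc = Dict{CachedTerm,Coef}()
acc[CachedTerm([:adag, :a])] = Coef(2) * Coef(3im)

# Usage: a product that genuinely involves symbols takes the slow path.
c = Coef(2.0, [:ω]) * Coef(3.0, [:J])
```

Both structs are concrete, so arithmetic stays type-stable. Whether a cached hash field pays off in SQA depends on how often terms are rebuilt versus re-inserted, which the profiling above does not separate.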

Related. Surfaced from #163 (benchmark against QuantumAlgebra). Lever 2 (compact term/operator encoding) overlaps existing optimization issues: #137 (hash-cons operator leaves), #141 (cache uses_phys_key as a QTerm field), #140 (SmallDict for short QAdd sums), #139 (optimize the ne / diagonal-split machinery).
