A head-to-head benchmark (script: `benchmark/quantumalgebra_comparison.jl`) shows SecondQuantizedAlgebra (SQA) is faster than QuantumAlgebra (QA) on building/transforming Hamiltonians and on commutators, but slower on predominantly bosonic products and powers (many-mode `H²`, Fock `(a·a†)ⁿ`, multi-spin `Hⁿ`).
Representative numbers (Julia 1.12, SQA v0.5.1, QA v1.6.0; QA/SQA < 1 means QA faster):

| benchmark | SQA | QA | QA/SQA |
| --- | --- | --- | --- |
| JC build H | 7.5 µs | 30.4 µs | 4.1× |
| nested [H,σ] depth 8 | 5.7 ms | 17.9 ms | 3.1× |
| JC Hⁿ n=4 | 1.08 ms | 2.48 ms | 2.3× |
| Fock (a·a†)¹⁰ | 1.37 ms | 0.25 ms | 0.2× |
| many-mode H² M=16 | 12.5 ms | 2.56 ms | 0.2× |
| Dicke(3) H⁴ | 23.5 ms | 4.86 ms | 0.2× |
Root cause. SQA stores every prefactor as `CNum = Complex{Symbolics.Num}` and routes all coefficient arithmetic through Symbolics. Measured per-operation cost: `Num*Num` ≈ 700 ns and `Num(x)` construction ≈ 168 ns, versus ≈ 1 ns for native `Int`/`ComplexF64`. The tax is in `SymbolicUtils` hash-consing (a global cache insert plus a tree hash on every construction). Profiling confirms this, as does `BasicSymbolic*BasicSymbolic` ≈ 701 ns, which is essentially identical to `Num*Num`: the `Num` wrapper is not the problem, and dropping to raw `BasicSymbolic` would keep ~97% of the cost and add type instability. The cost is paid per term during product/power expansion, even for integer coefficients (Fock has integer coefficients yet still pays it, because Symbolics wraps numeric literals as hash-consed constants). QA avoids this entirely: its dict value is a native `Number`, and symbolic parameters live in the term key (`QuTerm.params::Vector{Param}`), so `ω·J` is a vector merge, not a CAS multiply.
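To make the "parameters in the key, native number in the value" idea concrete, here is a minimal toy sketch in plain Julia. The names `ToyTerm`, `ToySum`, and `mul` are hypothetical and are not QuantumAlgebra's actual types or API; operators are bare labels with no commutation rules, so this only illustrates where the coefficient work goes.

```julia
# Toy model (hypothetical names, not QuantumAlgebra's real types):
# symbolic parameters live in the dictionary key, the value is a plain number.
struct ToyTerm
    params::Vector{Symbol}   # named symbolic parameters, kept sorted
    ops::Vector{Symbol}      # operator labels in order (no commutation rules here)
end

# Dict keys need value-based equality and hashing.
Base.:(==)(a::ToyTerm, b::ToyTerm) = a.params == b.params && a.ops == b.ops
Base.hash(t::ToyTerm, h::UInt) = hash(t.ops, hash(t.params, h))

const ToySum = Dict{ToyTerm,ComplexF64}

# Multiplying two sums: parameter vectors are merged, operator lists are
# concatenated, and the numeric coefficients are multiplied natively.
function mul(a::ToySum, b::ToySum)
    out = ToySum()
    for (ta, ca) in a, (tb, cb) in b
        key = ToyTerm(sort!(vcat(ta.params, tb.params)), vcat(ta.ops, tb.ops))
        out[key] = get(out, key, 0.0 + 0.0im) + ca * cb
    end
    return out
end

# H = ω·(adag a) + 2g·σ, stored as coefficients 1 and 2 with parameters ω and g.
H = ToySum(ToyTerm([:ω], [:adag, :a]) => 1.0, ToyTerm([:g], [:σ]) => 2.0)
H2 = mul(H, H)
```

In this sketch no symbolic engine is involved at any point: the only per-term coefficient work is one `ComplexF64` multiply and one small vector merge.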
Profiling attribution (share of active CPU, idle threads filtered):

| benchmark | coefficient (SymbolicUtils) | operator machinery | QTerm hashing |
| --- | --- | --- | --- |
| many-mode H² M=16 (symbolic coeffs) | 70% | 20% | 10% |
| Fock (a·a†)¹⁰ (integer coeffs) | 30% | 50% | 20% |
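Combining these shares with the SQA and QA timings from the first table gives a rough estimate of what would remain if coefficient arithmetic cost nothing. The snippet below is only back-of-the-envelope arithmetic on the published numbers; the variable names are illustrative.

```julia
# Totals from the first table (ms) and coefficient shares from the profile.
sqa_ms     = (many_mode = 12.5, fock = 1.37)
qa_ms      = (many_mode = 2.56, fock = 0.25)
coef_share = (many_mode = 0.70, fock = 0.30)

# Time left in SQA if the coefficient share dropped to zero.
floor_ms = map((t, s) -> t * (1 - s), sqa_ms, coef_share)   # ≈ 3.75 ms and ≈ 0.96 ms

# How that remaining time compares with QA's total.
ratio = map(/, floor_ms, qa_ms)                             # ≈ 1.5× and ≈ 3.8×
```

Both ratios stay above 1, so removing the coefficient cost alone would not bring SQA level with QA on these two workloads.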
Conclusion. There are two independent levers, and neither is sufficient alone:

- Native numeric coefficients: make `CNum` a single concrete struct holding a native number, escalating to `Complex{Num}` only when a free symbol is genuinely present (materialize back to `Num` only at the `substitute`/`average`/print boundaries). This stays type-stable (concrete struct plus type-preserving arithmetic). It is the biggest lever for symbolic-coefficient workloads (the 70% above), projected to go from ~5× to ~1.5× of QA's time on many-mode/Dicke/SW. The change is contained to `cnum.jl`. It does not fix Fock (only 30% of the time is coefficient work there). To also speed up genuinely symbolic coefficients the way QA does, the symbolic part would need a lightweight monomial-of-named-params form (QA's model) with full `Num` as fallback.
- Compact operator/term encoding: even if coefficient arithmetic were free, SQA's operator-machinery floor would still exceed QA's total (many-mode `H²` ~3.8 ms vs QA 2.56 ms; Fock ~0.95 ms vs QA 0.25 ms). SQA stores terms as `QTerm(ops::Vector{QSym}, ne)` (a heap vector of field structs, re-hashed per dict insert); QA uses near-isbits operators in a compact `BaseOpProduct` with an integer/Levi-Civita exchange. Cheaper or cached `QTerm` hashing, isbits operators, and fewer per-term allocations would lower this floor.
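As one illustration of the "cached hashing" part of the second lever, the sketch below stores a term's hash at construction so that dictionary inserts do not re-walk the operator vector. `CachedTerm` is a hypothetical stand-in, not SQA's `QTerm`, and it assumes `ops` is never mutated after construction.

```julia
# Hypothetical term type that computes its hash once, at construction.
struct CachedTerm
    ops::Vector{Symbol}   # operator labels; must not be mutated afterwards
    h::UInt               # hash of `ops`, cached
    CachedTerm(ops::Vector{Symbol}) = new(ops, hash(ops))
end

# Equality still compares contents; the cached hash is a cheap early exit.
Base.:(==)(a::CachedTerm, b::CachedTerm) = a.h == b.h && a.ops == b.ops

# Dict lookups mix the stored value instead of hashing the whole vector again.
Base.hash(t::CachedTerm, seed::UInt) = hash(t.h, seed)

# Two separately built but equal terms land in the same dictionary slot.
t1 = CachedTerm([:adag, :a, :σ])
t2 = CachedTerm([:adag, :a, :σ])
counts = Dict{CachedTerm,Int}()
counts[t1] = get(counts, t1, 0) + 1
counts[t2] = get(counts, t2, 0) + 1
```

The trade-off is one extra `UInt` per term and a hash computed even for terms that never reach a dictionary; whether that pays off in SQA would need to be measured.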
Related. Surfaced from #163 (benchmark against QuantumAlgebra). Lever 2 (compact term/operator encoding) overlaps existing optimization issues: #137 (hash-cons operator leaves), #141 (cache `uses_phys_key` as a `QTerm` field), #140 (`SmallDict` for short `QAdd` sums), #139 (optimize the `ne` / diagonal-split machinery).