SLATEC-Modern Test Coverage

A Four-Level Testing Framework for Numerical Software, or: How to Find Where Things Went Wrong

The Four Levels

This testing framework is designed not merely to detect failures, but to locate them. When a deviation occurs, the level at which it manifests tells us where to look.

Level	Name	Question Answered	What We Change
1	Regression	Does the code work? Does it still work after changes?	✓ Unilateral changes permitted
2	Mathematical	Does the algorithm match the mathematics?	✗ Read-only
3	Historical	Does output match what IBM 360/370 users saw?	✗ Read-only
4	Hostile	Do compilers, processors, or OSes change anything?	✗ Read-only

Interpreting Failures

Level 1 fails → Code bug. Fix the code.
Level 2 fails → Algorithm doesn't match mathematics. Either the original was wrong, or the modernisation broke something. Investigate.
Level 3 fails → Deviation from historical baseline. Document in DEVIATIONS.md with explanation.
Level 4 fails → Portability issue. Compiler/platform-specific behaviour. Document and potentially ifdef.

The beauty of this structure is that a failure at Level N but success at Level N-1 immediately narrows the search space. If Level 2 passes but Level 3 fails, we know the mathematics is correct but something about the platform differs from the original IBM hardware.

Level 1: Regression Tests

"Does the code work? If we change things, does it still work?"

These are the tests we run continuously. They verify basic functionality and catch regressions when code is modified. This is the only level where we make unilateral changes.

Module	Status	Location
service	⏳ Pending	`test/level1_regression/`
special_functions	Partial (Bessel)	`test/test_bessel_regression.f90`
linear (BLAS)	✓ 18/18 PASS	`test/level1_regression/test_l1_linear_blas.f90`
linear (LINPACK)	⏳ In Progress	`test/level1_regression/test_l1_linear_linpack.f90`
diff_integ	✓ 12/12 PASS	`test/level1_regression/test_l1_diff_integ.f90`
diff_integ_eq	Partial	`test/test_diff_integ_eq.f90`
interpolation	✓ 9/9 PASS	`test/level1_regression/test_l1_interpolation.f90`
integ_trans	⏳ Pending	—
approximation (MINPACK)	✓ 9/9 PASS	`test/level1_regression/test_l1_minpack.f90`
nonlin_eq	⏳ Pending	—
optimisation	⏳ Pending	—
data_handling	⏳ Pending	—

Level 2: Mathematical Verification

"Does the algorithm match the original mathematics?"

These tests compare computed values against authoritative mathematical references. If Level 2 fails, either:

The original SLATEC algorithm was incorrect (rare but documented historically)
The modernisation introduced a mathematical error
The reference value is wrong (check multiple sources)

References:

Abramowitz & Stegun (1964) — Handbook of Mathematical Functions
NIST Digital Library of Mathematical Functions — https://dlmf.nist.gov/
Closed-form solutions where available (e.g. the integral of x squared from 0 to 1 equals one-third)

Module	Status	Reference Source
special_functions	Partial	A&S Tables 9.1, 9.8 (Bessel)
linear (BLAS)	✓ 16/16 PASS	Lawson et al., ACM TOMS 5(3), 1979
diff_integ	✓ 17/17 PASS	Closed-form integrals, A&S, NIST DLMF
interpolation	⚠ 14/17 PASS	de Boor (1978), Fritsch & Carlson (1980)
approximation (MINPACK)	✓ 17/17 PASS	Moré, Garbow, Hillstrom (1981)

Test Files:

test/test_bessel_golden.f90 — Bessel J, I, K vs A&S tables
test/test_bessel_reference.f90 — Extended validation
test/test_specfun_bessel.f90 — Edge cases
test/level2_mathematical/test_l2_linear_blas.f90 — BLAS linearity, orthogonality, Pythagorean properties
test/level2_mathematical/test_l2_minpack_mgh.f90 — MGH test functions (Rosenbrock, Powell, Helical Valley, Freudenstein-Roth)
test/level2_mathematical/test_l2_diff_integ.f90 — Gauss-Kronrod exactness, classical integrals, infinite intervals
test/level2_mathematical/test_l2_interpolation.f90 — B-spline, PCHIP, polynomial interpolation properties

Level 3: Historical Baseline (IBM 360/370)

"Does the output match what users would have seen on the original hardware?"

This is the "last known good" reference. SLATEC was developed and validated on IBM mainframes. If our modernised code produces different output than the original would have on IBM hardware, we need to understand why.

Test Environment

Component	Version	Notes
Compiler	IBM FORTRAN G Level 21 (IEYFORT)	1966 - predates SLATEC by 16 years
System	MVS 3.8j (TK4-)	The mainframe that refuses to die
Emulator	Hercules 3.07	Making virtual iron from actual silicon
Library	SLATEC 4.1	Sandia/Los Alamos/Air Force origin

Why This Matters

IBM hexadecimal floating-point differs from IEEE 754:

Property	IBM Hex FP	IEEE 754
Base	16	2
Single mantissa	24 bits (6 hex)	23 bits
Double mantissa	56 bits (14 hex)	52 bits
Gradual underflow	No	Yes
NaN / Infinity	No	Yes
Precision	"Wobbling" (0-3 bits lost)	Consistent

A deviation at Level 3 is not necessarily a bug - it may be an unavoidable consequence of the floating-point representation change. But it must be documented and explained.

Machine Constants Verification

IBM System/360 uses hexadecimal floating-point because of course it does.

I1MACH (Integer Machine Constants)

I1MACH(1)  = 5       Standard input unit
I1MACH(2)  = 6       Standard output unit
I1MACH(10) = 16      Base for floating-point (hex!)
I1MACH(11) = 6       Hex digits in single precision mantissa
I1MACH(14) = 14      Hex digits in double precision mantissa

R1MACH (Single Precision): epsilon = 9.54e-7, range 1e-78 to 1e76

D1MACH (Double Precision): epsilon = 2.22e-16, range 1e-79 to 1e75

Test Results Summary

Routine	Category	Precision	Status	Max Error
I1MACH	Machine Constants	Integer	PASS	exact
R1MACH	Machine Constants	Single	PASS	exact
D1MACH	Machine Constants	Double	PASS	exact
GAMLN	Special Functions	Single	PASS	~1e-6
DGAMLN	Special Functions	Double	PASS	~1e-15
PYTHAG	Numerical Utilities	Single	PASS	~1e-6
ENORM	Vector Operations	Single	PASS	0
DENORM	Vector Operations	Double	PASS	~1e-15
CDIV	Complex Arithmetic	Single	PASS	~1e-7
CSROOT	Complex Arithmetic	Single	PASS	0
RC	Elliptic Integrals	Single	PASS	~1e-6
DRC	Elliptic Integrals	Double	PASS	~1e-16
VNWRMS	Vector Operations	Single	PASS	~1e-6
HVNRM	Vector Operations	Single	PASS	0

14 routines tested. 14 routines passed. 0 surprises.

Highlights

Carlson Elliptic Integrals - Computing pi via the duplication theorem:

RC(0, 1/4) = 3.1415930 (single) - pi to 7 digits on 1966 hardware
DRC(0, 1/4) = 3.14159265358979 (double) - pi to 15 digits, error 2.4e-15

Overflow Protection - PYTHAG and ENORM handle extreme values correctly:

PYTHAG(1e30, 1e30) = sqrt(2)*1e30 without overflow
DENORM([1e30, 1e30]) = sqrt(2)*1e30 with error 0.0

Accuracy Notes

Single Precision (6 hex digits = 7.2 decimal digits)

Observed errors: 1e-5 to 1e-7
Within expected tolerance for IBM 360 hexadecimal FP

Double Precision (14 hex digits = 16.8 decimal digits)

Observed errors: 1e-15 to 1e-16
Approaching machine epsilon, as expected

FORTRAN IV Compatibility Fixes

Code modifications required to compile on IBM FORTRAN G (1966):

Issue	FORTRAN 77	FORTRAN IV Fix
Deck markers	`*DECK NAME`	`C DECK NAME`
Generic intrinsics	`MAX`, `MIN`, `ABS`	`AMAX1`, `AMIN1`, `ABS` (SP)
Generic intrinsics (DP)	`MAX`, `MIN`, `LOG`	`DMAX1`, `DMIN1`, `DLOG`
Assumed-size arrays	`X(*)`	`X(N)`
SAVE statement	`SAVE VAR`	(removed)
Hex constants	`Z'00100000'`	Decimal via EQUIVALENCE
Block IF	`IF (...) THEN`	`IF (...) GO TO label`

Test Location

/c/dev/fortran360/tests/slatec/

Coverage by Module

Module	Routines Tested	Status
service	D1MACH, R1MACH, I1MACH	PASS
special_functions	GAMLN, DGAMLN, CDIV, CSROOT	PASS
linear (BLAS)	DAXPY, DROTG	9/9 PASS
linear	ENORM, PYTHAG	PASS
elliptic	RC, DRC	PASS
approximation (MINPACK)	DENORM	7/7 PASS
approximation (MINPACK)	DQRFAC	Pending
approximation (MINPACK)	DNLS1	Pending
approximation (MINPACK)	DNSQ	Pending
interpolation	DPLINT, DPOLVL (polynomial)	4/4 PASS
interpolation	DPCHIM, DPCHFE (PCHIP)	N/A (post-1980)
interpolation	DBINT4, DBVALU (B-spline)	N/A (post-1978)
diff_integ	Trapezoidal rule	2/2 PASS
diff_integ	QUADPACK routines	N/A (post-1983)

Historical Context

SLATEC: 1982 joint effort by Sandia, Los Alamos, and Air Force labs
FORTRAN G: 1966 IBM compiler, predating SLATEC by 16 years
IBM 360: 1964 architecture, still running (virtually) today

The fact that 1982 library code runs perfectly on a 1966 compiler is a testament to FORTRAN's legendary backward compatibility.

BLAS tested with Pythagorean triple golden values. FORTRAN IV source in /c/dev/fortran360/tests/slatec/blas/. DENORM tested on IBM System/360 (Hercules/TK4-) using FORTRAN G, 1 January 2026.

Level 4: Hostile Tests

"Do compilers, processors, or operating systems change anything?"

These are the portability stress tests. The same code compiled with different compilers, on different processors, under different operating systems, should produce the same results (within floating-point tolerance). When it doesn't, we need to know.

Compiler Matrix

Compiler	Platform	Status
gfortran 13.x	Linux (x86_64)	CI tested
gfortran 13.x	macOS (ARM64)	CI tested
gfortran 13.x	Windows (MSYS2)	CI tested
ifort/ifx	Linux	⏳ Untested
flang	Linux	⏳ Untested
nvfortran	Linux	⏳ Untested

BLAS Level 4 Results

Test Category	Tests	Default	`-ffast-math`
Subnormals	3	✓ PASS	May FAIL (DAZ/FTZ)
Signed Zero	3	✓ PASS	✓ PASS
Inf/NaN Propagation	3	✓ PASS	✓ PASS
Extreme Values	4	✓ PASS	✓ PASS
Accumulation Precision	3	✓ PASS	✓ PASS
SIMD Edge Cases	4	✓ PASS	✓ PASS
Rounding Modes	4	✓ PASS	✓ PASS
FMA Detection	3	✓ PASS	✓ PASS
Catastrophic Cancellation	3	✓ PASS	May vary
Associativity	4	✓ PASS	May FAIL
Extended Precision (x87)	4	✓ PASS	✓ PASS
Reproducibility	3	✓ PASS	✓ PASS
Compiler Flag Detection	4	✓ PASS	FAIL (expected)
NaN Variants	4	✓ PASS	May FAIL
ULP Accuracy	4	✓ PASS	✓ PASS
Memory Alignment	12	✓ PASS	✓ PASS
Total	65	65/65 PASS	May vary

MINPACK Level 4 Results

Test Category	Tests	Default	`-ffast-math`
ULP Precision	4	✓ PASS	✓ PASS
Edge Cases	5	✓ PASS	May FAIL
Associativity	4	✓ PASS	May FAIL
Rounding Modes	4	✓ PASS	✓ PASS
FMA Effects	3	✓ PASS	✓ PASS
Catastrophic Cancellation	3	✓ PASS	May vary
Condition Number	3	✓ PASS	✓ PASS
Convergence Reproducibility	3	✓ PASS	✓ PASS
Compiler Flag Detection	4	✓ PASS	FAIL (expected)
Extended Precision (x87)	3	✓ PASS	✓ PASS
Scaling Sensitivity	3	✓ PASS	✓ PASS
Jacobian Computation	3	✓ PASS	May vary
LM Parameter Sensitivity	3	✓ PASS	✓ PASS
Total	45	45/45 PASS	May FAIL

Known Hostile Behaviours

Issue	Affected	Mitigation
`-ffast-math` FTZ	DENORM subnormals	Subnormals flushed to zero. DENORM returns 0 instead of correct norm. Do not use.
`-ffast-math`	Bessel K functions	2-4 ULP deviation. Do not use.
Aggressive optimisation	Some QUADPACK routines	May reorder operations. Test carefully.
ARM vs x86 FPU	Potentially all	Under investigation

See DEVIATIONS.md for full analysis.

Test Files

test/test_bessel_portability.f90 — Cross-platform ULP measurement
test/level4_hostile/test_l4_linear_blas.f90 — BLAS hostile tests (subnormals, Inf/NaN, SIMD edges)
test/level4_hostile/test_l4_minpack.f90 — MINPACK portability (detects FTZ)

Module Coverage Summary

Last updated: 18 January 2026

Module	Routines	L1	L2	L3	L4	Overall
service	3	—	—	✓	—	~33%
special_functions	270	~20	~20	4	~20	~9%
linear (BLAS)	217	18	16	9	65	~50%
diff_integ	81	12	17	2	12	~53%
diff_integ_eq	225	~5	—	—	—	~2%
interpolation	80	9	14	4	19	~58%
integ_trans	48	—	—	—	—	0%
approximation	78	9	17	7	45	~100%
nonlin_eq	15	—	—	—	—	0%
optimisation	46	—	—	—	—	0%
data_handling	16	—	—	—	—	0%
TOTAL	1,079	~83	~98	~28	~161	~25%

MINPACK Test Details

Testing the routines we just modernised (eliminated 27 GOTOs)

Complete 4-Level Results (1 January 2026)

Level	Tests	Result	Notes
L1 Regression	9	✓ 9/9 PASS	DENORM (7) + ENORM (2)
L2 Mathematical	17	✓ 17/17 PASS	Rosenbrock, Powell, Helical, F-R, DENORM
L3 Historical	7	✓ 7/7 PASS	IBM 360 golden values
L4 Hostile	45	✓ 45/45 PASS	13 categories, comprehensive portability
L4 with -ffast-math	45	May FAIL	Subnormal flush, associativity detected
TOTAL	78	78/78 PASS	(without hostile flags)

Test Files

Level	File	Description
L1	`test/level1_regression/test_l1_minpack.f90`	Modern Fortran regression
L2	`test/level2_mathematical/test_l2_minpack_mgh.f90`	MGH test functions
L3	`test/level3_historical/test_l3_minpack.f90`	IBM 360 golden comparisons
L3	`/c/dev/fortran360/tests/slatec/minpack/test_enorm.f`	FORTRAN IV for Hercules
L4	`test/level4_hostile/test_l4_minpack.f90`	Portability stress tests

IBM 360 Historical Tests (Level 3)

Test File	Routines	Problems	Status
test_enorm.f	DENORM	7 norm tests	✓ 7/7 PASS
test_qrfac.f	DQRFAC	4 QR tests	⏳ Pending
test_dnls1.f	DNLS1	Rosenbrock, Powell, Freudenstein-Roth	⏳ Pending
test_dnsq.f	DNSQ	Helical, Trigonometric, Broyden	⏳ Pending

BLAS Test Details

Testing core linear algebra routines (DAXPY, DSCAL, DCOPY, DSWAP, DROT, DROTG)

Complete 4-Level Results (1 January 2026)

Level	Tests	Result	Notes
L1 Regression	18	✓ 18/18 PASS	DAXPY, DSCAL, DCOPY, DSWAP, DROT, DROTG
L2 Mathematical	16	✓ 16/16 PASS	Linearity, orthogonality, Pythagorean
L3 Historical	9	✓ 9/9 PASS	IBM 360 golden values (Pythagorean triples)
L4 Hostile	65	✓ 65/65 PASS	16 categories, comprehensive portability
TOTAL	108	108/108 PASS

Test Files

Level	File	Description
L1	`test/level1_regression/test_l1_linear_blas.f90`	Modern Fortran regression
L2	`test/level2_mathematical/test_l2_linear_blas.f90`	Mathematical property verification
L3	`test/level3_historical/test_l3_linear_blas.f90`	IBM 360 golden comparisons
L3	`/c/dev/fortran360/tests/slatec/blas/test_daxpy.f`	FORTRAN IV for Hercules
L3	`/c/dev/fortran360/tests/slatec/blas/test_drotg.f`	FORTRAN IV for Hercules
L4	`test/level4_hostile/test_l4_linear_blas.f90`	Hostile environment tests

Mathematical Properties Tested (Level 2)

Property	Test
Distributivity	(a+b)X = aX + b*X
Distributivity	a*(X+Y) = aX + aY
Orthogonality	c² + s² = 1 for Givens rotations
Pythagorean	r = √(a² + b²) for Givens
Norm preservation	\|\|Rx\|\| = \|\|x\|\| for rotations
Triangle inequality	\|\|x+y\|\| ≤ \|\|x\|\| + \|\|y\|\|

Hostile Tests (Level 4) — 65 Tests in 16 Categories

Category	Tests	What It Catches
Subnormals	3	DAZ/FTZ from `-ffast-math`
Signed Zero	3	IEEE 754 ±0 handling
Inf/NaN	3	Special value propagation
Extreme Values	4	Near overflow/underflow
Accumulation	3	Precision loss in summations
SIMD Edges	4	Vectorization boundary bugs
Rounding Modes	4	All four IEEE rounding modes
FMA Detection	3	Fused multiply-add vs separate
Catastrophic Cancellation	3	Nearly-equal value subtraction
Associativity	4	Summation reordering, `-ffast-math`
Extended Precision (x87)	4	80-bit register leakage
Reproducibility	3	Identical results across runs
Compiler Flag Detection	4	Automatic dangerous flag detection
NaN Variants	4	qNaN vs sNaN, payload preservation
ULP Accuracy	4	Mathematical precision deviation
Memory Alignment	12	Non-aligned, strided, reverse access

Diff_Integ Test Details

Testing QUADPACK and GAUS8 numerical integration routines

Complete 4-Level Results (18 January 2026)

Level	Tests	Result	Notes
L1 Regression	12	✓ 12/12 PASS	DGAUS8, DQAGS, DQAGI, DQNG
L2 Mathematical	17	✓ 17/17 PASS	Gauss-Kronrod exactness, classical integrals
L3 Historical	2	✓ 2/2 PASS	Trapezoidal rule (QUADPACK N/A - post-1983)
L4 Hostile	12	✓ 12/12 PASS	Extreme values, oscillatory, reproducibility
TOTAL	43	43/43 PASS

Test Files

Level	File	Description
L1	`test/level1_regression/test_l1_diff_integ.f90`	Basic integration regression tests
L2	`test/level2_mathematical/test_l2_diff_integ.f90`	Gauss-Kronrod exactness, special functions
L3	`test/level3_historical/test_l3_diff_integ.f90`	IBM 360 trapezoidal golden values
L4	`test/level4_hostile/test_l4_diff_integ.f90`	Hostile environment tests

Mathematical Properties Tested (Level 2)

Property	Test
Gauss-Kronrod exactness	QK15 exact for degree ≤ 29
Gauss-Kronrod exactness	QK21 exact for degree ≤ 41
Gauss-Kronrod exactness	QK31 exact for degree ≤ 61
Classical integrals	∫sin(x)dx, ∫exp(-x)dx, ∫1/√x dx
Infinite intervals	∫₀^∞ exp(-x)dx = 1
Gaussian integral	∫_{-∞}^{∞} exp(-x²)dx = √π

Hostile Tests (Level 4) — 12 Tests in 6 Categories

Category	Tests	What It Catches
Tiny Intervals	2	Very small [a,b], A≈B handling
Extreme Values	3	1e-100 to 1e100 function scaling
Narrow Peaks	2	Sharp Gaussian, narrow Lorentzian
Discontinuities	2	Step functions, kinks
Oscillatory	2	sin(10x), cos(10x) moderate oscillation
Reproducibility	1	Identical results across runs

Historical Note (Level 3)

QUADPACK was developed by Piessens, de Doncker, et al. and published in 1983. The algorithms post-date the IBM 360 era, so no authentic L3 golden values exist. Level 3 tests use trapezoidal rule integration (1960s compatible) for IBM 360 validation.

Interpolation Test Details

Testing B-spline, PCHIP, and polynomial interpolation routines

Complete 4-Level Results (18 January 2026)

Level	Tests	Result	Notes
L1 Regression	9	✓ 9/9 PASS	DBINT4, DBVALU, DPCHIM, DPCHFE, DPLINT, DPOLVL
L2 Mathematical	17	⚠ 14/17 PASS	B-spline issues identified
L3 Historical	4	✓ 4/4 PASS	IBM 360 polynomial golden values
L4 Hostile	19	✓ 19/19 PASS	Runge phenomenon, subnormals, cancellation
TOTAL	49	46/49 PASS	3 B-spline mathematical issues

Test Files

Level	File	Description
L1	`test/level1_regression/test_l1_interpolation.f90`	Basic interpolation regression
L2	`test/level2_mathematical/test_l2_interpolation.f90`	Mathematical property verification
L3	`test/level3_historical/test_l3_interpolation.f90`	IBM 360 golden comparisons
L3	`/c/dev/vintage/fortran360/tests/slatec/interpolation/test_poly.f`	FORTRAN IV for Hercules
L4	`test/level4_hostile/test_l4_interpolation.f90`	Hostile environment tests

IBM 360 Golden Values (Level 3)

Captured via Hercules/TK4- on 18 January 2026:

Test	y = x³ at midpoints	IBM 360 Value
p(0.5)	0.125	0.125000000000000D 00
p(1.5)	3.375	0.337500000000000D 01
p(2.5)	15.625	0.156250000000000D 02
p(3.5)	42.875	0.428750000000000D 02

Known Issues (Level 2)

DBINT4 natural spline: Does not interpolate exactly (max error: 3.56e-03)
DBINT4 clamped spline: Does not interpolate exactly (max error: 0.36)
Natural spline boundary: S''(1) = 60.1 instead of 0

PCHIP and polynomial interpolation work correctly.

Contributing Tests

When adding tests, consider which level they belong to:

Level 1: Basic functionality. Does DGEMM(A, B) return something reasonable?
Level 2: Mathematical correctness. Does DGEMM(I, A) return A?
Level 3: Historical baseline. Does our DGEMM match IBM 360 output?
Level 4: Portability. Does DGEMM give the same answer on ARM vs x86?

Only Level 1 tests should be modified when code changes. Levels 2-4 are reference tests — if they fail, investigate before changing them.

Deviations

All deviations from expected values are documented in DEVIATIONS.md, including:

Which level detected the deviation
Root cause analysis
Whether it's a bug, a floating-point artefact, or expected behaviour
Remediation (if any)

"The purpose of testing is not to prove the code works. It's to find where it doesn't."

— Someone who has clearly spent too long debugging numerical software

FilesExpand file tree

TESTING.md

Latest commit

History

TESTING.md

File metadata and controls

SLATEC-Modern Test Coverage

The Four Levels

Interpreting Failures

Level 1: Regression Tests

Level 2: Mathematical Verification

Level 3: Historical Baseline (IBM 360/370)

Test Environment

Why This Matters

Machine Constants Verification

Test Results Summary

Highlights

Accuracy Notes

FORTRAN IV Compatibility Fixes

Test Location

Coverage by Module

Historical Context

Level 4: Hostile Tests

Compiler Matrix

BLAS Level 4 Results

MINPACK Level 4 Results

Known Hostile Behaviours

Test Files

Module Coverage Summary

MINPACK Test Details

Complete 4-Level Results (1 January 2026)

Test Files

IBM 360 Historical Tests (Level 3)

BLAS Test Details

Complete 4-Level Results (1 January 2026)

Test Files

Mathematical Properties Tested (Level 2)

Hostile Tests (Level 4) — 65 Tests in 16 Categories

Diff_Integ Test Details

Complete 4-Level Results (18 January 2026)

Test Files

Mathematical Properties Tested (Level 2)

Hostile Tests (Level 4) — 12 Tests in 6 Categories

Historical Note (Level 3)

Interpolation Test Details

Complete 4-Level Results (18 January 2026)

Test Files

IBM 360 Golden Values (Level 3)

Known Issues (Level 2)

Contributing Tests

Deviations