This repository is an SRAM surrogate and simulation codebase with a portability-focused analytical benchmark path.
It is positioned as a reproducible GPU validation and benchmarking portfolio asset, not as a hand-tuned CUDA kernel library. The core value is the separation of CPU, NumPy, and accelerator lanes with benchmark artifacts, environment metadata, and CPU-vs-accelerator fidelity checks.
For a 5-minute technical review, start here: PORTFOLIO_REVIEW.md.
Semiconductor simulation and SRAM reliability work often mix domain assumptions, generated collateral, CPU reference paths, and accelerator experiments. This repository turns an SRAM analytical surrogate workload into a reviewable GPU validation asset with separated execution lanes, standardized artifacts, and explicit claim boundaries.
- CPU baseline and NumPy reference lanes: `cpu_existing`, `cpu_numpy`
- CUDA-backed PyTorch accelerator lane: canonical `torch_accelerated`
- RTX 4060 Ti full-suite benchmark snapshot with environment metadata
- CPU-vs-accelerator fidelity checks with max/mean absolute-delta thresholds
- Standard artifacts: `metadata.json`, `results.csv`, `report.md`, `fidelity.md`
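The fidelity checks listed above compare CPU reference outputs with accelerator outputs using max and mean absolute deltas. A minimal sketch of such a comparison, assuming plain Python sequences of floats; the function name `check_fidelity` and both threshold values are hypothetical placeholders, not the repository's API or defaults:

```python
from statistics import fmean


def check_fidelity(reference, candidate, max_abs_tol=1e-6, mean_abs_tol=1e-7):
    """Compare two equal-length output sequences by absolute delta.

    The tolerances are illustrative assumptions, not project defaults.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    if not reference:
        raise ValueError("cannot compare empty outputs")
    deltas = [abs(r - c) for r, c in zip(reference, candidate)]
    max_abs = max(deltas)
    mean_abs = fmean(deltas)
    return {
        "max_abs_delta": max_abs,
        "mean_abs_delta": mean_abs,
        "passed": max_abs <= max_abs_tol and mean_abs <= mean_abs_tol,
    }


# Illustrative values only; these are not measured benchmark outputs.
cpu_out = [0.25, 0.5, 0.75]
accel_out = [0.25, 0.5, 0.75 + 2e-8]
result = check_fidelity(cpu_out, accel_out)
```

Reporting both statistics is useful because the max delta exposes a single bad element while the mean delta exposes a systematic drift across the whole output.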
Representative checked-in evidence:
| GPU | Suite | Workloads | Fidelity | Throughput | Speedup |
|---|---|---|---|---|---|
| RTX 4060 Ti 16GB | full | 10000x512, 5000x1024, 20000x512 | max abs delta 2.958160e-08 | 185k-1.23M samples/s | 18.64x-138.25x vs cpu_existing |
Performance-engineering note: the chunk-size sweep in reports/portability/chunk_size_sweep_4060ti.md varies the PyTorch CUDA dataset chunk size on 20000x512; the local RTX 4060 Ti run shows chunk 2048 at 1.36x vs the default 1024 chunk for that sweep.
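The idea behind a chunk-size sweep can be sketched without any GPU: process the same dataset in slices of different sizes and express each timing relative to a baseline chunk size. The helper names below (`process_in_chunks`, `sweep_chunk_sizes`) are hypothetical and the workload is a stand-in, so this illustrates the method only and does not reproduce the checked-in sweep:

```python
import time


def process_in_chunks(rows, chunk_size, fn):
    """Apply fn to consecutive slices of rows and concatenate the results."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out = []
    for start in range(0, len(rows), chunk_size):
        out.extend(fn(rows[start:start + chunk_size]))
    return out


def sweep_chunk_sizes(rows, fn, chunk_sizes, baseline=1024):
    """Time each chunk size; return baseline_time / chunk_time per size."""
    sizes = list(dict.fromkeys([baseline, *chunk_sizes]))  # dedupe, keep order
    elapsed = {}
    for size in sizes:
        start = time.perf_counter()
        process_in_chunks(rows, size, fn)
        # Guard against a zero reading on very coarse clocks.
        elapsed[size] = max(time.perf_counter() - start, 1e-12)
    return {size: elapsed[baseline] / elapsed[size] for size in sizes}


# Stand-in workload: double every element of a 20000-row list.
rows = list(range(20000))
ratios = sweep_chunk_sizes(rows, lambda chunk: [x * 2 for x in chunk], [512, 2048])
```

A ratio above 1.0 means that chunk size ran faster than the baseline. When the work runs on a CUDA device, timings of this kind are only meaningful if the device is synchronized before the clock is read, for example with `torch.cuda.synchronize()`.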
Read the benchmark numbers with the following distinctions in mind:
- smoke rows prove artifact generation and numerical fidelity on a small case
- measured throughput rows are environment-specific performance snapshots, not a universal speedup claim
- the `2.12.0+cu126` Torch value in the full snapshot was rechecked from the local CUDA benchmark environment via `torch.__version__`, `torch.version.cuda`, `torch.cuda.get_device_name(0)`, and `pip show torch`; see `docs/reproduce_cuda_4060ti.md` and `reports/portability/cuda_full_environment.txt`
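The PyTorch attributes named in the last bullet can be collected in one place. A minimal sketch, assuming PyTorch may or may not be installed; the function name `capture_torch_environment` and the dictionary keys are hypothetical and do not mirror the repository's environment-capture module:

```python
import json
import platform


def capture_torch_environment():
    """Collect Python, PyTorch, and CUDA details, tolerating a missing torch."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        "device_name": None,
    }
    try:
        import torch
    except ImportError:
        return info  # PyTorch is optional; leave the torch fields empty
    info["torch"] = str(torch.__version__)
    info["cuda"] = torch.version.cuda  # None on CPU-only builds
    if torch.cuda.is_available():
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(json.dumps(capture_torch_environment(), indent=2))
```

Guarding `get_device_name(0)` behind `torch.cuda.is_available()` keeps the probe from raising on machines without a visible CUDA device.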
Today it provides:
- reproducible CPU benchmark artifacts
- a canonical `torch_accelerated` lane that is currently CUDA-validated when a compatible PyTorch build is available
- fidelity checks between CPU inference paths and the canonical accelerator lane
- isolation of accelerator-specific logic to reduce future ROCm/HIP porting cost
This is a GPU validation and benchmarking framework for an SRAM analytical surrogate, not a hand-written CUDA kernel library. Future portability boundaries are designed separately from measured CUDA evidence.
This benchmark currently uses an SRAM analytical surrogate. Physical credibility is tracked separately from GPU performance:
- Proxy-calibrated benchmark: current public state
- Perceptron-vs-SPICE validation: planned or conditional
- Silicon correlation: not claimed
See docs/pdk_validation_criteria.md for the PDK/SPICE/silicon validation gates.
- CPU analytical benchmark smoke runs through `python -m benchmarks.cli --suite smoke --device cpu`
- the compatibility wrapper `python scripts/run_gpu_analytical_benchmark.py` still works
- analytical benchmark runs emit standard artifacts:
  - `metadata.json`
  - `results.csv`
  - `report.md`
  - `fidelity.md`
- fresh artifacts record `validation_scope`, `claim_level`, and accelerator backend/runtime metadata
- CPU existing vs CPU NumPy inference parity is checked automatically
- accelerator lanes degrade to `skipped` or `unsupported` instead of crashing when an accelerator runtime is unavailable
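The graceful-degradation behaviour in the last bullet can be sketched as a probe that returns a status instead of raising. This is an illustration under stated assumptions: the function name `accelerator_status`, the `available` status, and the mapping of each condition to `skipped` or `unsupported` are hypothetical; only the two status names and the `SRAM_FORCE_CPU` variable come from this README.

```python
import os


def accelerator_status(env=None):
    """Return (status, reason) for the accelerator lane without raising.

    Which condition maps to "skipped" versus "unsupported" is an assumption
    made for this sketch, not the project's documented behaviour.
    """
    env = os.environ if env is None else env
    if env.get("SRAM_FORCE_CPU") == "1":
        return "skipped", "SRAM_FORCE_CPU=1 requested a CPU-only run"
    try:
        import torch
    except ImportError:
        return "skipped", "PyTorch is not installed"
    except Exception as exc:  # broken install, missing shared library, etc.
        return "unsupported", f"PyTorch import failed: {exc}"
    try:
        if not torch.cuda.is_available():
            return "unsupported", "PyTorch is installed but no CUDA device is visible"
    except Exception as exc:  # driver or runtime mismatch during the probe
        return "unsupported", f"accelerator probe failed: {exc}"
    return "available", "CUDA device detected"


status, reason = accelerator_status()
```

Returning a reason string alongside the status lets a report explain why a lane did not run, which matters when the same suite is executed on machines with different runtimes.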
- No AMD GPU or ROCm benchmark result is included
- No HIP port is implemented in this batch
- ROCm validation is pending AMD hardware access
- `native_backend.py` simulate/lifetime/optimize flows are not fully migrated into the new backend package yet
- The checked-in CUDA snapshots do not claim universal GPU speedup across all workload sizes
Use conservative wording:
CPU benchmark artifacts are reproducible today, the `torch_accelerated` lane is currently CUDA-validated when a compatible PyTorch build is installed, and ROCm validation remains pending AMD hardware access.
CPU-only benchmark setup:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements-base.txt -r requirements-benchmark.txt
python -m benchmarks.cli --suite smoke --device cpu

Optional CUDA benchmark setup:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements-base.txt -r requirements-benchmark.txt
# install the correct PyTorch build for your CUDA/runtime combination
python -m benchmarks.cli --suite smoke --device auto

Create an environment and install the benchmark stack (Windows PowerShell):
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements-base.txt -r requirements-benchmark.txt

If you want the older umbrella install, `requirements.txt` still installs the base, benchmark, and UI dependency sets together.

python -m benchmarks.cli --suite smoke --device cpu

CPU-only emulation with auto selection:
$env:SRAM_FORCE_CPU='1'
python -m benchmarks.cli --suite smoke
Remove-Item Env:SRAM_FORCE_CPU

Install a CUDA-capable PyTorch build for your platform first, then run:

python -m benchmarks.cli --suite smoke --device auto

The canonical accelerator lane recorded in fresh artifacts is `torch_accelerated`. Historical artifacts may still show the legacy alias `gpu_pytorch`.

python scripts/run_gpu_analytical_benchmark.py

Alternative module entrypoint:

python -m benchmarks.run_suite --suite smoke --device auto

This keeps the legacy CLI flags while also writing a standard artifact directory under `artifacts/benchmarks/`.
- `requirements-base.txt`
  - NumPy and SciPy
- `requirements-benchmark.txt`
  - scikit-learn
  - PyTorch is documented as an optional manual install because the correct package depends on platform and accelerator runtime
- `requirements-ui.txt`
  - Matplotlib, Streamlit, PySide6
- `requirements-dev.txt`
  - base + benchmark development/test stack
- `backends/`
  - `cpu_existing.py`
  - `cpu_numpy.py`
  - `torch_portable.py`
  - `accelerator_lane.py`
  - `cuda_lane.py`
  - `registry.py`
- `benchmarks/`
  - suite cases, environment capture, metrics, report writers, CLI
- `gpu_analytical_adapter.py`
  - compatibility facade for earlier analytical helper imports
The main simulation and UI entry points remain in place:
- `main.py`
- `main_advanced.py`
- `native_backend.py`
- `streamlit_app*.py`
- `pyside_sram_app_advanced.py`
Each run writes a timestamped directory under artifacts/benchmarks/ containing:
- `metadata.json`
- `results.csv`
- `report.md`
- `fidelity.md`
New portability benchmark artifacts avoid absolute local filesystem paths.
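A timestamped artifact directory holding the four standard files, with paths recorded relative to the artifact root, can be sketched as follows. The function name `write_artifacts`, the timestamp format, and the `run_dir` metadata key are assumptions made for illustration; the repository's report writers may differ.

```python
import csv
import json
from datetime import datetime, timezone
from pathlib import Path


def write_artifacts(root, metadata, rows, report_md, fidelity_md):
    """Write the four standard files into a new timestamped run directory."""
    if not rows:
        raise ValueError("rows must contain at least one result")
    # Timestamp format is an assumption; microseconds reduce name collisions.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    run_dir = Path(root) / stamp
    run_dir.mkdir(parents=True)

    # Record the run directory relative to root, never as an absolute path.
    metadata = {**metadata, "run_dir": run_dir.relative_to(root).as_posix()}
    (run_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    with open(run_dir / "results.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    (run_dir / "report.md").write_text(report_md, encoding="utf-8")
    (run_dir / "fidelity.md").write_text(fidelity_md, encoding="utf-8")
    return run_dir


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        out = write_artifacts(
            tmp,
            {"validation_scope": "example"},  # placeholder value
            [{"case": "smoke", "lane": "cpu_numpy"}],
            "# Report\n",
            "# Fidelity\n",
        )
        print(sorted(p.name for p in out.iterdir()))
```

Storing only root-relative paths keeps checked-in snapshots free of usernames and local directory layouts.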
Fresh artifacts use the canonical lane name torch_accelerated, while readers and dashboards still normalize the legacy gpu_pytorch alias from older snapshots.
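Normalizing the legacy alias when reading older snapshots can be as small as a lookup table. A sketch, where the names `LEGACY_LANE_ALIASES` and `normalize_lane` are hypothetical and only the two lane names come from this README:

```python
# Legacy lane names mapped to their canonical replacements.
LEGACY_LANE_ALIASES = {"gpu_pytorch": "torch_accelerated"}


def normalize_lane(name):
    """Map a legacy lane name to its canonical form; pass others through."""
    return LEGACY_LANE_ALIASES.get(name, name)


lanes = [normalize_lane(n) for n in ["cpu_existing", "gpu_pytorch", "torch_accelerated"]]
```

Applying the mapping at read time leaves historical artifacts untouched on disk while letting dashboards group old and new rows under one lane name.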
Checked-in sanitized snapshots are available under reports/portability/:
- `reports/portability/cpu_smoke_report.md`
- `reports/portability/cpu_smoke_fidelity.md`
- `reports/portability/cuda_smoke_report.md`
- `reports/portability/cuda_smoke_fidelity.md`
- `reports/portability/cuda_full_report.md`
- `reports/portability/cuda_full_fidelity.md`
- `reports/portability/cuda_full_environment.txt`
- `reports/portability/cuda_full_metadata.json`
- `reports/portability/cuda_full_results.csv`
- `reports/portability/chunk_size_sweep_4060ti.md`
- `reports/portability/chunk_size_sweep_4060ti.csv`
- `reports/portability/dashboard.md`
Some generated benchmark artifacts may also include optional plots under plots/.
Minimal packaging metadata and console-script entrypoints are also defined in pyproject.toml.
Release-oriented portability automation is defined in .github/workflows/portability-release.yml.
Core simulation:
python main.py
python main_advanced.py
python hybrid_perceptron_sram.py
python adaptive_perceptron_sram.py
python reliability_model.py
python workload_model.py

UI:
pip install -r requirements-ui.txt
streamlit run streamlit_app.py
streamlit run streamlit_app_advanced.py
streamlit run streamlit_app_unified.py
python pyside_sram_app_advanced.py

Validation and report generation:
python spice_validation/run_spice_validation.py --spice-source placeholder
python scripts/run_pdk_matrix.py
python scripts/run_model_selection.py
python scripts/run_node_scaling.py
python scripts/build_research_evidence_pack.py
python scripts/export_research_bundle.py --tag public_snapshot --skip-zip

- `docs/benchmark_baseline_inventory.md`
- `docs/benchmark_methodology.md`
- `docs/backend_portability.md`
- `docs/hip_porting_plan.md`
- `docs/rocm_validation_matrix.md`
- `docs/instinct_target_profile.md`
- `docs/hipify_preflight_inventory.md`
- `docs/rocm_manual_checklist.md`
- `docs/limitations_and_claims.md`
- `docs/reproduce_cuda_4060ti.md`
- `docs/results_interpretation_guide.md`
- `docs/portability_issue_backlog.md`
- `docs/portability_release_checklist.md`
- `docs/prd_completion_matrix.md`
- `docs/native_backend_portability_inventory.md`
- `docs/native_backend_rocm_migration_plan.md`
- `docs/ci_future_rocm_runner_note.md`
- `docs/pdk_validation_criteria.md`
- `docs/open_source_reliability_roadmap_2026-03-09.md`
- `docker/README.md`
- `reports/portability/changelog.md`