SRAM GPU Portability And Benchmarking

This repository is an SRAM surrogate and simulation codebase with a portability-focused analytical benchmark path.

It is positioned as a reproducible GPU validation and benchmarking portfolio asset, not as a hand-tuned CUDA kernel library. The core value is the separation of CPU, NumPy, and accelerator lanes with benchmark artifacts, environment metadata, and CPU-vs-accelerator fidelity checks.

For a 5-minute technical review, start here: PORTFOLIO_REVIEW.md.

Why This Exists

Semiconductor simulation and SRAM reliability work often mix domain assumptions, generated collateral, CPU reference paths, and accelerator experiments. This repository turns an SRAM analytical surrogate workload into a reviewable GPU validation asset with separated execution lanes, standardized artifacts, and explicit claim boundaries.

What I Validated

CPU baseline and NumPy reference lanes: cpu_existing, cpu_numpy
CUDA-backed PyTorch accelerator lane: canonical torch_accelerated
RTX 4060 Ti full-suite benchmark snapshot with environment metadata
CPU-vs-accelerator fidelity checks with max/mean absolute-delta thresholds
Standard artifacts: metadata.json, results.csv, report.md, fidelity.md
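
As an illustration of what a max/mean absolute-delta fidelity check involves, here is a minimal sketch. It assumes both lanes return arrays of the same shape; `fidelity_check` and the two tolerance values are hypothetical placeholders, not the repository's actual function or thresholds.

```python
import numpy as np


def fidelity_check(cpu_out, accel_out, max_tol=1e-6, mean_tol=1e-7):
    """Compare two lane outputs using max/mean absolute delta.

    max_tol and mean_tol are placeholder thresholds, not the project's values.
    """
    cpu = np.asarray(cpu_out, dtype=np.float64)
    accel = np.asarray(accel_out, dtype=np.float64)
    if cpu.shape != accel.shape:
        raise ValueError(f"shape mismatch: {cpu.shape} vs {accel.shape}")
    delta = np.abs(cpu - accel)
    max_delta = float(delta.max())
    mean_delta = float(delta.mean())
    return {
        "max_abs_delta": max_delta,
        "mean_abs_delta": mean_delta,
        "passed": max_delta <= max_tol and mean_delta <= mean_tol,
    }


# Synthetic example: float32 rounding stands in for accelerator precision loss.
rng = np.random.default_rng(0)
reference = rng.random((1000, 8))
approx = reference.astype(np.float32)
result = fidelity_check(reference, approx)
print(result)
```

Reporting both statistics is useful because the max catches a single bad sample while the mean shows whether the error is systematic.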

Representative Results

Representative checked-in evidence:

| GPU | Suite | Workloads | Fidelity | Throughput | Speedup |
| --- | --- | --- | --- | --- | --- |
| RTX 4060 Ti 16GB | full | `10000x512`, `5000x1024`, `20000x512` | max abs delta `2.958160e-08` | `185k-1.23M` samples/s | `18.64x-138.25x` vs `cpu_existing` |

Performance-engineering note: the chunk-size sweep in reports/portability/chunk_size_sweep_4060ti.md varies the PyTorch CUDA dataset chunk size on the 20000x512 workload. In the local RTX 4060 Ti run, chunk size 2048 came in at 1.36x relative to the default chunk size of 1024 for that sweep.
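
For readers unfamiliar with what a chunk size controls: chunked evaluation splits the sample axis into fixed-size batches so that each batch fits comfortably in device memory. The sketch below uses NumPy and a made-up surrogate function purely for illustration. `evaluate_in_chunks` and `toy_surrogate` are hypothetical names, the array is smaller than the benchmark workloads, and the repository's PyTorch lane may chunk differently.

```python
import numpy as np


def toy_surrogate(batch):
    # Stand-in for the analytical surrogate: one scalar per sample row.
    return np.tanh(batch).mean(axis=1)


def evaluate_in_chunks(fn, data, chunk_size=1024):
    """Apply fn to data in row chunks and concatenate the results."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    outputs = []
    for start in range(0, data.shape[0], chunk_size):
        outputs.append(fn(data[start:start + chunk_size]))
    return np.concatenate(outputs)


data = np.random.default_rng(0).random((5000, 32))
out_1024 = evaluate_in_chunks(toy_surrogate, data, chunk_size=1024)
out_2048 = evaluate_in_chunks(toy_surrogate, data, chunk_size=2048)
# Chunk size should change performance, not results.
print(np.allclose(out_1024, out_2048))
```

A sweep then amounts to timing the same call at several `chunk_size` values while confirming the outputs stay equivalent.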

Read the benchmark numbers with the following distinctions in mind:

smoke rows prove artifact generation and numerical fidelity on a small case
measured throughput rows are environment-specific performance snapshots, not a universal speedup claim
the 2.12.0+cu126 Torch value in the full snapshot was rechecked from the local CUDA benchmark environment via torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0), and pip show torch; see docs/reproduce_cuda_4060ti.md and reports/portability/cuda_full_environment.txt
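
To make the throughput and speedup columns concrete, here is a small sketch of how such numbers are typically derived from wall-clock timings. It assumes that a workload such as `10000x512` counts its first dimension as samples, which this README does not state. The timings below are invented purely to exercise the arithmetic and are not measurements.

```python
def throughput(samples, elapsed_s):
    """Samples processed per second for one timed run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return samples / elapsed_s


def speedup(baseline_elapsed_s, candidate_elapsed_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_elapsed_s / candidate_elapsed_s


# Invented timings for illustration only (not benchmark results).
samples = 10_000
cpu_elapsed_s = 2.0
accel_elapsed_s = 0.1

print(f"cpu:   {throughput(samples, cpu_elapsed_s):,.0f} samples/s")
print(f"accel: {throughput(samples, accel_elapsed_s):,.0f} samples/s")
print(f"speedup: {speedup(cpu_elapsed_s, accel_elapsed_s):.2f}x")
```

Because both quantities depend on the timed environment, they only carry meaning alongside the recorded environment metadata.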

Today it provides:

reproducible CPU benchmark artifacts
a canonical torch_accelerated lane that is currently CUDA-validated when a compatible PyTorch build is available
fidelity checks between CPU inference paths and the canonical accelerator lane
isolation of accelerator-specific logic to reduce future ROCm/HIP porting cost

Claim Boundary

This is a GPU validation and benchmarking framework for an SRAM analytical surrogate, not a hand-written CUDA kernel library. Future portability boundaries are designed separately from measured CUDA evidence.

Semiconductor-Domain Validation Roadmap

This benchmark currently uses an SRAM analytical surrogate. Physical credibility is tracked separately from GPU performance:

Proxy-calibrated benchmark: current public state
Perceptron-vs-SPICE validation: planned or conditional
Silicon correlation: not claimed

See docs/pdk_validation_criteria.md for the PDK/SPICE/silicon validation gates.

What Is Validated

CPU analytical benchmark smoke runs through python -m benchmarks.cli --suite smoke --device cpu
the compatibility wrapper python scripts/run_gpu_analytical_benchmark.py still works
analytical benchmark runs emit standard artifacts:
- metadata.json
- results.csv
- report.md
- fidelity.md
fresh artifacts record validation_scope, claim_level, and accelerator backend/runtime metadata
CPU existing vs CPU NumPy inference parity is checked automatically
accelerator lanes degrade to skipped or unsupported instead of crashing when an accelerator runtime is unavailable
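
The graceful-degradation behavior in the last item can be sketched as a probe that reports a status instead of raising. This is an assumption-laden illustration: `probe_accelerator` is a hypothetical name, the mapping of conditions to `skipped` versus `unsupported` is a guess, and treating `SRAM_FORCE_CPU=1` as a skip is inferred from the quickstart rather than taken from the code.

```python
import importlib.util
import os


def probe_accelerator():
    """Return (status, reason) without raising when no accelerator exists."""
    # Assumed meaning of SRAM_FORCE_CPU: the user asked for CPU-only execution.
    if os.environ.get("SRAM_FORCE_CPU") == "1":
        return "skipped", "SRAM_FORCE_CPU=1 requested CPU-only execution"
    if importlib.util.find_spec("torch") is None:
        return "unsupported", "PyTorch is not installed"
    try:
        import torch

        if torch.cuda.is_available():
            return "available", torch.cuda.get_device_name(0)
        return "skipped", "PyTorch is installed but no CUDA device is visible"
    except Exception as exc:  # broken or mismatched runtime
        return "unsupported", f"accelerator probe failed: {exc}"


status, reason = probe_accelerator()
print(status, "-", reason)
```

Returning a status and a reason lets the report writer record why a lane did not run, which keeps CPU-only runs useful as evidence.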

What Is Not Claimed

No AMD GPU or ROCm benchmark result is included
No HIP port is implemented in this batch
ROCm validation is pending AMD hardware access
native_backend.py simulate/lifetime/optimize flows are not fully migrated into the new backend package yet
The checked-in CUDA snapshots do not claim universal GPU speedup across all workload sizes

Use conservative wording:

CPU benchmark artifacts are reproducible today, the torch_accelerated lane is currently CUDA-validated when a compatible PyTorch build is installed, and ROCm validation remains pending AMD hardware access.

Linux-First Quickstart

CPU-only benchmark setup:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements-base.txt -r requirements-benchmark.txt
python -m benchmarks.cli --suite smoke --device cpu

Optional CUDA benchmark setup:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements-base.txt -r requirements-benchmark.txt
# install the correct PyTorch build for your CUDA/runtime combination
python -m benchmarks.cli --suite smoke --device auto

Quick Start

Create an environment and install the benchmark stack (Windows PowerShell syntax; the Linux equivalent is in the quickstart above):

python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements-base.txt -r requirements-benchmark.txt

If you want the older umbrella install, requirements.txt still installs the base, benchmark, and UI dependency sets together.

CPU-only benchmark smoke

python -m benchmarks.cli --suite smoke --device cpu

CPU-only emulation with auto selection:

$env:SRAM_FORCE_CPU='1'
python -m benchmarks.cli --suite smoke
Remove-Item Env:SRAM_FORCE_CPU

Optional CUDA benchmark smoke

Install a CUDA-capable PyTorch build for your platform first, then run:

python -m benchmarks.cli --suite smoke --device auto

The canonical accelerator lane recorded in fresh artifacts is torch_accelerated. Historical artifacts may still show the legacy alias gpu_pytorch.

Compatibility wrapper

python scripts/run_gpu_analytical_benchmark.py

Alternative module entrypoint:

python -m benchmarks.run_suite --suite smoke --device auto

The compatibility wrapper keeps the legacy CLI flags while also writing a standard artifact directory under artifacts/benchmarks/.

Dependency Layout

requirements-base.txt
- NumPy and SciPy
requirements-benchmark.txt
- scikit-learn
- PyTorch is documented as an optional manual install because the correct package depends on platform and accelerator runtime
requirements-ui.txt
- Matplotlib, Streamlit, PySide6
requirements-dev.txt
- base + benchmark development/test stack

Benchmark Architecture

backends/
- cpu_existing.py
- cpu_numpy.py
- torch_portable.py
- accelerator_lane.py
- cuda_lane.py
- registry.py
benchmarks/
- suite cases, environment capture, metrics, report writers, CLI
gpu_analytical_adapter.py
- compatibility facade for earlier analytical helper imports

The main simulation and UI entry points remain in place:

main.py
main_advanced.py
native_backend.py
streamlit_app*.py
pyside_sram_app_advanced.py

Standard Benchmark Artifacts

Each run writes a timestamped directory under artifacts/benchmarks/ containing:

metadata.json
results.csv
report.md
fidelity.md
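
A quick way to confirm that a run produced the four files listed above is sketched here. The helper names are hypothetical, and because the exact timestamp format of the directory names is not documented in this README, the sketch picks the newest run by modification time instead of parsing names.

```python
from pathlib import Path

EXPECTED_ARTIFACTS = ("metadata.json", "results.csv", "report.md", "fidelity.md")


def latest_run_dir(root="artifacts/benchmarks"):
    """Return the most recently modified run directory, or None."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    runs = [p for p in root_path.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def missing_artifacts(run_dir):
    """List expected artifact files that are absent from run_dir."""
    return [
        name for name in EXPECTED_ARTIFACTS
        if not (Path(run_dir) / name).is_file()
    ]


run = latest_run_dir()
if run is None:
    print("no benchmark runs found")
else:
    print(run, "missing:", missing_artifacts(run) or "nothing")
```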

New portability benchmark artifacts avoid absolute local filesystem paths. Fresh artifacts use the canonical lane name torch_accelerated, while readers and dashboards still normalize the legacy gpu_pytorch alias from older snapshots.
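
The alias normalization described above can be sketched as a small lookup applied while reading results. Only the two lane names come from this README; the `lane` column header, the helper names, and the inline sample data are assumptions made for illustration, not the real results.csv schema.

```python
import csv
import io

LEGACY_LANE_ALIASES = {"gpu_pytorch": "torch_accelerated"}


def normalize_lane(name):
    """Map a legacy lane name to its canonical form; pass others through."""
    return LEGACY_LANE_ALIASES.get(name, name)


def read_results(handle, lane_column="lane"):
    """Yield CSV rows with the lane column normalized.

    lane_column is a guess at the header name, not the real schema.
    """
    for row in csv.DictReader(handle):
        if lane_column in row:
            row[lane_column] = normalize_lane(row[lane_column])
        yield row


# Inline sample standing in for an older results.csv snapshot.
sample = io.StringIO("lane,case\ngpu_pytorch,smoke\ncpu_numpy,smoke\n")
rows = list(read_results(sample))
print([row["lane"] for row in rows])  # ['torch_accelerated', 'cpu_numpy']
```

Normalizing at read time leaves historical snapshots untouched on disk while letting dashboards group old and new runs under one lane name.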

Representative Portability Snapshots

Checked-in sanitized snapshots are available under reports/portability/:

reports/portability/cpu_smoke_report.md
reports/portability/cpu_smoke_fidelity.md
reports/portability/cuda_smoke_report.md
reports/portability/cuda_smoke_fidelity.md
reports/portability/cuda_full_report.md
reports/portability/cuda_full_fidelity.md
reports/portability/cuda_full_environment.txt
reports/portability/cuda_full_metadata.json
reports/portability/cuda_full_results.csv
reports/portability/chunk_size_sweep_4060ti.md
reports/portability/chunk_size_sweep_4060ti.csv
reports/portability/dashboard.md

Some generated benchmark artifacts may also include optional plots under plots/.

Minimal packaging metadata and console-script entrypoints are also defined in pyproject.toml.

Release-oriented portability automation is defined in .github/workflows/portability-release.yml.

Other Entry Points

Core simulation:

python main.py
python main_advanced.py
python hybrid_perceptron_sram.py
python adaptive_perceptron_sram.py
python reliability_model.py
python workload_model.py

UI:

pip install -r requirements-ui.txt
streamlit run streamlit_app.py
streamlit run streamlit_app_advanced.py
streamlit run streamlit_app_unified.py
python pyside_sram_app_advanced.py

Validation and report generation:

python spice_validation/run_spice_validation.py --spice-source placeholder
python scripts/run_pdk_matrix.py
python scripts/run_model_selection.py
python scripts/run_node_scaling.py
python scripts/build_research_evidence_pack.py
python scripts/export_research_bundle.py --tag public_snapshot --skip-zip

Key Docs

docs/benchmark_baseline_inventory.md
docs/benchmark_methodology.md
docs/backend_portability.md
docs/hip_porting_plan.md
docs/rocm_validation_matrix.md
docs/instinct_target_profile.md
docs/hipify_preflight_inventory.md
docs/rocm_manual_checklist.md
docs/limitations_and_claims.md
docs/reproduce_cuda_4060ti.md
docs/results_interpretation_guide.md
docs/portability_issue_backlog.md
docs/portability_release_checklist.md
docs/prd_completion_matrix.md
docs/native_backend_portability_inventory.md
docs/native_backend_rocm_migration_plan.md
docs/ci_future_rocm_runner_note.md
docs/pdk_validation_criteria.md
docs/open_source_reliability_roadmap_2026-03-09.md
docker/README.md
reports/portability/changelog.md
