
Reuse the cuStateVec handle across kernel executions #4746

Open
ikkoham wants to merge 1 commit into NVIDIA:main from ikkoham:perf/cusv-handle-reuse



ikkoham commented Jun 16, 2026

Collaborator

Summary

CuStateVecCircuitSimulator calls custatevecCreate on every state-vector allocation and custatevecDestroy in deallocateStateImpl, so it creates and destroys the cuStateVec handle on every kernel execution. The handle is a device-level library context, independent of any particular state vector, so it can be created once per device and reused. This PR:

- adds ensureHandle(), which creates the handle lazily and recreates it only when the active CUDA device changes;
- drops the per-deallocation custatevecDestroy call;
- destroys the handle once, in the simulator's destructor.
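The caching pattern described above can be modeled in a few lines. The sketch below is a hypothetical Python model, not the actual C++ change: DeviceHandleCache, ensure_handle, close, and the fake create/destroy callbacks are illustrative stand-ins for ensureHandle(), the destructor, custatevecCreate, and custatevecDestroy. It does not model CUDA-specific details such as which device must be current when an old handle is destroyed.

```python
from typing import Callable, Generic, List, Optional, TypeVar

H = TypeVar("H")


class DeviceHandleCache(Generic[H]):
    """Hypothetical model of the ensureHandle() pattern: a single library
    handle that is created lazily and recreated only when the active
    device changes."""

    def __init__(self, create: Callable[[int], H],
                 destroy: Callable[[H], None],
                 current_device: Callable[[], int]) -> None:
        self._create = create
        self._destroy = destroy
        self._current_device = current_device
        self._handle: Optional[H] = None
        self._device: Optional[int] = None

    def ensure_handle(self) -> H:
        device = self._current_device()
        if self._handle is not None and self._device == device:
            return self._handle  # fast path: reuse the cached handle
        if self._handle is not None:
            # The cached handle belongs to a different device; replace it.
            self._destroy(self._handle)
        self._handle = self._create(device)
        self._device = device
        return self._handle

    def close(self) -> None:
        # Plays the role of the destructor: destroy the handle exactly once.
        if self._handle is not None:
            self._destroy(self._handle)
            self._handle = None
            self._device = None


# Demo with fake create/destroy functions that only record their calls.
created: List[int] = []
destroyed: List[str] = []
active_device = 0


def fake_create(device: int) -> str:
    created.append(device)
    return f"handle@{device}"


def fake_destroy(handle: str) -> None:
    destroyed.append(handle)


cache = DeviceHandleCache(fake_create, fake_destroy, lambda: active_device)
for _ in range(100):       # e.g. one kernel execution per shot
    cache.ensure_handle()
active_device = 1          # switch the active device
cache.ensure_handle()
cache.close()
print(created, destroyed)  # [0, 1] ['handle@0', 'handle@1']
```

Over 100 simulated executions on one device the handle is created only once; the device switch costs exactly one destroy and one recreate, and close() releases the final handle.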

Benchmark

| Call | Before | After | Speedup |
|---|---|---|---|
| get_state | 2.53 ms | 0.16 ms | ~15× |
| observe (shots=0) | 2.60 ms | 0.25 ms | ~10× |
| sample (1k shots) | 2.53 ms | 0.53 ms | ~4.7× |
| run (100 shots) | 201 ms | 10 ms | ~20× |

All timings are warm: the first execution is excluded from the measurements.
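The speedup column is simply before ÷ after. Recomputing it from the table's values (plain arithmetic, no CUDA-Q required) gives slightly higher ratios than shown, which suggests the table's figures are rounded down:

```python
# Warm per-call timings in milliseconds, copied from the benchmark table.
timings_ms = {
    "get_state": (2.53, 0.16),
    "observe (shots=0)": (2.60, 0.25),
    "sample (1k shots)": (2.53, 0.53),
    "run (100 shots)": (201.0, 10.0),
}

# Speedup = before / after for each call.
speedups = {call: before / after for call, (before, after) in timings_ms.items()}

for call, s in speedups.items():
    print(f"{call:18s} {s:5.1f}x")  # roughly 15.8x, 10.4x, 4.8x, 20.1x
```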

Benchmark kernels
```python
import time, cudaq
from cudaq import spin
NQ, THETA, LAYERS = 5, 0.5, 1

# for observe and get_state
@cudaq.kernel
def ansatz(n: int, theta: float, layers: int):
    q = cudaq.qvector(n)
    for _ in range(layers):
        h(q[0])
        for i in range(n - 1):
            x.ctrl(q[i], q[i + 1])
        ry(theta, q[0])

# for sample
@cudaq.kernel
def ansatz_measured(n: int, theta: float, layers: int):
    q = cudaq.qvector(n)
    for _ in range(layers):
        h(q[0])
        for i in range(n - 1):
            x.ctrl(q[i], q[i + 1])
        ry(theta, q[0])
    mz(q)

# for run
@cudaq.kernel
def ansatz_returning(n: int, theta: float, layers: int) -> bool:
    q = cudaq.qvector(n)
    for _ in range(layers):
        h(q[0])
        for i in range(n - 1):
            x.ctrl(q[i], q[i + 1])
        ry(theta, q[0])
    return mz(q[0])
```

Before this change, every launch paid a cuStateVec handle create+destroy round-trip of roughly 1–2 ms (warm). run benefits most because it re-executes the kernel once per shot, so a single call with N shots paid N handle create/destroy round-trips.
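As a rough consistency check, assuming the run call's time is dominated by its 100 sequential kernel executions, the per-shot saving can be estimated from the run row of the benchmark table:

```python
shots = 100
before_ms, after_ms = 201.0, 10.0    # run (100 shots), warm timings

per_shot_before = before_ms / shots  # about 2.01 ms per shot
per_shot_after = after_ms / shots    # about 0.10 ms per shot
saved_per_shot = per_shot_before - per_shot_after

print(f"{saved_per_shot:.2f} ms saved per shot")  # 1.91 ms saved per shot
```

The estimated saving of about 1.9 ms per shot falls within the quoted 1–2 ms round-trip cost.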

Signed-off-by: ikkoham <ikkoham@users.noreply.github.com>

copy-pr-bot Bot commented Jun 16, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


ikkoham commented Jun 16, 2026

Collaborator Author

/ok to test 7588733



github-actions Bot commented Jun 16, 2026


CI Summary (push): ✅ passed

Run #27603401948 · ✅ 6 · ⏩ 7 · ❌ 0 · ⛔ 0

Top-level jobs (13)
| Job | Result |
|---|---|
| binaries | ⏩ skipped |
| build_and_test | ✅ success |
| config_devdeps | ✅ success |
| config_source_build | ⏩ skipped |
| config_wheeldeps | ✅ success |
| devdeps | ✅ success |
| docker_image | ⏩ skipped |
| gen_code_coverage | ⏩ skipped |
| metadata | ✅ success |
| python_metapackages | ⏩ skipped |
| python_wheels | ⏩ skipped |
| source_build | ⏩ skipped |
| wheeldeps | ✅ success |
⏩ Skipped jobs (7): intentionally skipped on PR builds; run on merge_group / workflow_dispatch

- binaries
- config_source_build
- docker_image
- gen_code_coverage
- python_metapackages
- python_wheels
- source_build
All sub-jobs (42), every matrix leg:

| Job | Status |
|---|---|
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (amd64, llvm, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, llvm, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (arm64, llvm, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (arm64, llvm, openmpi) / Dev environment (Python) | ✅ success |
| CI Summary | ❔ in_progress |
| Configure build (devdeps) | ✅ success |
| Configure build (source_build) | ⏩ skipped |
| Configure build (wheeldeps) | ✅ success |
| Create CUDA Quantum installer | ⏩ skipped |
| Create Docker images | ⏩ skipped |
| Create Python metapackages | ⏩ skipped |
| Create Python wheels | ⏩ skipped |
| Gen code coverage | ⏩ skipped |
| Load dependencies (amd64, gcc12) / Caching | ✅ success |
| Load dependencies (amd64, gcc12) / Finalize | ✅ success |
| Load dependencies (amd64, gcc12) / Metadata | ✅ success |
| Load dependencies (amd64, llvm) / Caching | ✅ success |
| Load dependencies (amd64, llvm) / Finalize | ✅ success |
| Load dependencies (amd64, llvm) / Metadata | ✅ success |
| Load dependencies (arm64, gcc12) / Caching | ✅ success |
| Load dependencies (arm64, gcc12) / Finalize | ✅ success |
| Load dependencies (arm64, gcc12) / Metadata | ✅ success |
| Load dependencies (arm64, llvm) / Caching | ✅ success |
| Load dependencies (arm64, llvm) / Finalize | ✅ success |
| Load dependencies (arm64, llvm) / Metadata | ✅ success |
| Load source build cache | ⏩ skipped |
| Load wheel dependencies (amd64, 12.6) / Caching | ✅ success |
| Load wheel dependencies (amd64, 12.6) / Finalize | ✅ success |
| Load wheel dependencies (amd64, 12.6) / Metadata | ✅ success |
| Load wheel dependencies (amd64, 13.0) / Caching | ✅ success |
| Load wheel dependencies (amd64, 13.0) / Finalize | ✅ success |
| Load wheel dependencies (amd64, 13.0) / Metadata | ✅ success |
| Load wheel dependencies (arm64, 12.6) / Caching | ✅ success |
| Load wheel dependencies (arm64, 12.6) / Finalize | ✅ success |
| Load wheel dependencies (arm64, 12.6) / Metadata | ✅ success |
| Load wheel dependencies (arm64, 13.0) / Caching | ✅ success |
| Load wheel dependencies (arm64, 13.0) / Finalize | ✅ success |
| Load wheel dependencies (arm64, 13.0) / Metadata | ✅ success |
| Prepare cache clean-up | ❔ in_progress |
| Retrieve PR info | ✅ success |
✅ Required checks (6/6), declared in .github/required-checks.yml for push:

| Required check | Status |
|---|---|
| Build and test (amd64, llvm, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, llvm, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (arm64, llvm, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (arm64, llvm, openmpi) / Dev environment (Python) | ✅ success |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Debug) | ✅ success |
| Build and test (amd64, gcc12, openmpi) / Dev environment (Python) | ✅ success |

1tnguyen left a comment

Collaborator


LGTM 👍 Thanks @ikkoham
