Implement Feistel-based permutation for LHS and convert samplers to stateless RNG #32833

Open

zachmprince wants to merge 20 commits into idaholab:next from zachmprince:new_latin_sampling

Conversation

@zachmprince
Contributor

@zachmprince zachmprince commented Apr 24, 2026

Closes #32775, refs #32194

Reason

Latin Hypercube Sampling (LHS) previously relied on a stateful shuffle() call
inside sampleSetUp() to permute bin assignments. This coupling between the
generator state-machine and the sample-matrix loop made LHS incompatible with
the stateless RNG direction, prevented parallel or out-of-order sample access,
and required save/restore of generator state around every call to
sampleSetUp/sampleTearDown. The same stateful pattern was shared by the
MCMC and active-learning samplers, making the entire Sampler hierarchy harder
to reason about and test.

Design

MooseRandomPerturbation (new framework utility)

A header-only class implementing a keyed pseudo-random permutation of the
integers [0, n) using a balanced Feistel network:

  • The 64-bit seed is split into two 32-bit subkeys; each round mixes the
    half-block with both subkeys and a golden-ratio-derived constant, then applies
    the Murmur3/degski avalanche hash for bit diffusion.
  • Because n need not be a power of two, the network operates on the smallest
    padded domain 2^(2*half_bits) >= n and uses cycle-walking to reject
    out-of-range outputs.
  • The permutation is bijective (every input maps to a unique output) and
    invertible (invert(permute(x)) == x).
  • Verified by five new unit tests in unit/src/MooseRandomPerturbationTest.C
    (bijection, invertibility, bit-width range, seed uniqueness, reproducibility).
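
For readers who want the shape of the construction, here is a minimal self-contained sketch of a keyed balanced Feistel permutation with cycle-walking. It mirrors the description above, but the class name, round count (8), and key-handling details are illustrative assumptions, not the actual MooseRandomPerturbation source:

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Illustrative sketch only; not the actual MooseRandomPerturbation code.
class FeistelPermuter
{
public:
  FeistelPermuter(uint64_t seed, uint64_t n)
    : _k0(uint32_t(seed >> 32)), _k1(uint32_t(seed)), _n(n), _half_bits(1)
  {
    // Smallest padded power-of-two domain 2^(2*half_bits) >= n.
    while ((uint64_t(1) << (2 * _half_bits)) < n)
      ++_half_bits;
    _half_mask = (uint32_t(1) << _half_bits) - 1;
  }

  // Bijective map of [0, n) onto itself; cycle-walk past padded values.
  uint64_t permute(uint64_t x) const
  {
    do
      x = permutePadded(x);
    while (x >= _n);
    return x;
  }

private:
  // degski/Murmur3-finalizer style avalanche hash (bijective on 32 bits).
  static uint32_t mix32(uint32_t x)
  {
    x ^= x >> 16;
    x *= 0x7feb352du;
    x ^= x >> 15;
    x *= 0x846ca68bu;
    x ^= x >> 16;
    return x;
  }

  // Round function: key mixing, golden-ratio round constant, then avalanche.
  uint32_t roundFunction(uint32_t half, unsigned int round) const
  {
    uint32_t x = half ^ _k0;
    x += 0x9e3779b9u * (round + 1);
    x ^= _k1;
    return mix32(x);
  }

  // Balanced Feistel network on the padded domain [0, 2^(2*half_bits)).
  uint64_t permutePadded(uint64_t x) const
  {
    uint32_t left = uint32_t(x >> _half_bits) & _half_mask;
    uint32_t right = uint32_t(x) & _half_mask;
    for (unsigned int r = 0; r < 8; ++r)
    {
      const uint32_t next = left ^ (roundFunction(right, r) & _half_mask);
      left = right;
      right = next;
    }
    return (uint64_t(left) << _half_bits) | right;
  }

  uint32_t _k0, _k1;
  uint64_t _n;
  unsigned int _half_bits;
  uint32_t _half_mask;
};
```

Because Feistel rounds are invertible regardless of the round function, the padded map is a bijection, and cycle-walking restricts it to a bijection on [0, n).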

Redesigned LatinHypercubeSampler

The sampler now uses two stateless generators:

  • Generator 0 — draws the uniform point inside the selected bin.
  • Generator 1 — seeds one MooseRandomPerturbation permuter per column.

In computeSampleRow(row, col) the bin for sample row in column col is
permuter[col].permute(row), so the full LHS sample matrix is determined
entirely by the two generator seeds. No state needs to be saved, restored, or
advanced around a setup callback. Permuters are initialised once in
executeTearDown() after the generator has been advanced to the correct offset.
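
The stateless bin lookup described above can be illustrated with a toy permuter. Here an affine bijection stands in for the Feistel permuter so the sketch stays self-contained; all names are hypothetical, not the actual sampler code:

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Toy stand-in for the per-column permuter (the real sampler would use a
// MooseRandomPerturbation here): an affine bijection on [0, n).
struct ToyPermuter
{
  uint64_t a, b, n; // requires gcd(a, n) == 1 for bijectivity
  uint64_t permute(uint64_t row) const { return (a * row + b) % n; }
};

// One LHS entry: the permuter picks a unique bin per row, and a uniform draw
// u01 in [0, 1) (generator 0 in the design above) jitters within the bin.
double
lhsValue(const ToyPermuter & permuter, uint64_t row, double u01)
{
  const uint64_t bin = permuter.permute(row); // bijective: one bin per row
  return (bin + u01) / double(permuter.n);    // stratified value in [0, 1)
}
```

Because the value depends only on (row, col) and the seeds behind the permuter and the jitter, any process can evaluate any matrix entry in any order.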

Sampler base-class cleanup

sampleSetUp() and sampleTearDown() (per-row callbacks that ran inside the
sample-matrix loop and required generator save/restore) have been removed. The
simpler executeSetUp() / executeTearDown() pair (called once before/after
the entire execute()) is sufficient for all remaining use cases.

The CommMethod enum, shuffle() template, and saveGeneratorState() /
restoreGeneratorState() methods have also been removed.

MCMC and active-learning samplers

PMCMCBase and every derived class that previously relied on sampleSetUp to
seed proposals now use executeSetUp() instead. The proposeSamples() interface
no longer takes a seed_value argument; three new helper methods (random(),
randomIndex(), randomIndexPair()) encapsulate stateless index-based draws so
each sampler can advance its own _rand_index counter without touching generator
state directly.
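
The index-based draw pattern can be sketched as follows. This is an illustrative counter-based generator (a splitmix64-style hash), not the actual MOOSE generator or the real helper signatures; the point is that each draw depends only on (seed, index), so call order and process count do not matter:

```cpp
#include <cassert>
#include <cstdint>

// splitmix64 finalizer: a standard 64-bit avalanche hash.
static uint64_t
hash64(uint64_t x)
{
  x += 0x9e3779b97f4a7c15ull;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ull;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebull;
  return x ^ (x >> 31);
}

// Counter-based draws: any process can reproduce draw k without first
// generating draws 0..k-1, because no hidden generator state is advanced.
class StatelessDraws
{
public:
  explicit StatelessDraws(uint64_t seed) : _seed(seed), _rand_index(0) {}

  // Uniform double in [0, 1): top 53 bits of the hash scaled by 2^-53.
  double random()
  {
    return double(hash64(_seed ^ _rand_index++) >> 11) * (1.0 / 9007199254740992.0);
  }

  // Uniform-ish integer in [0, n); modulo bias is negligible for small n.
  uint64_t randomIndex(uint64_t n) { return hash64(_seed ^ _rand_index++) % n; }

private:
  uint64_t _seed;
  uint64_t _rand_index; // the only mutable state: a plain counter
};
```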

AdaptiveImportanceSampler had an off-by-one in its sample index that is also
corrected here.

Stateless conversion for remaining samplers

MorrisSampler and NestedMonteCarloSampler were also converted from the
stateful shuffle()/sampleSetUp() pattern to the stateless generator API.

Impact

Framework API changes

| Removed | Replacement |
| --- | --- |
| Sampler::sampleSetUp(SampleMode) | Sampler::executeSetUp() |
| Sampler::sampleTearDown(SampleMode) | Sampler::executeTearDown() |
| Sampler::shuffle(begin, end, generator_index) | MooseRandomPerturbation |
| Sampler::saveGeneratorState() / restoreGeneratorState() | not needed with stateless RNG |
| Sampler::CommMethod enum (LOCAL, SEMI_LOCAL, NONE) | removed |
| PMCMCBase::proposeSamples(seed_value, ...) | proposeSamples(...) (no seed arg) |

New framework utility: framework/include/utils/MooseRandomPerturbation.h
(header-only, no new .C file, no registration macro required).

Gold files for all LHS tests, several MCMC tests, and surrogate-training tests
that pass through an LHS sampler have been regolded because the new Feistel
permutation and stateless generator advancement produce a different (but equally
valid and reproducible) sample sequence.

To-Do

Things to do once reviewers are satisfied with changes.

Update figures, tables, etc. in documentation regarding LHS sampling changes:

  • modules/stochastic_tools/examples/parameter_study.md
  • modules/stochastic_tools/examples/nonlin_parameter_study.md
  • modules/stochastic_tools/examples/sobol.md
  • modules/stochastic_tools/examples/poly_regression_surrogate.md
  • modules/stochastic_tools/examples/pod_rb_surrogate.md
  • modules/stochastic_tools/examples/combined_example_2d_trans_diff.md
  • modules/stochastic_tools/examples/cross_validation.md
  • modules/combined/examples/stm_thermomechanics.md

Update applications:

  • Grizzly

This works and is consistent in parallel, but most tests fail due to the algorithmic change.

Integration with MooseRandomStateless will probably require more regolding, so I'm saving this part for later.

Refs idaholab#32194
@zachmprince
Contributor Author

LHS quality experiment: Feistel permutation vs. Fisher-Yates shuffle

modules/stochastic_tools/examples/lhs/lhs_experiment.{py,ipynb} contains a
numerical comparison of four samplers run against a parametric test function
family.

Test function. Each trial draws an nrow x ndim sample matrix and evaluates

$$ f(\mathbf{x}; \lambda) = \sum_c \frac{e^{x_c}}{2^{c+1}} + \lambda \cdot \frac{\sum_{i<j} L_2(x_i) \, L_2(x_j)}{\sqrt{d(d-1)/2}} $$

where $L_2(x) = \sqrt{12}(x - 0.5)$ is the centered, unit-variance linear Legendre polynomial. The
additive term ($\lambda = 0$) is well-suited to LHS; larger $\lambda$ values add
pairwise interaction that LHS cannot fully stratify, stress-testing each
sampler's behavior when the function is less cooperative. The exact mean of $f$
over $[0,1]^d$ is known analytically, so RMSE is computed directly.
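
A direct transcription of the test function and its analytic mean (names are ours, not the experiment script's; columns are indexed c = 0..d-1, matching the 2^(c+1) weights, and d >= 2 is assumed for the interaction term):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch of f(x; lambda) as defined above.
double
testFunction(const std::vector<double> & x, double lambda)
{
  const std::size_t d = x.size();

  // Additive part: exponentially decaying column weights.
  double additive = 0.0;
  for (std::size_t c = 0; c < d; ++c)
    additive += std::exp(x[c]) / std::pow(2.0, double(c + 1));

  // Pairwise part built from L2(x) = sqrt(12)(x - 0.5); since L2 integrates
  // to zero on [0, 1], this term does not shift the mean.
  double interaction = 0.0;
  for (std::size_t i = 0; i < d; ++i)
    for (std::size_t j = i + 1; j < d; ++j)
      interaction += 12.0 * (x[i] - 0.5) * (x[j] - 0.5); // sqrt(12)^2 = 12

  return additive + lambda * interaction / std::sqrt(d * (d - 1) / 2.0);
}

// Exact mean over [0,1]^d: the interaction integrates to zero and each
// additive column contributes (e - 1) / 2^(c+1); summing the geometric series:
double
exactMean(std::size_t d)
{
  return (std::exp(1.0) - 1.0) * (1.0 - std::pow(2.0, -double(d)));
}
```

Having the mean in closed form is what lets each trial report an exact RMSE rather than an estimate against a reference run.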

Parameters. 4 samplers x 4 dimensions (4, 8, 16, 32) x 10 sample counts
(8 -- 4096, log-spaced) x 6 $\lambda$ values (0, 0.1, 0.3, 1, 3, 10) x 100
independent trials = 96 000 runs. The Feistel LHS calls the actual
stochastic_tools-opt binary; the others use scipy.stats.qmc.

Result. Across all 240 $(\lambda, d, N)$ combinations the ratio of Feistel
RMSE to Fisher-Yates RMSE has a mean of 0.99 and a standard deviation of
0.10, with a range of 0.81 -- 1.28. The extreme ratios appear only at the
smallest sample counts (nrow = 8) and highest dimensions (ndim = 32), where
both methods have large absolute RMSE and the ratio is dominated by Monte Carlo
noise across the 100 trials. At $N \ge 32$ the two LHS variants are
indistinguishable. The table below shows the $\lambda = 0$, $d = 8$ slice as a
representative example:

| nrow | Monte Carlo | LHS (Fisher-Yates) | LHS (Feistel) | Sobol |
| --- | --- | --- | --- | --- |
| 8 | 0.110472 | 0.012540 | 0.013309 | 0.011746 |
| 16 | 0.064504 | 0.004484 | 0.004344 | 0.004860 |
| 32 | 0.049537 | 0.001694 | 0.001605 | 0.001547 |
| 64 | 0.029627 | 0.000548 | 0.000574 | 0.000454 |
| 128 | 0.025555 | 0.000186 | 0.000216 | 0.000186 |
| 256 | 0.015685 | 0.000075 | 0.000070 | 0.000053 |
| 512 | 0.012246 | 0.000027 | 0.000027 | 0.000022 |
| 1024 | 0.009555 | 0.000007 | 0.000009 | 0.000011 |
| 2048 | 0.006514 | 0.000003 | 0.000003 | 0.000005 |
| 4096 | 0.003950 | 0.000001 | 0.000001 | 0.000000 |

Both LHS variants produce the correct $O(N^{-1})$ convergence rate and nearly
identical absolute errors, confirming that the Feistel-network permutation is a
drop-in replacement for the Fisher-Yates shuffle used by scipy (and by the
previous stochastic_tools implementation) with no measurable quality penalty.

@zachmprince
Contributor Author

Here is a plot of the experiment results. The LHS lines (blue and red) are difficult to see since they are right on top of each other.
image

@moosebuild
Contributor

Job Precheck, step Python: black format on af7f601 wanted to post the following:

Python black formatting

Your code requires style changes.

A patch was generated and copied here.

You can directly apply the patch by running the following at the top level of your repository:

curl -s https://mooseframework.inl.gov/docs/PRs/32833/black/black.patch | git apply -v

Alternatively, you can run the following at the top level of your repository:

black --config pyproject.toml --workers 1 .

@moosebuild
Contributor

Job Precheck, step Clang format on f298501 wanted to post the following:

Your code requires style changes.

A patch was auto-generated and copied here.
You can directly apply the patch by running the following at the top level of your repository:

curl -s https://mooseframework.inl.gov/docs/PRs/32833/clang_format/style.patch | git apply -v

Alternatively, with your repository up to date and in the top level of your repository:

git clang-format 162e78c71709df5be85663d216c89383cf162192

@zachmprince zachmprince force-pushed the new_latin_sampling branch 3 times, most recently from 5ba8c63 to 9bff5a0 on April 24, 2026 at 21:08
@moosebuild
Contributor

moosebuild commented Apr 24, 2026

Job Documentation, step Docs: sync website on 8af031b wanted to post the following:

View the site here

This comment will be updated on new commits.

@moosebuild
Contributor

moosebuild commented Apr 27, 2026

Job Coverage, step Generate coverage on 8af031b wanted to post the following:

Framework coverage

|  | 5b391d (Total) | 8af031 / #32833 (Total) | +/- | New |
| --- | --- | --- | --- | --- |
| Rate | 85.87% | 85.87% | -0.00% | 100.00% |
| Hits | 132500 | 132528 | +28 | 71 |
| Misses | 21797 | 21808 | +11 | 0 |

Diff coverage report

Full coverage report

Modules coverage

Stochastic tools

|  | 5b391d (Total) | 8af031 / #32833 (Total) | +/- | New |
| --- | --- | --- | --- | --- |
| Rate | 90.68% | 90.69% | +0.02% | 100.00% |
| Hits | 8636 | 8615 | -21 | 136 |
| Misses | 888 | 884 | -4 | 0 |

Diff coverage report

Full coverage report

Full coverage reports

Reports

This comment will be updated on new commits.

@zachmprince zachmprince marked this pull request as ready for review April 28, 2026 19:45
Member

@lindsayad lindsayad left a comment

Framework portion looks good. I'll let @grmnptr handle the stochastic tools review

Comment thread framework/include/samplers/Sampler.h Outdated
Comment thread framework/include/utils/MooseRandomPerturbation.h
Contributor

@grmnptr grmnptr left a comment

Very nice! Just a few questions.

* the result falls within [0, n).
*
* @param x Input index; must satisfy x < n.
* @return A unique index in [0, n). Calling permute for every x in [0, n)
Contributor

So doing this multiple times for avoiding the padding is just equivalent to doing more than 8 rounds? So more expensive but still defendable? Could it introduce distortions close to the padding depending on the round function? Or once well-mixed, it doesn't matter?

Contributor Author

Here is the response from Claude:

Cycle-walking is not equivalent to more rounds. More rounds deepen the mixing within a single application of the Feistel network — they improve avalanche diffusion so that each call produces a better-scrambled output. Cycle-walking instead applies the complete 8-round network repeatedly to different intermediate values: permutePadded(permutePadded(x)) runs 8 rounds on x to get y, then 8 fresh rounds on y. These are unrelated computations. So yes, it costs more, but it isn't buying you better mixing of the original input — it's purely a correctness mechanism.

No distortions near the padding boundary. This is the key point. Because permutePadded is a bijection on the full padded domain [0, 2^(2*half_bits)), cycle-walking is equivalent to: take the permutation of the padded domain, discard every element that lands in the padding region [n, 2^(2*half_bits)), and keep only those that land in [0, n). The bijectivity guarantees every value in [0, n) is reached exactly once as you iterate over all starting points in [0, n) — you get a valid, unbiased permutation of [0, n). There is no correlation between proximity to the padding boundary and the distribution of outputs, because the round function's mixing (mix32 + key mixing + round-dependent constant) ensures no structure survives from the padded arithmetic.

Expected cost is bounded. The padded size is the smallest 2^(2k) >= n, so it's at most 4n. The expected number of cycle-walk iterations is padded_size / n <= 4, usually much less. For LHS with reasonable n, the overhead is negligible.
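
The equivalence argument above is easy to check numerically with a toy bijection on a padded domain (the affine map, P, and N below are illustrative stand-ins for the 8-round Feistel network and its sizes): cycle-walking the padded permutation from every in-range start yields a valid permutation of [0, N).

```cpp
#include <cassert>
#include <cstdint>
#include <set>

static const uint64_t P = 16; // padded size, 2^(2*half_bits)
static const uint64_t N = 10; // target size n

// A bijection on [0, P): valid because gcd(5, 16) == 1.
static uint64_t
paddedPermute(uint64_t x)
{
  return (5 * x + 3) % P;
}

// Cycle-walk: re-apply the padded permutation until the value lands in [0, N).
// Terminates because the cycle containing x must return to x (which is < N).
static uint64_t
cycleWalk(uint64_t x)
{
  do
    x = paddedPermute(x);
  while (x >= N);
  return x;
}
```

The "first return to [0, N)" map of any permutation of [0, P) is itself a permutation of [0, N), which is why no bias accumulates near the padding boundary.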

}

uint32_t
MooseRandomPerturbation::roundFunction(uint32_t half, unsigned int round) const
Contributor

Why this round function?

Contributor

I mean I suppose the function itself doesn't matter too much, but this form must have come from somewhere, and there must be a reason why it is used over everything else.

Contributor Author

Apparently, this is a standard function to use:

0x9e3779b9U is the 32-bit fractional part of the golden ratio: floor((φ−1) × 2³²). It appears in Knuth's multiplicative hash, xxHash, TEA/XTEA, and many others as a "nothing up my sleeve" constant with provably good bit distribution. Multiplying by (round + 1) makes the additive constant distinct each round — without this, pairs of rounds could cancel algebraically (a slide attack), so you'd get no net mixing from repeated even rounds.

mix32 is the degski/Murmur3-finalizer avalanche hash. The specific constants 0x7feb352dU and 0x846ca68bU were found by automated search (Pelle Evensen's work) to maximize the strict avalanche criterion — roughly, each input bit flips ~half the output bits. The xor-shift → multiply → xor-shift → multiply → xor-shift pattern is the standard way to build a bijective 32-bit finalizer; bijective matters here because it means mix32 contributes no collisions.

The overall structure (x ^= k0; x += constant*(round+1); x ^= k1; x = mix32(x)) is a lightweight keyed hash construction similar to a TEA round: key mixing, round diversification, key mixing again, then avalanche. It doesn't need to be cryptographically strong (this is for LHS, not encryption); it needs to be fast, key-dependent so different seeds give independent permutations, round-dependent so the Feistel network doesn't degenerate, and well-avalanching so the half-block outputs look uniform. This construction satisfies all four.
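
The avalanche claim can be checked empirically. A sketch using the mix32 constants quoted above (helper names are ours): flipping any single input bit should flip roughly half of the 32 output bits on average.

```cpp
#include <cassert>
#include <cstdint>

// The mix32 finalizer with the constants quoted above.
static uint32_t
mix32(uint32_t x)
{
  x ^= x >> 16;
  x *= 0x7feb352du;
  x ^= x >> 15;
  x *= 0x846ca68bu;
  x ^= x >> 16;
  return x;
}

// Portable population count (Kernighan's bit-clearing loop).
static int
popcount32(uint32_t v)
{
  int c = 0;
  for (; v; v &= v - 1)
    ++c;
  return c;
}

// Strict avalanche check: how many output bits flip when input bit `bit` flips.
static int
flippedBits(uint32_t x, int bit)
{
  return popcount32(mix32(x) ^ mix32(x ^ (uint32_t(1) << bit)));
}
```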

Comment thread framework/src/utils/MooseRandomPerturbation.C Outdated
Comment thread modules/stochastic_tools/src/samplers/LatinHypercubeSampler.C
Comment thread modules/stochastic_tools/src/samplers/PMCMCBase.C Outdated
- Removing redundant half mask for left split integer
- Removing forgotten line in PMCMCBase

Refs #32194, idaholab#32775
@moosebuild
Contributor

Job Test, step Results summary on 8af031b wanted to post the following:

Framework test summary

Compared against 5b391df in job civet.inl.gov/job/3787211.

No added tests

Run time changes

| Test | Base (s) | Head (s) | +/- | Base (MB) | Head (MB) |
| --- | --- | --- | --- | --- | --- |
| problems/reference_residual_problem.zero_tolerance_ref | 3.97 | 6.24 | +57.19% | 147.95 | 138.62 |

Modules test summary

Compared against 5b391df in job civet.inl.gov/job/3787211.

No added tests

Run time changes

| Test | Base (s) | Head (s) | +/- | Base (MB) | Head (MB) |
| --- | --- | --- | --- | --- | --- |
| solid_mechanics/test:smeared_cracking.rz_exponential | 11.03 | 18.17 | +64.79% | 126.14 | 136.47 |
| combined/test:combined_plasticity_temperature.ad_temp_dep_yield-jac | 6.55 | 10.49 | +60.09% | 173.21 | 187.12 |
| solid_mechanics/test:rate_independent_cyclic_hardening.nonlin_isokinharden_symmetric_strain_controlled | 10.13 | 16.14 | +59.25% | 150.94 | 140.31 |
| solid_mechanics/test:rate_independent_cyclic_hardening.1D_ratcheting_nonlin_kinharden_stress_controlled | 3.73 | 5.89 | +57.96% | 116.25 | 123.91 |
| solid_mechanics/test:rate_independent_cyclic_hardening.linear_kinharden_nonsymmetric_stress_controlled | 4.34 | 6.85 | +57.79% | 122.60 | 127.67 |
| solid_mechanics/test:rate_independent_cyclic_hardening.nonlin_kinharden_symmetric_strain_controlled | 4.70 | 7.37 | +56.78% | 126.33 | 127.33 |
| solid_mechanics/test:rate_independent_cyclic_hardening.nonlin_kinharden_nonsymmetric_strain_controlled | 5.80 | 9.04 | +55.77% | 130.95 | 132.75 |
| solid_mechanics/test:temperature_dependent_hardening.test | 2.17 | 3.38 | +55.52% | 121.70 | 121.92 |
| solid_mechanics/test:combined_creep_plasticity.creepWithPlasticity | 5.24 | 8.13 | +55.10% | 121.92 | 122.41 |
| stochastic_tools/test:transfers/sampler_reporter.transfer/normal | 6.76 | 10.44 | +54.51% | 158.47 | 176.41 |
| solid_mechanics/test:smeared_cracking.exponential | 3.96 | 6.10 | +54.10% | 123.71 | 121.94 |
| solid_mechanics/test:beam/static.euler_finite_y_with_action | 3.13 | 4.80 | +53.54% | 121.64 | 136.42 |
| solid_mechanics/test:beam/static.euler_finite_rot_y | 3.11 | 4.78 | +53.39% | 126.28 | 128.48 |
| solid_mechanics/test:torque_reaction.disp_about_axis_motion | 2.00 | 3.06 | +52.89% | 118.95 | 120.02 |
| solid_mechanics/test:combined_creep_plasticity.combined_start_time | 5.72 | 8.74 | +52.84% | 118.07 | 125.55 |
| solid_mechanics/test:smeared_cracking.cracking_rotation_pres_dir_x | 3.82 | 5.84 | +52.81% | 136.85 | 149.29 |
| solid_mechanics/test:beam/static.euler_finite_rot_z | 3.32 | 5.07 | +52.75% | 124.22 | 138.74 |
| solid_mechanics/test:rate_independent_cyclic_hardening.linear_kinharden_symmetric_strain_controlled | 4.83 | 7.33 | +51.91% | 144.16 | 141.68 |
| solid_mechanics/test:crystal_plasticity/cp_eigenstrains.thermal_eigenstrain_011orientation | 5.38 | 8.15 | +51.47% | 124.00 | 133.14 |
| stochastic_tools/test:transfers/sampler_reporter.transfer/distributed | 3.27 | 4.94 | +51.28% | 472.19 | 517.81 |
| solid_mechanics/test:combined_creep_plasticity.stress_prescribed | 3.47 | 5.25 | +51.26% | 124.49 | 124.43 |
| solid_mechanics/test:dynamics/acceleration_bc.acceleration_bc_ti | 3.45 | 5.21 | +50.95% | 130.91 | 134.51 |
| solid_mechanics/test:smeared_cracking.cracking_rotation_pres_dir_z | 3.87 | 5.83 | +50.75% | 132.21 | 134.29 |
| solid_mechanics/test:dynamics/wave_1D.rayleigh_hht_ad_jac | 3.86 | 5.80 | +50.50% | 119.75 | 135.05 |
| solid_mechanics/test:combined_creep_plasticity.combined | 5.84 | 8.78 | +50.37% | 120.34 | 123.52 |
| solid_mechanics/test:rate_independent_cyclic_hardening.nonlin_isoharden_symmetric_strain_controlled | 3.37 | 5.06 | +50.20% | 126.03 | 129.53 |
| solid_mechanics/test:smeared_cracking.cracking_rotation_pres_dir_xz | 3.95 | 5.93 | +50.12% | 132.19 | 139.47 |
| stochastic_tools/test:reporters/BFActiveLearning.sampling_bf/MultipleProc_MultipleRow_Ufunction | 9.62 | 4.39 | -54.38% | 1314.27 | 282.41 |


Development

Successfully merging this pull request may close these issues.

Sampler::execute() is not called for EXEC_INITIAL, leaving executeSetUp() unrun on the first timestep

4 participants