Perf/stage 2 adaptive daa#4
Conversation
…tion overlays - Implement fixed-point adaptive difficulty adjustment in crates/chain. - Reuse BlockLocalOverlay capacity across validation passes in crates/node to reduce allocations. - Add adaptive-difficulty-harness to tools/perf for local performance measurement. Verification: - cargo fmt --all -- --check - cargo test -p bitcoin-rs-chain adaptive_difficulty - cargo test -p bitcoin-rs-node apply:: --features bitcoinconsensus - cargo clippy -p bitcoin-rs-chain -p bitcoin-rs-node --features bitcoinconsensus
There was a problem hiding this comment.
Code Review
This pull request introduces a fixed-point adaptive difficulty adjustment mechanism and optimizes block validation performance by implementing a local UTXO caching layer (BlockPrevoutCache and BlockLocalOverlay). Feedback focuses on a critical continuity bug in the difficulty multiplier calculation where mixing base-2 and base-e approximations causes non-monotonic jumps. The review also recommends replacing the custom U256 implementation with a standard library, precomputing constants in the LSM slope calculation to avoid redundant work, and optimizing the cache to avoid expensive cloning of output data.
| let error = next_ewma - FIXED_ONE; | ||
| let shift_factor = error >> SCALE_BITS; | ||
| let fractional_part = error & (FIXED_ONE - 1); | ||
| let target_factor = | ||
| FIXED_ONE + fractional_part + ((fractional_part * fractional_part) >> (SCALE_BITS + 1)); |
There was a problem hiding this comment.
The difficulty adjustment multiplier calculation contains a significant continuity bug. The code decomposes the error into a base-2 shift_factor and a fractional_part, but then applies a Taylor series approximation for
Because error approaches error reaches shift_factor increments and fractional_part resets to 0, causing the multiplier to jump down to
If the intention is an ASERT-like DAA (base-2), use a continuous approximation for
| /// Little-endian 256-bit unsigned integer used by the adaptive target engine. | ||
| #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)] | ||
| pub struct U256(pub [u64; U256_LIMBS]); | ||
|
|
||
| impl U256 { | ||
| /// Returns zero. | ||
| #[must_use] | ||
| pub const fn zero() -> Self { | ||
| Self([0; U256_LIMBS]) | ||
| } | ||
| } |
There was a problem hiding this comment.
The custom U256 implementation is redundant and potentially fragile. The bitcoin-rs-chain crate already depends on the ruint workspace dependency, which provides a highly optimized and thoroughly tested Uint<256, 4>. Using a standard library for big integers is safer, especially for consensus-critical code like a DAA. The current Mul implementation is also a truncating multiplication which could overflow if used with larger targets or multipliers.
| let mut sum_x = 0_i64; | ||
| let mut sum_y = 0_i64; | ||
| let mut sum_xx = 0_i64; | ||
| let mut sum_xy = 0_i64; | ||
|
|
||
| let start_height = self | ||
| .total_blocks | ||
| .saturating_sub(u64::try_from(WINDOW_SIZE).unwrap_or(0)); | ||
|
|
||
| for x in 0..n { | ||
| let actual_idx = usize::try_from( | ||
| start_height + u64::try_from(x).unwrap_or_else(|_| unreachable!("x fits u64")), | ||
| ) | ||
| .unwrap_or_else(|_| unreachable!("usize supports required block index width")) | ||
| & WINDOW_MASK; | ||
| let y = i64::from(self.timestamps[actual_idx]); | ||
|
|
||
| sum_x += x; | ||
| sum_y += y; | ||
| sum_xx += x * x; | ||
| sum_xy += x * y; | ||
| } | ||
|
|
||
| let sum_x_squared = sum_x * sum_x; | ||
| let denominator = n * sum_xx - sum_x_squared; | ||
| if denominator == 0 { | ||
| return Some(FIXED_ONE); | ||
| } | ||
|
|
||
| let sum_xy_projection = sum_x * sum_y; | ||
| let numerator = n * sum_xy - sum_xy_projection; | ||
| let actual_slope = (numerator << SCALE_BITS) / denominator; | ||
| Some(actual_slope / TARGET_BLOCK_TIME) |
There was a problem hiding this comment.
In calculate_lsm_slope, the values for sum_x, sum_xx, and the resulting denominator are constant because WINDOW_SIZE is fixed and x always ranges from 0..WINDOW_SIZE. These should be precomputed as constants to avoid redundant calculations in the hot path. Additionally, performing the division by TARGET_BLOCK_TIME separately from the denominator division introduces unnecessary rounding error; it is more precise to divide by the product of both.
References
- Avoid redundant calculations in performance-sensitive loops by precomputing constants. (link)
| fn lookup(&self, set: &UtxoSet, outpoint: bitcoin::OutPoint) -> Option<LiveOutput> { | ||
| if let Some(entry) = self.entries.borrow().get(&outpoint) { | ||
| return entry.clone(); | ||
| } | ||
| let entry = set.get_entry(&internal_outpoint(&outpoint)); | ||
| self.entries.borrow_mut().insert(outpoint, entry.clone()); | ||
| entry | ||
| } | ||
| } |
There was a problem hiding this comment.
The BlockPrevoutCache::lookup method clones the LiveOutput on every cache hit. Since LiveOutput contains a TxOut which includes an owned Vec<u8> for the script_pubkey, this can lead to significant overhead and memory pressure during block validation if scripts are large. Consider storing Rc<LiveOutput> or Arc<LiveOutput> in the cache to make hits cheap, or returning a reference where possible.
|
Performance check for stage 2 Measured date: 2026-05-25 Review fix included in this measurement: Node apply benchmark: Node apply result: Interpretation: Adaptive difficulty controller microbenchmark: Adaptive difficulty samples: Adaptive difficulty summary: |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 75a8ad0dab
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const U256_BITS: usize = U256_LIMBS * LIMB_BITS; | ||
|
|
||
| /// Little-endian 256-bit unsigned integer used by the adaptive target engine. | ||
| #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)] |
There was a problem hiding this comment.
Implement numeric ordering for little-endian U256
Because U256 stores limbs little-endian, the derived Ord compares the least-significant limb first rather than the most-significant limb. For example, U256([u64::MAX, 0, 0, 0]) sorts greater than U256([0, 1, 0, 0]) even though it is numerically smaller; next_target relies on these comparisons for its min/max clamp, so targets that differ across a 64-bit limb can be bounded in the wrong direction.
Useful? React with 👍 / 👎.
| let error = next_ewma - FIXED_ONE; | ||
| let shift_factor = error >> SCALE_BITS; | ||
| let fractional_part = error & (FIXED_ONE - 1); | ||
| let target_factor = | ||
| FIXED_ONE + fractional_part + ((fractional_part * fractional_part) >> (SCALE_BITS + 1)); |
There was a problem hiding this comment.
Fix negative fractional target scaling
When the EWMA is just below FIXED_ONE, error >> SCALE_BITS floors to -1 while error & (FIXED_ONE - 1) turns the negative fraction into a value near FIXED_ONE; for a window of 599-second blocks this produces roughly (2.5 / 2) * current_target, increasing the target even though faster blocks should make it harder. This makes the controller react in the wrong direction for common slightly-fast windows.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR focuses on stage-2 performance work by reusing per-block UTXO lookup state during block application, and it introduces a new fixed-point adaptive difficulty controller along with a microbenchmark harness.
Changes:
- Add
bitcoin_rs_chain::adaptive_difficulty(fixed-pointDifficultyController+U256) and export it from the chain crate. - Introduce a
tools/perf/adaptive-difficulty-harnessworkspace crate to microbenchmark adaptive difficulty operations. - Rework node block-apply paths to reuse a
BlockPrevoutCacheandBlockLocalOverlayacross script verification, coinbase maturity, BIP68 checks, and BIP158 filter generation.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
crates/node/src/state.rs |
Store reusable per-block caches in NodeState and propagate them through ApplyHandles. |
crates/node/src/apply.rs |
Reset/reuse prevout cache + overlay across validation passes; route filter generation through cached prevout lookups. |
crates/chain/src/lib.rs |
Export new adaptive_difficulty module. |
crates/chain/src/adaptive_difficulty.rs |
Add fixed-point adaptive difficulty primitives (DifficultyController, U256) and tests. |
tools/perf/adaptive-difficulty-harness/src/main.rs |
Add a standalone benchmark harness for adaptive difficulty hot-path timing. |
tools/perf/adaptive-difficulty-harness/Cargo.toml |
Define the harness crate and depend on bitcoin-rs-chain. |
Cargo.toml |
Add the harness crate to the workspace members list. |
Cargo.lock |
Include the new harness crate in the lockfile. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Little-endian 256-bit unsigned integer used by the adaptive target engine. | ||
| #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)] | ||
| pub struct U256(pub [u64; U256_LIMBS]); | ||
|
|
||
| impl U256 { | ||
| /// Returns zero. | ||
| #[must_use] | ||
| pub const fn zero() -> Self { | ||
| Self([0; U256_LIMBS]) | ||
| } | ||
| } |
| "crates/node", | ||
| "bin/bitcoin-rs", | ||
| "tools/perf/adaptive-difficulty-harness", | ||
| ] |
| handles.prevout_cache.reset(); | ||
| handles.local_overlay.reset(); |
Performance check for stage 2
Measured date: 2026-05-25
Branch: perf/stage-2-adaptive-daa
Review fix included in this measurement:
BlockLocalUtxoView now receives a reusable BlockLocalOverlay for one block apply.
The overlay is reset at the start of each validation pass, so script verification, coinbase maturity, and BIP68 checks still simulate independent block-local state.
The underlying HashMap allocation can retain capacity across those passes.
BlockPrevoutCache::lookup still avoids holding the RefCell borrow while UtxoSet::get_entry may take shard locks.
Node apply benchmark:
Measured path: apply::consensus_rule_tests::apply_block_persists_non_empty_filter_for_valid_same_block_spend
Binary: target/debug/deps/bitcoin_rs_node-362def564f321c5a
Command shape: run the compiled node lib-test binary repeatedly with the exact test filter and --quiet
Duration target: 60 seconds
Node apply result:
iterations=10495
elapsed_ns=59578545695
elapsed_s=59.578546
iter_per_s=176.154014
Interpretation:
This is a local fixed-duration throughput check of one apply_block regression path.
It exercises same-block spend handling, script verification through bitcoinconsensus, UTXO commit, and compact-filter generation.
It is not a full IBD benchmark and not a Criterion distribution.
Adaptive difficulty controller microbenchmark:
Harness location: tools/perf/adaptive-difficulty-harness
Harness commit policy: local-only benchmark harness, excluded by .git/info/exclude, do not commit
Measured path: DifficultyController::push_timestamp plus DifficultyController::next_target
Adaptive difficulty samples:
sample=1 iterations=512000000 elapsed_ns=15337535551 ns_per_iter=29.956 iter_per_sec=33382155.712
sample=2 iterations=512000000 elapsed_ns=14883090637 ns_per_iter=29.069 iter_per_sec=34401456.827
sample=3 iterations=512000000 elapsed_ns=14742131068 ns_per_iter=28.793 iter_per_sec=34730392.617
Adaptive difficulty summary:
samples=3
iterations_per_sample=512000000
mean_ns_per_iter=29.273
best_ns_per_iter=28.793
worst_ns_per_iter=29.956
mean_iter_per_sec=34171335.052
best_iter_per_sec=34730392.617
worst_iter_per_sec=33382155.712