Skip to content

Perf/stage 2 adaptive daa#4

Open
gg582 wants to merge 4 commits into
mainfrom
perf/stage-2-adaptive-daa
Open

Perf/stage 2 adaptive daa#4
gg582 wants to merge 4 commits into
mainfrom
perf/stage-2-adaptive-daa

Conversation

@gg582

@gg582 gg582 commented May 25, 2026

Copy link
Copy Markdown

Performance check for stage 2

Measured date: 2026-05-25
Branch: perf/stage-2-adaptive-daa

Review fix included in this measurement:
BlockLocalUtxoView now receives a reusable BlockLocalOverlay for one block apply.
The overlay is reset at the start of each validation pass, so script verification, coinbase maturity, and BIP68 checks still simulate independent block-local state.
The underlying HashMap allocation can retain capacity across those passes.
BlockPrevoutCache::lookup still avoids holding the RefCell borrow while UtxoSet::get_entry may take shard locks.

Node apply benchmark:
Measured path: apply::consensus_rule_tests::apply_block_persists_non_empty_filter_for_valid_same_block_spend
Binary: target/debug/deps/bitcoin_rs_node-362def564f321c5a
Command shape: run the compiled node lib-test binary repeatedly with the exact test filter and --quiet
Duration target: 60 seconds

Node apply result:
iterations=10495
elapsed_ns=59578545695
elapsed_s=59.578546
iter_per_s=176.154014

Interpretation:
This is a local fixed-duration throughput check of one apply_block regression path.
It exercises same-block spend handling, script verification through bitcoinconsensus, UTXO commit, and compact-filter generation.
It is not a full IBD benchmark and not a Criterion distribution.

Adaptive difficulty controller microbenchmark:
Harness location: tools/perf/adaptive-difficulty-harness
Harness commit policy: local-only benchmark harness, excluded by .git/info/exclude, do not commit
Measured path: DifficultyController::push_timestamp plus DifficultyController::next_target

Adaptive difficulty samples:
sample=1 iterations=512000000 elapsed_ns=15337535551 ns_per_iter=29.956 iter_per_sec=33382155.712
sample=2 iterations=512000000 elapsed_ns=14883090637 ns_per_iter=29.069 iter_per_sec=34401456.827
sample=3 iterations=512000000 elapsed_ns=14742131068 ns_per_iter=28.793 iter_per_sec=34730392.617

Adaptive difficulty summary:
samples=3
iterations_per_sample=512000000
mean_ns_per_iter=29.273
best_ns_per_iter=28.793
worst_ns_per_iter=29.956
mean_iter_per_sec=34171335.052
best_iter_per_sec=34730392.617
worst_iter_per_sec=33382155.712

gg582 added 2 commits May 25, 2026 23:25
…tion overlays

- Implement fixed-point adaptive difficulty adjustment in crates/chain.
- Reuse BlockLocalOverlay capacity across validation passes in crates/node to reduce allocations.
- Add adaptive-difficulty-harness to tools/perf for local performance measurement.

Verification:
- cargo fmt --all -- --check
- cargo test -p bitcoin-rs-chain adaptive_difficulty
- cargo test -p bitcoin-rs-node apply:: --features bitcoinconsensus
- cargo clippy -p bitcoin-rs-chain -p bitcoin-rs-node --features bitcoinconsensus

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a fixed-point adaptive difficulty adjustment mechanism and optimizes block validation performance by implementing a local UTXO caching layer (BlockPrevoutCache and BlockLocalOverlay). Feedback focuses on a critical continuity bug in the difficulty multiplier calculation where mixing base-2 and base-e approximations causes non-monotonic jumps. The review also recommends replacing the custom U256 implementation with a standard library, precomputing constants in the LSM slope calculation to avoid redundant work, and optimizing the cache to avoid expensive cloning of output data.

Comment on lines +204 to +208
let error = next_ewma - FIXED_ONE;
let shift_factor = error >> SCALE_BITS;
let fractional_part = error & (FIXED_ONE - 1);
let target_factor =
FIXED_ONE + fractional_part + ((fractional_part * fractional_part) >> (SCALE_BITS + 1));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The difficulty adjustment multiplier calculation contains a significant continuity bug. The code decomposes the error into a base-2 shift_factor and a fractional_part, but then applies a Taylor series approximation for $e^x$ ($1 + x + x^2/2$) to the fractional part.

Because $e^1 \approx 2.718$ while $2^1 = 2$, the multiplier will jump non-monotonically at integer boundaries of the error. For example, as error approaches $2^{16}-1$, the multiplier approaches $\approx 2.5$. When error reaches $2^{16}$, the shift_factor increments and fractional_part resets to 0, causing the multiplier to jump down to $2.0$. This will cause severe oscillations in difficulty.

If the intention is an ASERT-like DAA (base-2), use a continuous approximation for $2^f$ where $f \in [0, 1)$, such as a linear approximation ($1 + f$) or a specifically tuned polynomial that satisfies $poly(1) = 2 \cdot poly(0)$.

Comment on lines +23 to +33
/// Little-endian 256-bit unsigned integer used by the adaptive target engine.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct U256(pub [u64; U256_LIMBS]);

impl U256 {
/// Returns zero.
#[must_use]
pub const fn zero() -> Self {
Self([0; U256_LIMBS])
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The custom U256 implementation is redundant and potentially fragile. The bitcoin-rs-chain crate already depends on the ruint workspace dependency, which provides a highly optimized and thoroughly tested Uint<256, 4>. Using a standard library for big integers is safer, especially for consensus-critical code like a DAA. The current Mul implementation is also a truncating multiplication which could overflow if used with larger targets or multipliers.

Comment on lines +151 to +183
let mut sum_x = 0_i64;
let mut sum_y = 0_i64;
let mut sum_xx = 0_i64;
let mut sum_xy = 0_i64;

let start_height = self
.total_blocks
.saturating_sub(u64::try_from(WINDOW_SIZE).unwrap_or(0));

for x in 0..n {
let actual_idx = usize::try_from(
start_height + u64::try_from(x).unwrap_or_else(|_| unreachable!("x fits u64")),
)
.unwrap_or_else(|_| unreachable!("usize supports required block index width"))
& WINDOW_MASK;
let y = i64::from(self.timestamps[actual_idx]);

sum_x += x;
sum_y += y;
sum_xx += x * x;
sum_xy += x * y;
}

let sum_x_squared = sum_x * sum_x;
let denominator = n * sum_xx - sum_x_squared;
if denominator == 0 {
return Some(FIXED_ONE);
}

let sum_xy_projection = sum_x * sum_y;
let numerator = n * sum_xy - sum_xy_projection;
let actual_slope = (numerator << SCALE_BITS) / denominator;
Some(actual_slope / TARGET_BLOCK_TIME)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In calculate_lsm_slope, the values for sum_x, sum_xx, and the resulting denominator are constant because WINDOW_SIZE is fixed and x always ranges from 0..WINDOW_SIZE. These should be precomputed as constants to avoid redundant calculations in the hot path. Additionally, performing the division by TARGET_BLOCK_TIME separately from the denominator division introduces unnecessary rounding error; it is more precise to divide by the product of both.

References
  1. Avoid redundant calculations in performance-sensitive loops by precomputing constants. (link)

Comment thread crates/node/src/apply.rs
Comment on lines +663 to +671
fn lookup(&self, set: &UtxoSet, outpoint: bitcoin::OutPoint) -> Option<LiveOutput> {
if let Some(entry) = self.entries.borrow().get(&outpoint) {
return entry.clone();
}
let entry = set.get_entry(&internal_outpoint(&outpoint));
self.entries.borrow_mut().insert(outpoint, entry.clone());
entry
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The BlockPrevoutCache::lookup method clones the LiveOutput on every cache hit. Since LiveOutput contains a TxOut which includes an owned Vec<u8> for the script_pubkey, this can lead to significant overhead and memory pressure during block validation if scripts are large. Consider storing Rc<LiveOutput> or Arc<LiveOutput> in the cache to make hits cheap, or returning a reference where possible.

@gg582

gg582 commented May 26, 2026

Copy link
Copy Markdown
Author

Performance check for stage 2

Measured date: 2026-05-25
Branch: perf/stage-2-adaptive-daa

Review fix included in this measurement:
BlockLocalUtxoView now receives a reusable BlockLocalOverlay for one block apply.
The overlay is reset at the start of each validation pass, so script verification, coinbase maturity, and BIP68 checks still simulate independent block-local state.
The underlying HashMap allocation can retain capacity across those passes.
BlockPrevoutCache::lookup still avoids holding the RefCell borrow while UtxoSet::get_entry may take shard locks.

Node apply benchmark:
Measured path: apply::consensus_rule_tests::apply_block_persists_non_empty_filter_for_valid_same_block_spend
Binary: target/debug/deps/bitcoin_rs_node-362def564f321c5a
Command shape: run the compiled node lib-test binary repeatedly with the exact test filter and --quiet
Duration target: 60 seconds

Node apply result:
iterations=10495
elapsed_ns=59578545695
elapsed_s=59.578546
iter_per_s=176.154014

Interpretation:
This is a local fixed-duration throughput check of one apply_block regression path.
It exercises same-block spend handling, script verification through bitcoinconsensus, UTXO commit, and compact-filter generation.
It is not a full IBD benchmark and not a Criterion distribution.

Adaptive difficulty controller microbenchmark:
Harness location: tools/perf/adaptive-difficulty-harness
Harness commit policy: local-only benchmark harness, excluded by .git/info/exclude, do not commit
Measured path: DifficultyController::push_timestamp plus DifficultyController::next_target

Adaptive difficulty samples:
sample=1 iterations=512000000 elapsed_ns=15337535551 ns_per_iter=29.956 iter_per_sec=33382155.712
sample=2 iterations=512000000 elapsed_ns=14883090637 ns_per_iter=29.069 iter_per_sec=34401456.827
sample=3 iterations=512000000 elapsed_ns=14742131068 ns_per_iter=28.793 iter_per_sec=34730392.617

Adaptive difficulty summary:
samples=3
iterations_per_sample=512000000
mean_ns_per_iter=29.273
best_ns_per_iter=28.793
worst_ns_per_iter=29.956
mean_iter_per_sec=34171335.052
best_iter_per_sec=34730392.617
worst_iter_per_sec=33382155.712

@metaphorics metaphorics requested a review from Copilot June 2, 2026 15:14
@metaphorics

Copy link
Copy Markdown
Contributor

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 75a8ad0dab

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const U256_BITS: usize = U256_LIMBS * LIMB_BITS;

/// Little-endian 256-bit unsigned integer used by the adaptive target engine.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Implement numeric ordering for little-endian U256

Because U256 stores limbs little-endian, the derived Ord compares the least-significant limb first rather than the most-significant limb. For example, U256([u64::MAX, 0, 0, 0]) sorts greater than U256([0, 1, 0, 0]) even though it is numerically smaller; next_target relies on these comparisons for its min/max clamp, so targets that differ across a 64-bit limb can be bounded in the wrong direction.

Useful? React with 👍 / 👎.

Comment on lines +204 to +208
let error = next_ewma - FIXED_ONE;
let shift_factor = error >> SCALE_BITS;
let fractional_part = error & (FIXED_ONE - 1);
let target_factor =
FIXED_ONE + fractional_part + ((fractional_part * fractional_part) >> (SCALE_BITS + 1));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fix negative fractional target scaling

When the EWMA is just below FIXED_ONE, error >> SCALE_BITS floors to -1 while error & (FIXED_ONE - 1) turns the negative fraction into a value near FIXED_ONE; for a window of 599-second blocks this produces roughly (2.5 / 2) * current_target, increasing the target even though faster blocks should make it harder. This makes the controller react in the wrong direction for common slightly-fast windows.

Useful? React with 👍 / 👎.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR focuses on stage-2 performance work by reusing per-block UTXO lookup state during block application, and it introduces a new fixed-point adaptive difficulty controller along with a microbenchmark harness.

Changes:

  • Add bitcoin_rs_chain::adaptive_difficulty (fixed-point DifficultyController + U256) and export it from the chain crate.
  • Introduce a tools/perf/adaptive-difficulty-harness workspace crate to microbenchmark adaptive difficulty operations.
  • Rework node block-apply paths to reuse a BlockPrevoutCache and BlockLocalOverlay across script verification, coinbase maturity, BIP68 checks, and BIP158 filter generation.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
crates/node/src/state.rs Store reusable per-block caches in NodeState and propagate them through ApplyHandles.
crates/node/src/apply.rs Reset/reuse prevout cache + overlay across validation passes; route filter generation through cached prevout lookups.
crates/chain/src/lib.rs Export new adaptive_difficulty module.
crates/chain/src/adaptive_difficulty.rs Add fixed-point adaptive difficulty primitives (DifficultyController, U256) and tests.
tools/perf/adaptive-difficulty-harness/src/main.rs Add a standalone benchmark harness for adaptive difficulty hot-path timing.
tools/perf/adaptive-difficulty-harness/Cargo.toml Define the harness crate and depend on bitcoin-rs-chain.
Cargo.toml Add the harness crate to the workspace members list.
Cargo.lock Include the new harness crate in the lockfile.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +23 to +33
/// Little-endian 256-bit unsigned integer used by the adaptive target engine.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct U256(pub [u64; U256_LIMBS]);

impl U256 {
/// Returns zero.
#[must_use]
pub const fn zero() -> Self {
Self([0; U256_LIMBS])
}
}
Comment thread Cargo.toml
Comment on lines 21 to 24
"crates/node",
"bin/bitcoin-rs",
"tools/perf/adaptive-difficulty-harness",
]
Comment thread crates/node/src/apply.rs
Comment on lines +234 to +235
handles.prevout_cache.reset();
handles.local_overlay.reset();
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants