Benchmark: Evaluate replacing new byte[n] allocations with stackalloc or ArrayPool<byte> in NLightning.Bolt11 #72

@nGoline

Summary

Design and run comprehensive benchmarks to compare the current pattern of heap allocations for byte[] (via new byte[n]) against alternatives using stackalloc and ArrayPool<byte> throughout the NLightning.Bolt11 assembly. Measure execution time, memory allocation, and CPU usage under realistic workloads (e.g., decoding BOLT11 invoices). If results show meaningful improvements, follow up with a PR to apply the preferred approach consistently across the project.

Motivation

  • new byte[n] causes heap allocations and GC pressure for transient buffers used in parsing/encoding BOLT11 invoices.
  • stackalloc can avoid heap allocations for small, short‑lived buffers, but it increases stack usage and requires Span<T>-based code, which is mostly fine in this project.
  • ArrayPool<byte> can amortize buffer costs for larger or variable‑size buffers but adds rental/return complexity and potential for misuse.
  • We need data to guide a project‑wide change to one of these strategies (or to stay as is).

Scope

  • Benchmark the existing code paths that allocate temporary byte[] buffers in NLightning.Bolt11.
  • Compare three strategies:
    1. Baseline: new byte[n] (status quo)
    2. stackalloc (where possible – small, fixed‑size buffers)
    3. ArrayPool<byte>.Shared.Rent/Return (for larger/variable sizes)
  • Workloads: realistic scenarios such as full invoice decode (and optionally encode) across small, typical, and large inputs.
  • Metrics: execution time, total allocations (bytes/GC count), and CPU usage.
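
Before the full benchmark project exists, the allocation difference between strategies 1 and 3 can be sanity-checked with a few lines of plain .NET. The sketch below is only an illustration: AllocationProbe and its methods are hypothetical names, not part of NLightning, and the numbers it produces are no substitute for BenchmarkDotNet results.

```csharp
using System;
using System.Buffers;

// Hypothetical helper for a quick sanity check; not part of NLightning.
public static class AllocationProbe
{
    // Keeps the last array reachable so the allocation cannot be optimized away.
    private static byte[] s_sink = Array.Empty<byte>();

    // Returns the managed bytes allocated on the current thread while running `action`.
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Strategy 1: a fresh heap array on every iteration.
    public static void NewArrayLoop(int size, int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            var buf = new byte[size];
            buf[0] = (byte)i;
            s_sink = buf;
        }
    }

    // Strategy 3: rent from the shared pool and always return the buffer.
    public static void PooledLoop(int size, int iterations)
    {
        var pool = ArrayPool<byte>.Shared;
        for (int i = 0; i < iterations; i++)
        {
            byte[] buf = pool.Rent(size); // may be longer than `size`
            try { buf[0] = (byte)i; }
            finally { pool.Return(buf); }
        }
    }
}
```

Calling AllocationProbe.PooledLoop(1024, 1) once to warm the pool, and then wrapping NewArrayLoop and PooledLoop in Measure, should show the pooled loop allocating far fewer managed bytes than the baseline loop.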

Out of Scope

  • Immediate refactoring across the codebase. That will be proposed only if benchmarks show a clear benefit.

Proposed Work

  1. Create a dedicated benchmark project under benchmark/NLightning.Bolt11.Benchmarks using BenchmarkDotNet.
  2. Implement benchmarks that:
    • Drive end‑to‑end decode of BOLT11 invoices with datasets representing common and worst‑case sizes.
    • Include micro-benchmarks for the most allocation‑heavy routines (e.g., bit readers/writers, tagged field parsing, bech32 operations) to isolate buffer behavior.
  3. Provide three variants for each benchmarked routine:
    • Baseline (new byte[n])
    • stackalloc (guard with size thresholds and safe spans)
    • ArrayPool<byte>
  4. Collect metrics:
    • BenchmarkDotNet’s standard stats (Mean, Error, StdDev), plus the Median and P95 columns where useful
    • Allocated bytes, Gen0/1/2 counts
    • Optional CPU sampling/tracing corroboration using external tools
  5. Document results and provide a recommendation (strategy/thresholds). If beneficial, open a follow‑up PR to apply the chosen strategy consistently.

Methodology & Metrics

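  • Use BenchmarkDotNet with Release builds. Keep the default Throughput run strategy for the microbenchmarks and consider RunStrategy.Monitoring only for the longer end‑to‑end scenarios, since Monitoring skips the pilot and overhead stages that short benchmarks rely on. Disable GcForce so that garbage collections happen as they would in a real workload.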
  • Configure multiple input sizes:
    • Small invoices
    • Typical invoices
    • Large invoices (many tagged fields, long route info)
  • Metrics from BDN:
    • Mean, Error, StdDev, Median
    • Allocated (bytes), Gen0/1/2
  • External validation (optional but recommended):
    • CPU: dotnet-trace + Speedscope/PerfView
    • Counters: dotnet-counters for GC (alloc rate, GC count)
  • Ensure warmup and multiple iterations; include environment info in the report (TFM, OS, CPU model, .NET version).
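
The settings above could be gathered in one shared config class, sketched here under a few assumptions: Bolt11BenchmarkConfig is a hypothetical name, the warmup and iteration counts are simply copied from the template in this issue, and the class requires the BenchmarkDotNet NuGet package. It would replace the [SimpleJob] and [MemoryDiagnoser] attributes rather than sit alongside them, since keeping both would add a second job.

```csharp
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Jobs;

// Hypothetical config class; attach it with [Config(typeof(Bolt11BenchmarkConfig))].
public class Bolt11BenchmarkConfig : ManualConfig
{
    public Bolt11BenchmarkConfig()
    {
        AddJob(Job.Default
            .WithWarmupCount(3)
            .WithIterationCount(15)
            .WithGcForce(false)); // do not force a GC between iterations

        // Adds the Allocated and Gen0/1/2 columns to the summary.
        AddDiagnoser(MemoryDiagnoser.Default);

        // Extra statistics next to the default Mean/Error/StdDev.
        AddColumn(StatisticColumn.Median, StatisticColumn.P95);
    }
}
```

A single config class keeps the end-to-end and micro benchmarks comparable, because every benchmark class then runs with the same GC and iteration settings.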

Benchmark Project Structure

  • benchmark/NLightning.Bolt11.Benchmarks/ (new project)
    • References src/NLightning.Bolt11
    • Contains:
      • DecodeInvoiceBenchmarks.cs (end‑to‑end scenarios)
      • BufferStrategyBenchmarks.cs (microbenchmarks comparing allocation strategies)
      • TestData/ with representative invoice samples
      • README.md with instructions to run and interpret results

Example BenchmarkDotNet Template

using System;
using System.Buffers;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

[SimpleJob(RuntimeMoniker.Net80, warmupCount: 3, iterationCount: 15)]
[MemoryDiagnoser]
public class BufferStrategyBenchmarks
{
    [Params(16, 64, 256, 1024, 4096)]
    public int N;

    [Benchmark(Baseline = true)]
    public int Baseline_NewArray()
    {
        var buf = new byte[N];
        return Touch(buf);
    }

    [Benchmark]
    public int Stackalloc_WhenSmall()
    {
        if (N <= 256)
        {
            Span<byte> span = stackalloc byte[N];
            return Touch(span);
        }
        else
        {
            var buf = new byte[N];
            return Touch(buf);
        }
    }

    [Benchmark]
    public int ArrayPool_RentReturn()
    {
        var pool = ArrayPool<byte>.Shared;
        var buf = pool.Rent(N);
        try { return Touch(buf.AsSpan(0, N)); }
        finally { pool.Return(buf, clearArray: false); }
    }

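    // Writes and reads every byte so the buffer is actually used and the
    // JIT cannot treat the allocation as dead code.
    private static int Touch(Span<byte> s)
    {
        int x = 0;
        for (int i = 0; i < s.Length; i++)
        {
            s[i] = (byte)i;
            x ^= s[i];
        }
        return x;
    }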
}

Datasets

  • Curate a set of real‑world and synthetic BOLT11 invoice samples:
    • Minimal invoices
    • Typical invoices (median size from your logs/fixtures)
    • Stress invoices (max fields, large route info, long descriptions)
  • Reuse samples from existing tests under test/NLightning.Bolt11.Tests and test/NLightning.Integration.Tests where possible.
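
One possible way to feed these samples into the benchmarks is a small loader that reads them from TestData/ and picks the shortest, median-length, and longest invoice. This is a sketch only: InvoiceSamples is a hypothetical name, and the file layout (one invoice string per line in *.txt files) is an assumption, not something the repository defines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical loader; the "*.txt, one invoice per line" layout is an assumption.
public static class InvoiceSamples
{
    // Reads every non-empty line from all *.txt files in `directory`.
    public static IReadOnlyList<string> Load(string directory) =>
        Directory.EnumerateFiles(directory, "*.txt")
            .SelectMany(path => File.ReadLines(path))
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToList();

    // Picks the shortest, median-length, and longest sample.
    public static (string Minimal, string Typical, string Stress) PickBySize(
        IReadOnlyList<string> samples)
    {
        if (samples.Count == 0)
            throw new ArgumentException("No invoice samples found.", nameof(samples));

        var ordered = samples.OrderBy(s => s.Length).ToList();
        return (ordered[0], ordered[ordered.Count / 2], ordered[^1]);
    }
}
```

Selecting by rank rather than by fixed length thresholds avoids hard-coding what counts as "small" or "large" before the dataset has been curated.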

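Tooling (suggested)

  • BenchmarkDotNet (benchmark harness and MemoryDiagnoser)
  • dotnet-counters (live GC and allocation counters)
  • dotnet-trace with PerfView or Speedscope (CPU sampling)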

Risks / Considerations

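  • Use stackalloc only for small buffers with a bounded size; a large or loop‑repeated stack allocation can overflow the stack, and a StackOverflowException cannot be caught.
  • ArrayPool<byte> needs a clear zeroing policy and disciplined Return usage: Rent may hand back a longer array that still holds old data, and using a buffer after Return (or returning it twice) leads to data leakage and correctness bugs.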
  • Some APIs may need Span<T> overloads; refactoring effort should be considered in the follow‑up PR.
  • Ensure benchmarks aren’t over‑optimized by the JIT; vary inputs and prevent dead‑code elimination.
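
The first two risks are commonly handled together with a hybrid pattern: stackalloc below a threshold, ArrayPool<byte> above it, and a try/finally that always returns the rented array. The sketch below illustrates that pattern only; ScratchBuffer and XorViaScratch are hypothetical names, and the 256-byte threshold is a placeholder that the benchmarks proposed here would need to confirm or replace.

```csharp
using System;
using System.Buffers;

// Hypothetical helper; the threshold is a placeholder the benchmarks would tune.
public static class ScratchBuffer
{
    private const int StackThreshold = 256;

    // Copies `input` into a temporary buffer and XORs all of its bytes.
    public static byte XorViaScratch(ReadOnlySpan<byte> input)
    {
        byte[]? rented = null;

        // Small inputs use a fixed-size stack buffer; larger ones rent from the pool.
        Span<byte> scratch = input.Length <= StackThreshold
            ? stackalloc byte[StackThreshold]
            : (rented = ArrayPool<byte>.Shared.Rent(input.Length));

        // Both branches may yield more space than needed, so slice to the exact length.
        scratch = scratch.Slice(0, input.Length);

        try
        {
            input.CopyTo(scratch);
            byte acc = 0;
            foreach (byte b in scratch)
                acc ^= b;
            return acc;
        }
        finally
        {
            // Clear on return so invoice data does not linger in the shared pool.
            if (rented != null)
                ArrayPool<byte>.Shared.Return(rented, clearArray: true);
        }
    }
}
```

Using a constant stackalloc size keeps the stack cost predictable, and clearArray: true trades a little speed for not leaving sensitive bytes in a pooled array; whether that trade is worthwhile is one of the things the benchmarks could quantify.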

Acceptance Criteria

  • Benchmark project added under benchmark/ that can be run locally and in CI (optional) for reproducible results.
  • Clear scripts/instructions to run benchmarks and collect results for time, allocations, and CPU.
  • Report comparing strategies across representative workloads, with environment details.
  • Decision documented: keep new byte[n], switch to stackalloc for ≤X bytes, or prefer ArrayPool<byte> beyond a threshold (or a hybrid of these).
  • If improvement is substantial, open a follow‑up PR to apply the chosen strategy consistently in NLightning.Bolt11.

Deliverables

  • Benchmark project + source.
  • Benchmark results (Markdown/CSV) checked into benchmark/results/ with date and environment metadata.
  • Recommendation summary and next steps.

How to Run

  • From repo root:
    • dotnet build -c Release
    • dotnet run -c Release --project benchmark/NLightning.Bolt11.Benchmarks
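    • Optional: dotnet-counters monitor --counters System.Runtime -- dotnet run ... (BenchmarkDotNet runs each benchmark in a child process by default, so counters attached to the launcher may not reflect the benchmarked code.)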
    • Optional: dotnet-trace collect -- dotnet run ... and analyze with PerfView/Speedscope.
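
For dotnet run to work, the benchmark project needs an entry point. A minimal sketch, assuming the project references the BenchmarkDotNet package and that the entry class is named Program (a hypothetical choice), could look like this:

```csharp
using BenchmarkDotNet.Running;

// Hypothetical entry point for benchmark/NLightning.Bolt11.Benchmarks.
public static class Program
{
    // Forwards command-line arguments to BenchmarkDotNet so that a subset of
    // benchmarks can be selected at run time.
    public static void Main(string[] args) =>
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
}
```

With this in place, arguments after a double dash are passed through to BenchmarkDotNet, for example: dotnet run -c Release --project benchmark/NLightning.Bolt11.Benchmarks -- --filter "*BufferStrategy*".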

Labels

  • enhancement (new feature or request)
  • help wanted (extra attention is needed)