One meaningful change per experiment.
If we change entry logic, sizing, exit logic, and symbol universe at the same time, we learn almost nothing.
Use the same language every time:
tested_theory: the mechanism is coherent, the family is named, the benchmark to beat is named, and the shadow harness existsshadow: the idea is running forward on an honest current control pathvalidated_shadow: the forward path stayed positive with acceptable reset/floating behavior and beat the family-local control or baselinelive: the planned runtime path matches the validated shadow path and no contradiction or guardrail blocker remains
Do not skip stages because a shelf report or one hot sample looks exciting.
family_firewall: proof does not transfer across families, timeframes, or runtime pathscontrol_before_variant: restore or normalize the active control before testing a new variantbucketed_truth: when the thesis is "less losses", split harvest, offensive-close, and forced-unwind buckets before judging the mechanismspread_before_story: if the shape is below the current spread-safe floor, it is not honest current proofstale_runtime_firewall: stale heartbeats, geometry drift, or mixed control truth invalidate promotion claimssingle_changed_variable: each experiment must say exactly what changed and what benchmark it is trying to beat
Before a run, write down:
- experiment id
- date
- canonical bot file
- account context
- stage:
tested_theory,shadow,validated_shadow, orlive - evidence family
- hypothesis
- benchmark to beat
- exact change from previous run
- single changed variable
- symbols or universe in scope
- session window
- risk budget
- kill switch or reset condition
- proof path or runtime harness
After a run, record:
- runtime duration
- starting balance
- ending balance
- peak balance
- peak multiple
- max drawdown
- trade count
- win rate
- average winner / average loser
- best symbols
- worst symbols
- failure mode
- verdict
promote: keep and build on itrevise: promising but unclearreject: do not compound on thissafety-fix: not a strategy result, just a production repair
Use one primary tag per run:
entry: signal or setup logicexit: stop, target, trail, time exit, cut logicsizing: leverage, pyramiding, equity scalinguniverse: which symbols can be tradedregime: session filter, volatility filter, trend filterexecution: spread handling, order fill behavior, crash resilience
An experiment should only be promoted if:
- the result is better on the named family-local benchmark
- the mechanism makes sense, not just the outcome
- the risk of account death did not increase blindly
- the runtime path used for proof matches the promotion story
- the run can be reproduced from the notes
- mechanism is coherent
- benchmark to beat is explicit
- one variable changes
- kill condition is written before launch
- shadow harness is concrete
- fresh forward-positive evidence exists on the current runtime path
- reset, spread, and floating-loss behavior stay inside acceptable bounds
- the candidate beats the family-local control or baseline on both profit quality and loss containment
- no contradiction, guardrail, or stale-runtime blocker is still open
- planned live runtime path matches the validated shadow path
- the candidate survives at least one adverse regime segment
- governance and operator guardrails are aligned
- promotion is based on current forward truth, not copied shelf optimism
Reset the account or start a fresh run when:
- the bot drifts from the logged configuration
- emergency manual intervention changes the strategy outcome
- account state is polluted by legacy positions
- the run no longer maps cleanly to one hypothesis
Use this rhythm to stay rigorous:
- choose the one thing we are testing
- run it long enough to learn something
- log the result immediately
- decide promote, revise, reject, or reset
- only then touch the code again