feat(engine): scenario pool / random sampling by nymeria-ai · Pull Request #5 · mondaycom/sensei

nymeria-ai · 2026-03-19T16:34:18Z

Summary

Adds a scenario pool mechanism to suite YAML that enables random task selection at runtime.

Why

Evaluation suites with fixed tests are gameable — agents (or their owners) can memorize answers. Pools let suite authors define a bank of scenarios and have the engine randomly select N per run.

Use case: agentalent.ai agent verification. A pool holds 25+ creative challenges and each run picks 3-5 of them at random, so agents can't pre-script answers.

YAML Syntax

scenarios:
  # Regular scenarios work unchanged
  - id: fixed-task
    name: Always runs
    layer: execution
    input:
      prompt: Do this
    kpis: [...]

  # NEW: Pool — engine picks `count` random scenarios
  - pool:
      id: creative-challenges
      count: 3
      seed: 42  # optional: deterministic selection
      scenarios:
        - id: roast-yourself
          name: Self Roast
          layer: execution
          input:
            prompt: Roast yourself
          kpis: [...]
        # ... more scenarios in pool
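
For readers mapping the YAML above onto the types, here is a minimal sketch of how the two entry kinds might be modelled. `ScenarioPool` and `ScenarioEntry` are the names listed under Changes; the field shapes and the `isPoolEntry` helper are assumptions inferred from the YAML, not the actual contents of types.ts.

```typescript
// Hypothetical shape: the real ScenarioDefinition in types.ts likely has more fields.
interface ScenarioDefinition {
  id: string;
  name: string;
  layer: string;
  input: { prompt: string };
  kpis: unknown[];
}

// Fields mirror the `pool:` block in the YAML example.
interface ScenarioPool {
  id: string;
  count: number; // how many scenarios to pick per run
  seed?: number; // optional: makes the selection deterministic
  scenarios: ScenarioDefinition[];
}

// A suite entry is either a plain scenario or a `pool:` wrapper.
type ScenarioEntry = ScenarioDefinition | { pool: ScenarioPool };

// Hypothetical type guard used to tell the two entry kinds apart.
function isPoolEntry(entry: ScenarioEntry): entry is { pool: ScenarioPool } {
  return typeof entry === "object" && entry !== null && "pool" in entry;
}

const exampleEntries: ScenarioEntry[] = [
  { id: "fixed-task", name: "Always runs", layer: "execution", input: { prompt: "Do this" }, kpis: [] },
  {
    pool: {
      id: "creative-challenges",
      count: 3,
      seed: 42,
      scenarios: [
        { id: "roast-yourself", name: "Self Roast", layer: "execution", input: { prompt: "Roast yourself" }, kpis: [] },
      ],
    },
  },
];
```

Because regular scenarios keep their existing shape and pools are wrapped under a distinct `pool` key, existing suites parse as before.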

Changes

types.ts — ScenarioPool interface, ScenarioEntry union type
schema.ts — Zod validation for pool syntax
loader.ts — resolvePools() with seeded PRNG (mulberry32), Fisher-Yates shuffle
index.ts — exports
pool.test.ts — 14 new tests
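
The loader.ts entry names two building blocks, mulberry32 and Fisher-Yates. The sketch below shows one common way to combine them; it is not the code from this PR. The helper names `shuffle` and `sample`, and the fallback to `Math.random` when no seed is given, are assumptions.

```typescript
// mulberry32: a small 32-bit seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
function shuffle<T>(items: readonly T[], rng: () => number): T[] {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1)); // uniform index in [0, i]
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

// Hypothetical helper: pick `count` items, reproducibly when a seed is given.
function sample<T>(items: readonly T[], count: number, seed?: number): T[] {
  const rng = seed === undefined ? Math.random : mulberry32(seed);
  return shuffle(items, rng).slice(0, count);
}
```

Calling `sample(ids, 3, 42)` twice on the same list returns the same three items; omitting the seed gives a fresh selection each run.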

Design

Pool resolution in Loader → Runner receives flat ScenarioDefinition[] (zero Runner changes)
Seeded PRNG for reproducible runs (no external deps)
Count > pool size → clamp with warning
Backward compatible — existing suites work unchanged
All 208 engine tests pass (14 new + 194 existing)
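
The design points above (resolution in the loader, a flat list for the runner, clamping with a warning) can be sketched as follows. This is an illustration under assumptions, not the PR's `resolvePools()`: the simplified types, the `resolvePoolsSketch` name, the injected `pick` and `warn` parameters, and placing the picked scenarios at the pool's position in the list are all hypothetical.

```typescript
// Simplified stand-ins for the real types; only `id` matters for this sketch.
interface Scenario { id: string }
interface Pool { id: string; count: number; scenarios: Scenario[] }
type Entry = Scenario | { pool: Pool };

// Flattens suite entries: plain scenarios pass through, pools are replaced
// by `count` picked scenarios. `pick` is injected so any sampler can be used.
function resolvePoolsSketch(
  entries: Entry[],
  pick: (items: Scenario[], n: number) => Scenario[],
  warn: (msg: string) => void = console.warn,
): Scenario[] {
  const flat: Scenario[] = [];
  for (const entry of entries) {
    if ("pool" in entry) {
      const { pool } = entry;
      let n = pool.count;
      if (n > pool.scenarios.length) {
        // Clamp instead of failing, as described in the Design list.
        warn(`pool "${pool.id}": count ${n} exceeds pool size ${pool.scenarios.length}; clamping`);
        n = pool.scenarios.length;
      }
      flat.push(...pick(pool.scenarios, n));
    } else {
      flat.push(entry);
    }
  }
  return flat;
}

// Placeholder sampler for the sketch: takes the first n items (not random).
const takeFirst = (items: Scenario[], n: number): Scenario[] => items.slice(0, n);
```

Because the function returns a plain `Scenario[]`, a runner that already consumes a flat scenario list needs no changes, which matches the "zero Runner changes" claim.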

Adds a 'pool' mechanism to suite YAML that lets authors define pools of scenarios with random selection at runtime. The engine picks N scenarios from each pool, enabling:

- Anti-gaming: agents can't memorize fixed test sets
- Variety: different runs test different capabilities
- Scalability: large task banks with configurable sample sizes

YAML syntax:

- pool:
    id: my-pool
    count: 3          # pick 3 random scenarios
    seed: 42          # optional: reproducible selection
    scenarios: [...]  # ScenarioDefinition[]

Features:

- Seeded PRNG (mulberry32) for deterministic runs
- Fisher-Yates shuffle for unbiased selection
- Count clamped to pool size (warns, doesn't error)
- Validates: no empty pools, no count=0, no ID collisions
- Pool resolution in loader — runner receives flat scenario list
- Fully backward compatible with existing suite YAML

14 new tests, all 208 engine tests pass.
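
The validation rules named in the commit message (no empty pools, no count=0, no ID collisions) could be checked along these lines. The PR implements validation with Zod in schema.ts; the hand-written checks below are a dependency-free stand-in. The `validateEntries` name, the simplified types, and the reading of "ID collisions" as duplicate scenario IDs across the whole suite are assumptions.

```typescript
// Simplified stand-ins for the real types.
interface Scenario { id: string }
interface Pool { id: string; count: number; scenarios: Scenario[] }
type Entry = Scenario | { pool: Pool };

// Returns a list of human-readable problems; an empty list means valid.
function validateEntries(entries: Entry[]): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  const checkId = (id: string): void => {
    if (seen.has(id)) errors.push(`duplicate scenario id "${id}"`);
    seen.add(id);
  };
  for (const entry of entries) {
    if ("pool" in entry) {
      const { pool } = entry;
      if (pool.scenarios.length === 0) errors.push(`pool "${pool.id}" is empty`);
      if (!Number.isInteger(pool.count) || pool.count < 1) {
        errors.push(`pool "${pool.id}": count must be a positive integer`);
      }
      // Assumption: pool scenarios share one ID namespace with fixed scenarios.
      pool.scenarios.forEach((s) => checkId(s.id));
    } else {
      checkId(entry.id);
    }
  }
  return errors;
}
```

Collecting every problem instead of throwing on the first one lets a suite author fix all issues in a single pass.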

regevguym merged commit 738d493 into mondaycom:main Mar 19, 2026
1 check passed

regevguym merged 1 commit into mondaycom:main from nymeria-ai:feat/scenario-pools
