Description
Problem Statement
When using ToolSimulator to test agent resilience (not just workflow correctness), there's no way to intercept tool calls before or after the LLM generates a response. This makes it impossible to programmatically inject faults like rate limits, timeouts, or partial outages.
We tried several approaches:
- Putting failure instructions in the state description ("return a 429 error on call number 4"). The LLM ignores this consistently due to its helpfulness bias, even with strong prompting. Even when the state explicitly says "MUST return error" and "if you skip the error the test fails," the LLM produces a successful response. This is also unreliable due to LLM non-determinism: on rare occasions it might comply, but you can't depend on it for repeatable testing.
- Wrapping the function with a decorator before registration. This doesn't work because the simulator only uses the function for its name, signature, and docstring. At call time, `_create_tool_wrapper` replaces the function entirely with LLM inference, so the decorator never executes.
Additionally, when tools are called concurrently by the agent, there's no way to coordinate fault behavior across parallel calls using the state-based approach. The LLM generating each tool response operates independently and has no awareness of what other concurrent calls are doing or returning.
The only working approach is subclassing `ToolSimulator` and overriding `_call_tool`, which couples the test code to a private method.
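For reference, the subclass workaround looks roughly like the sketch below. The `ToolSimulator` stub and the `_call_tool` signature here are illustrative stand-ins, not the SDK's real API:

```python
import random

class ToolSimulator:
    """Minimal stand-in for the SDK class (illustration only)."""
    def _call_tool(self, tool_name, parameters):
        return {"result": f"simulated {tool_name} output"}

class FaultInjectingSimulator(ToolSimulator):
    """Workaround: override the private _call_tool to inject faults."""
    def __init__(self, fault_rate=0.3, seed=None):
        self.fault_rate = fault_rate
        self._rng = random.Random(seed)

    def _call_tool(self, tool_name, parameters):
        # Short-circuit with a fault before simulation would run.
        if self._rng.random() < self.fault_rate:
            return {"error": {"code": "QuotaExceeded", "retryAfterSeconds": 2}}
        return super()._call_tool(tool_name, parameters)
```

This works, but any rename or refactor of `_call_tool` silently breaks the test suite, which is the coupling the proposal below avoids.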
Proposed Solution
Add optional `pre_call_hook` and `post_call_hook` parameters to `ToolSimulator.__init__`:
```python
simulator = ToolSimulator(
    model=model,
    pre_call_hook=my_fault_injector,
    post_call_hook=my_response_modifier,
)
```

The `pre_call_hook` is called before the LLM generates a response. It receives the tool name, parameters, state key, and previous call history. If it returns a non-None dict, that dict is returned as the tool response, short-circuiting the LLM call. If it returns None, normal simulation proceeds. The fault response should still be cached via `state_registry.cache_tool_call` so subsequent calls see the failure in their context.
```python
import random

def my_fault_injector(tool_name, parameters, state_key, previous_calls):
    if random.random() < 0.3:
        return {"error": {"code": "QuotaExceeded", "retryAfterSeconds": 2}}
    return None
```

The `post_call_hook` is called after the LLM generates a response but before it is cached. It receives the same context plus the response dict, and returns a (possibly modified) response.
```python
def my_response_modifier(tool_name, parameters, state_key, response):
    response["_simulated_latency_ms"] = random.randint(50, 500)
    return response
```

This is a small change (two optional parameters on `__init__` and a few lines in `_call_tool`), fully backward compatible, and follows the same hook pattern the SDK already uses at the agent level.
Use Case
- Rate limiting: randomly return 429 errors to test agent retry logic
- Partial failures: specific tools fail intermittently while others work
- Timeouts: hard cutoff after N calls to test graceful degradation
- Response modification: inject latency metadata, add missing fields, corrupt responses for robustness testing
- Chaos testing: compose multiple fault types to simulate degraded API conditions
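Cross-call coordination and hard cutoffs can both be expressed with a single stateful pre-call hook (hook signature as proposed above; `make_outage_hook` is a hypothetical helper):

```python
import threading

def make_outage_hook(fail_after=4):
    """Pre-call hook factory: every tool call fails once the shared call
    counter exceeds fail_after. The lock coordinates concurrent calls --
    something the state-based LLM approach cannot do."""
    lock = threading.Lock()
    count = {"n": 0}

    def hook(tool_name, parameters, state_key, previous_calls):
        with lock:
            count["n"] += 1
            if count["n"] > fail_after:
                return {"error": {"code": "ServiceUnavailable", "tool": tool_name}}
        return None  # before the cutoff, normal simulation proceeds

    return hook
```

Composing fault types is then just chaining such hooks and returning the first non-None result.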
Alternative Solutions
- State-based LLM instructions: unreliable, LLM ignores error instructions
- Subclassing `ToolSimulator` and overriding `_call_tool`: works, but couples to a private method
Additional Context
Related to issue #114 (Chaos/Resiliency Evaluation of Agents). The hooks proposed here would provide the low-level extension point that a higher-level chaos testing library (as described in #114) could build on.
We have a working implementation using the subclass approach and are happy to contribute a PR for the hooks.