[FEATURE] Pre/post call hooks on ToolSimulator for fault injection and response modification #167

@kaghatim

Description

Problem Statement

When using ToolSimulator to test agent resilience (not just workflow correctness), there's no way to intercept tool calls before or after the LLM generates a response. This makes it impossible to programmatically inject faults like rate limits, timeouts, or partial outages.

We tried several approaches:

  • Putting failure instructions in the state description ("return a 429 error on call number 4"). The LLM consistently ignores this due to its helpfulness bias, even with strong prompting. Even when the state explicitly says "MUST return error" and "if you skip the error the test fails," the LLM produces a successful response. LLM non-determinism makes this doubly unreliable: on rare occasions the model might comply, but you can't depend on it for repeatable testing.
  • Wrapping the function with a decorator before registration. Doesn't work because the simulator only uses the function for its name, signature, and docstring. At call time, _create_tool_wrapper replaces the function entirely with LLM inference, so the decorator never executes.

Additionally, when tools are called concurrently by the agent, there's no way to coordinate fault behavior across parallel calls using the state-based approach. The LLM generating each tool response operates independently and has no awareness of what other concurrent calls are doing or returning.
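A programmatic hook, as proposed below, would make this coordination trivial: a callable object can hold shared state guarded by a lock. The sketch is hypothetical — `FailNthCall` is not SDK code; only the hook signature matches the proposal in this issue.

```python
import threading

class FailNthCall:
    """Hypothetical pre-call hook that fails exactly the nth tool call,
    even when the agent invokes tools in parallel. A lock serializes the
    counter so concurrent calls agree on which one receives the fault."""

    def __init__(self, n):
        self.n = n
        self._count = 0
        self._lock = threading.Lock()

    def __call__(self, tool_name, parameters, state_key, previous_calls):
        with self._lock:
            self._count += 1
            if self._count == self.n:
                return {"error": {"code": "QuotaExceeded", "retryAfterSeconds": 2}}
        return None  # let normal simulation proceed
```

This is exactly what the state-based approach cannot express, since each LLM-generated response is produced independently.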

The only working approach is subclassing ToolSimulator and overriding _call_tool, which couples to a private method.

Proposed Solution

Add optional pre_call_hook and post_call_hook parameters to ToolSimulator.__init__:

simulator = ToolSimulator(
    model=model,
    pre_call_hook=my_fault_injector,
    post_call_hook=my_response_modifier,
)

The pre_call_hook is called before the LLM generates a response. It receives the tool name, parameters, state key, and previous call history. If it returns a non-None dict, that dict is returned as the tool response (short-circuiting the LLM call). If it returns None, normal simulation proceeds. The fault response should still be cached via state_registry.cache_tool_call so subsequent calls see the failure in their context.

import random

def my_fault_injector(tool_name, parameters, state_key, previous_calls):
    # Inject a rate-limit fault on ~30% of calls; None means simulate normally.
    if random.random() < 0.3:
        return {"error": {"code": "QuotaExceeded", "retryAfterSeconds": 2}}
    return None

The post_call_hook is called after the LLM generates a response but before it's cached. It receives the same context plus the response dict, and returns a (possibly modified) response.

import random

def my_response_modifier(tool_name, parameters, state_key, response):
    # Tag each response with simulated latency metadata before it is cached.
    response["_simulated_latency_ms"] = random.randint(50, 500)
    return response

This is a small change (two optional params on __init__, a few lines in _call_tool), fully backward compatible, and follows the same hook pattern the SDK uses at the agent level.
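The intended control flow can be sketched in isolation. Everything here is hypothetical: `simulate_with_llm` and `cache` stand in for the simulator's internals, and only the hook signatures come from the proposal above.

```python
def call_tool(tool_name, parameters, state_key, previous_calls,
              simulate_with_llm, cache,
              pre_call_hook=None, post_call_hook=None):
    """Hypothetical sketch of _call_tool with the proposed hooks."""
    # 1. Pre-call hook may short-circuit with a fault response.
    if pre_call_hook is not None:
        fault = pre_call_hook(tool_name, parameters, state_key, previous_calls)
        if fault is not None:
            # Faults are still cached so later calls see the failure in context.
            cache(tool_name, parameters, state_key, fault)
            return fault

    # 2. Normal LLM-backed simulation.
    response = simulate_with_llm(tool_name, parameters, state_key, previous_calls)

    # 3. Post-call hook may modify the response before it is cached.
    if post_call_hook is not None:
        response = post_call_hook(tool_name, parameters, state_key, response)

    cache(tool_name, parameters, state_key, response)
    return response
```

When both hooks are set, a pre-call fault bypasses the post-call hook entirely, since no LLM response was generated to modify.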

Use Case

  • Rate limiting: randomly return 429 errors to test agent retry logic
  • Partial failures: specific tools fail intermittently while others work
  • Timeouts: hard cutoff after N calls to test graceful degradation
  • Response modification: inject latency metadata, add missing fields, corrupt responses for robustness testing
  • Chaos testing: compose multiple fault types to simulate degraded API conditions
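Several of these fault types compose naturally as plain functions. This is a hypothetical sketch of such composition, not SDK code; only the hook signature comes from the proposal above.

```python
import random

def rate_limiter(tool_name, parameters, state_key, previous_calls):
    # ~30% of calls hit a quota error.
    if random.random() < 0.3:
        return {"error": {"code": "QuotaExceeded", "retryAfterSeconds": 2}}
    return None

def hard_cutoff(limit):
    # After `limit` prior calls, every call times out.
    def hook(tool_name, parameters, state_key, previous_calls):
        if len(previous_calls) >= limit:
            return {"error": {"code": "Timeout", "message": "simulated outage"}}
        return None
    return hook

def compose(*hooks):
    """First hook to return a non-None fault wins; otherwise simulate normally."""
    def combined(tool_name, parameters, state_key, previous_calls):
        for hook in hooks:
            fault = hook(tool_name, parameters, state_key, previous_calls)
            if fault is not None:
                return fault
        return None
    return combined

chaos_hook = compose(hard_cutoff(10), rate_limiter)
```

The composed `chaos_hook` can then be passed as the single `pre_call_hook`, simulating a degraded API without any changes to the simulator itself.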

Alternative Solutions

  • State-based LLM instructions: unreliable, since the LLM ignores error instructions
  • Subclassing ToolSimulator and overriding _call_tool: works, but couples tests to a private method

Additional Context

Related to issue #114 (Chaos/Resiliency Evaluation of Agents). The hooks proposed here would provide the low-level extension point that a higher-level chaos testing library (as described in #114) could build on.

We have a working implementation using the subclass approach and are happy to contribute a PR for the hooks.
