Skip to content

[FEATURE] Built-in red teaming support #177

@kevmyung

Description

@kevmyung

Problem Statement

ActorSimulator can be configured for red teaming by customizing ActorProfile and system_prompt_template, but there are no built-in presets for common adversarial attack scenarios.

Proposed Solution

Provide built-in red teaming presets:

  • Pre-built adversarial ActorProfile templates (jailbreak, prompt injection, social engineering, etc.)
  • Safety-focused evaluators (bias, toxicity, PII leakage, etc.)
  • Adversarial system_prompt_template presets for ActorSimulator

Use Case

  • Red teaming AI agents for safety compliance before deployment
  • Automated adversarial testing across multiple attack vectors
  • Continuous safety regression testing in CI/CD pipelines

Alternatives Solutions

No response

Additional Context

References:

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions