95 changes: 12 additions & 83 deletions skills/prompt-optimizer/SKILL.md
@@ -1,6 +1,6 @@
---
name: prompt-optimizer
description: Create, optimize, and iteratively refine agent prompts and system prompts. Use when asked to "improve a prompt", "optimize a system prompt", "rewrite an agent prompt", "tune prompt wording", "make this prompt more reliable", or "adapt a prompt for OpenAI, Claude, or Gemini". Handles model-specific prompt guidance, prompt markers/tags, eval design, and meta optimization loops for new and existing prompts.
description: Create, optimize, and iteratively refine agent prompts and system prompts. Use when asked to "improve a prompt", "optimize a system prompt", "rewrite an agent prompt", "tune prompt wording", "make this prompt more reliable", "adapt a prompt for OpenAI, Claude, or Gemini", "design tool policy for an agent prompt", "how should I expose tools in a prompt", or "how should I disclose skills in an agent". Handles model-specific prompt guidance, prompt markers/tags, tool disclosure and tool-call narration, skill disclosure and routing, layered platform/deployer prompts, eval design, and meta optimization loops.
---

# Prompt Optimizer
@@ -55,75 +55,22 @@ Read `references/model-family-notes.md`.

## Step 3: Shape the prompt deliberately

Read `references/core-patterns.md`. When the prompt surface includes tools or a skill layer, also read `references/tools.md` or `references/skills.md` respectively.

1. Separate durable behavior from task-local context:
- stable policy and behavioral defaults belong in `system` or `developer`
- variable inputs, retrieved context, and task instances belong in templated user-facing sections
- when the system prompt is assembled at runtime from a platform layer and a deployer-authored persona layer (e.g., `SOUL.md`, `CLAUDE.md`, `AGENTS.md`), see "Layered prompts with multiple owners" in `references/core-patterns.md` — platform behavior rules must not depend on what the deployer layer contains

2. Keep one authoritative instruction per behavior:
- if a rule appears in more than one layer, choose one owner for it
- stable cross-task rules belong in `system` or `developer`
- examples should teach format, edge-case handling, or tool behavior, not restate the whole policy
- user payloads should carry task-local facts, not durable policy

3. Use markers only when they reduce ambiguity:
- use markdown headings or XML-style tags to separate instructions, context, examples, tool rules, and output contracts
- keep tag names descriptive and consistent
- do not wrap every sentence in markup

4. Make the prompt easy to execute:
- put one high-value behavior per bullet or line when the task is fragile
- prefer positive instructions over "do not do X" lists
- place tool-use rules, escalation boundaries, and stop conditions in explicit sections
- keep persona light unless it changes behavior in a useful way
- use the shortest wording that preserves the intended behavioral constraint
- cut motivational filler, repeated reminders, and examples that do not improve evals
- for long-context prompts, place evidence before the final query and keep the actual ask in a clear terminal section
- keep instructions, evidence, and schemas in distinct blocks so the model does not have to infer what is policy versus data

5. Treat examples as first-class prompt assets:
- start simple before adding examples
- add examples only when they improve format control, edge-case handling, or tool behavior
- keep examples structurally consistent
- prefer positive demonstrations over anti-pattern-only demonstrations
Read `references/core-patterns.md`. When the prompt surface includes tools or a skill layer, also read `references/tools.md` or `references/skills.md`. Reach for `references/transformed-examples.md` when the task is under-specified or the first draft is weak.

Apply, in order:

1. Layer the prompt — stable behavior in `system`/`developer`, task-local context in templated user sections, examples as a third layer.
2. Place directives in canonical rules sections (`<behavior>`, `<tool_policy>`, `<constraints>`, `<workflow>`), not buried inside descriptive markers.
3. Keep one authoritative owner per rule. Collapse duplicates.
4. Cross-check the symptom-to-fix table in `core-patterns.md` before adding new instructions.
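
As a rough illustration of steps 2 and 3, the sketch below renders rules into the canonical sections and flags any rule that has more than one owner. The function names and the exact-match duplicate check (case and whitespace normalized) are assumptions made for this example; they are not part of the skill, and near-duplicate wording would still need human review.

```python
from collections import defaultdict

# Section names follow the canonical rules sections listed above.
CANONICAL_SECTIONS = ("behavior", "tool_policy", "constraints", "workflow")


def build_system_prompt(sections: dict[str, list[str]]) -> str:
    """Render each canonical section as an XML-style block, one rule per line."""
    unknown = set(sections) - set(CANONICAL_SECTIONS)
    if unknown:
        raise ValueError(f"non-canonical sections: {sorted(unknown)}")
    blocks = []
    for name in CANONICAL_SECTIONS:  # fixed order keeps prompt diffs stable
        rules = sections.get(name)
        if rules:
            body = "\n".join(rules)
            blocks.append(f"<{name}>\n{body}\n</{name}>")
    return "\n\n".join(blocks)


def duplicate_rules(sections: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each rule that appears in more than one section to its owners."""
    owners = defaultdict(list)
    for name, rules in sections.items():
        for rule in rules:
            key = " ".join(rule.lower().split())  # hypothetical normalization
            if name not in owners[key]:
                owners[key].append(name)
    return {rule: names for rule, names in owners.items() if len(names) > 1}
```

Running `duplicate_rules` before `build_system_prompt` gives a cheap signal for which rules to collapse; which section should own a flagged rule remains a judgment call.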

## Step 4: Run the meta optimization loop

Read `references/meta-optimization-loop.md`.

1. Start with the current prompt or a simple first draft.
2. Score it on a representative slice:
- at least one happy-path case
- at least one failure replay
- at least one ambiguous case
- at least one edge case
- at least one "should refuse", "should ask", or "should defer" case when relevant

3. Turn failures into explicit criticisms:
- identify what the prompt under-specified, over-specified, or contradicted
- write critiques as actionable edits, not vague complaints

4. Generate a small beam of candidate prompts:
- one minimal-diff repair
- one structure-first rewrite
- one example- or tool-rule-centered variant when that is the likely bottleneck
- one provider-specific adapter when cross-model behavior is the issue

5. Compare candidates on the same eval slice.
6. Keep the best candidate and log what changed and why.
7. Preserve the evidence for each round:
- prompt version
- eval case
- model output
- failure reason
- relevant scores

8. Test the winner on a holdout slice before finalizing.
9. Stop when scores plateau, edits oscillate, cost rises without quality gain, or the remaining issue is outside prompt control.

Keep edits minimal and causal. Record what you removed as well as what you added. If you change everything at once, you learn nothing about what actually helped.
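
One way to make the slice requirements and the per-round evidence concrete is sketched below. The category labels, class names, and field names are hypothetical and chosen for this example; the conditional "should refuse / ask / defer" category is left out of the required set because the list above only calls for it when relevant.

```python
from dataclasses import dataclass, field
from typing import Optional

# Category labels mirror the slice list above; the names are illustrative.
REQUIRED = ("happy_path", "failure_replay", "ambiguous", "edge_case")


@dataclass
class EvalCase:
    case_id: str
    category: str
    prompt_input: str


@dataclass
class Evidence:
    """One row of per-round evidence: what ran, what came back, why it failed."""
    prompt_version: str
    case_id: str
    model_output: str
    failure_reason: Optional[str] = None
    scores: dict[str, float] = field(default_factory=dict)


def missing_categories(cases: list[EvalCase]) -> list[str]:
    """Return required categories that have no case in the slice."""
    present = {case.category for case in cases}
    return [name for name in REQUIRED if name not in present]
```

Checking `missing_categories` before scoring helps keep a slice from quietly consisting of happy-path cases only.
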
Baseline on a representative slice → cluster failures → write critiques as concrete edits → generate a small candidate beam (minimal-diff repair, structure-first rewrite, example-or-tool-rule variant) → compare on the same slice → keep the best → validate on a holdout → stop when scores plateau, edits oscillate, or cost rises without gain.

Record what you remove as well as what you add.
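
The loop above can be sketched as a small greedy search. This is a minimal illustration under stated assumptions: `score` and `propose` are caller-supplied callables (hypothetical interfaces, not something the skill defines), only the plateau stop condition is implemented, and falling back to the baseline when the winner regresses on the holdout is a design choice made for this example.

```python
from typing import Callable, Sequence

Scorer = Callable[[str, Sequence[str]], float]  # (prompt, eval slice) -> score
Proposer = Callable[[str, int], list[str]]      # (best prompt, round) -> candidate beam


def optimize(
    baseline: str,
    dev_slice: Sequence[str],
    holdout: Sequence[str],
    score: Scorer,
    propose: Proposer,
    max_rounds: int = 5,
    min_gain: float = 0.01,
) -> tuple[str, list[dict]]:
    """Greedy beam loop: keep a candidate only if it beats the incumbent."""
    best, best_score = baseline, score(baseline, dev_slice)
    log = [{"round": 0, "kept": baseline, "score": best_score}]
    for round_no in range(1, max_rounds + 1):
        beam = propose(best, round_no)
        if not beam:
            break
        # Every candidate is compared on the same dev slice.
        scored = [(score(candidate, dev_slice), candidate) for candidate in beam]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score - best_score < min_gain:
            break  # plateau: no candidate improves enough to continue
        best, best_score = top, top_score
        log.append({"round": round_no, "kept": best, "score": best_score})
    # Holdout validation: assumed policy is to revert if the winner regresses.
    if best != baseline and score(best, holdout) < score(baseline, holdout):
        best = baseline
        log.append({"round": "holdout", "kept": baseline, "score": score(baseline, holdout)})
    return best, log
```

The returned log records what was kept in each round; detecting oscillating edits or rising cost would need extra bookkeeping that this sketch omits.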

## Step 5: Produce a reusable deliverable

@@ -139,24 +86,6 @@ Return:

If the user supplied an existing prompt, include a concise diff-style explanation of the biggest behavioral changes.

## Step 6: Guard against common failure modes

Read `references/transformed-examples.md` when the task is ambiguous or the first draft is weak.

Do not:

- optimize wording before defining the eval target
- mix instructions, examples, and raw context without boundaries
- keep the same rule in multiple layers unless there is a proven reason
- let stable rules drift into the user payload just because the current prompt template makes it convenient
- ask reasoning models to reveal chain-of-thought just because the task is hard
- keep contradictory legacy instructions in the same prompt
- overfit to one or two examples
- keep examples that do not improve measured behavior
- solve tool-use failures only in the system prompt when the real problem is the tool description or schema
- add markers everywhere and mistake structure for clarity
- use a bloated persona as a substitute for concrete behavior rules

## Output standard

The final prompt package should be reusable by another engineer without rediscovering:
8 changes: 8 additions & 0 deletions skills/prompt-optimizer/SOURCES.md
@@ -121,6 +121,12 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco
- "Port this prompt from GPT to Gemini."
- "Make this tool-using prompt more reliable."
- "Tune this prompt wording with a proper eval loop."
- "How should I expose tools in my agent's system prompt?"
- "Design tool policy for our harness prompt."
- "Stop the model from narrating 'let me check' before tool calls."
- "How should I disclose skills in the system prompt — eager or lazy?"
- "Route between two adjacent skills that keep mis-matching."
- "Split platform rules out of our customer-authored persona file."

### Should not trigger

@@ -130,6 +136,8 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco
- "Summarize this document."
- "Design a new model architecture."
- "Tune only the temperature and top-p settings."
- "Implement a new MCP server." (this is a tool/server authoring task, not a prompt task)
- "Write the SKILL.md body for a new skill." (this is a skill-authoring task — use `skill-writer`)

## Open gaps

11 changes: 11 additions & 0 deletions skills/prompt-optimizer/references/core-patterns.md
@@ -2,6 +2,17 @@

Use this file when creating a new prompt or restructuring a weak one.

## Contents

- When markers help
- Where rules live
- Layer the prompt correctly
- Layered prompts with multiple owners
- Portable agent prompt skeleton
- High-value prompt moves
- Examples
- Symptom to fix mapping

## When markers help

Use markers when the prompt mixes different content types:
7 changes: 7 additions & 0 deletions skills/prompt-optimizer/references/meta-optimization-loop.md
@@ -2,6 +2,13 @@

Use this file when refining an existing prompt or when a first draft needs disciplined iteration.

## Contents

- Inputs
- Optimization loop (baseline, failure clustering, textual gradients, candidate beam, compare, reflective memory, holdout validation, stop conditions)
- Practical defaults
- What this loop is borrowing from

## Inputs

Collect these before iterating:
15 changes: 11 additions & 4 deletions skills/prompt-optimizer/references/transformed-examples.md
@@ -2,6 +2,13 @@

Use these examples when the task is under-specified or when you need a stronger default shape.

## Contents

- Example 1: Happy-path new agent prompt
- Example 2: Robust variant for a weak existing prompt
- Example 3: Anti-pattern and corrected version
- Example 4: Directive placement — state marker vs. rules section

## Example 1: Happy-path new agent prompt

### Input brief
@@ -60,10 +67,10 @@ Default to implementation when the user's intent is execution rather than discus
Use tools to discover missing facts instead of guessing.
</default_behavior>

<tool_rules>
<tool_policy>
Use repository tools whenever correctness depends on current files, logs, or config.
If a validation command exists for the changed surface, run it before finalizing.
</tool_rules>
</tool_policy>

<progress_updates>
Send short progress updates during long tasks.
@@ -107,9 +114,9 @@ You are a reliable implementation agent.
Complete the user's task accurately and efficiently.
</goal>

<tool_use>
<tool_policy>
Use tools when current repository facts, logs, or external state are needed.
</tool_use>
</tool_policy>

<clarification>
Ask only when required information is missing or the action is risky.