diff --git a/skills/prompt-optimizer/SKILL.md b/skills/prompt-optimizer/SKILL.md
index c6788f9..79b0b62 100644
--- a/skills/prompt-optimizer/SKILL.md
+++ b/skills/prompt-optimizer/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: prompt-optimizer
-description: Create, optimize, and iteratively refine agent prompts and system prompts. Use when asked to "improve a prompt", "optimize a system prompt", "rewrite an agent prompt", "tune prompt wording", "make this prompt more reliable", or "adapt a prompt for OpenAI, Claude, or Gemini". Handles model-specific prompt guidance, prompt markers/tags, eval design, and meta optimization loops for new and existing prompts.
+description: Create, optimize, and iteratively refine agent prompts and system prompts. Use when asked to "improve a prompt", "optimize a system prompt", "rewrite an agent prompt", "tune prompt wording", "make this prompt more reliable", "adapt a prompt for OpenAI, Claude, or Gemini", "design tool policy for an agent prompt", "how should I expose tools in a prompt", or "how should I disclose skills in an agent". Handles model-specific prompt guidance, prompt markers/tags, tool disclosure and tool-call narration, skill disclosure and routing, layered platform/deployer prompts, eval design, and meta optimization loops.
 ---
 
 # Prompt Optimizer
@@ -55,75 +55,22 @@ Read `references/model-family-notes.md`.
 
 ## Step 3: Shape the prompt deliberately
 
-Read `references/core-patterns.md`. When the prompt surface includes tools or a skill layer, also read `references/tools.md` or `references/skills.md` respectively.
-
-1. Separate durable behavior from task-local context:
-- stable policy and behavioral defaults belong in `system` or `developer`
-- variable inputs, retrieved context, and task instances belong in templated user-facing sections
-- when the system prompt is assembled at runtime from a platform layer and a deployer-authored persona layer (e.g., `SOUL.md`, `CLAUDE.md`, `AGENTS.md`), see "Layered prompts with multiple owners" in `references/core-patterns.md` — platform behavior rules must not depend on what the deployer layer contains
-
-2. Keep one authoritative instruction per behavior:
-- if a rule appears in more than one layer, choose one owner for it
-- stable cross-task rules belong in `system` or `developer`
-- examples should teach format, edge-case handling, or tool behavior, not restate the whole policy
-- user payloads should carry task-local facts, not durable policy
-
-3. Use markers only when they reduce ambiguity:
-- use markdown headings or XML-style tags to separate instructions, context, examples, tool rules, and output contracts
-- keep tag names descriptive and consistent
-- do not wrap every sentence in markup
-
-4. Make the prompt easy to execute:
-- put one high-value behavior per bullet or line when the task is fragile
-- prefer positive instructions over "do not do X" lists
-- place tool-use rules, escalation boundaries, and stop conditions in explicit sections
-- keep persona light unless it changes behavior in a useful way
-- use the shortest wording that preserves the intended behavioral constraint
-- cut motivational filler, repeated reminders, and examples that do not improve evals
-- for long-context prompts, place evidence before the final query and keep the actual ask in a clear terminal section
-- keep instructions, evidence, and schemas in distinct blocks so the model does not have to infer what is policy versus data
-
-5. Treat examples as first-class prompt assets:
-- start simple before adding examples
-- add examples only when they improve format control, edge-case handling, or tool behavior
-- keep examples structurally consistent
-- prefer positive demonstrations over anti-pattern-only demonstrations
+Read `references/core-patterns.md`. When the prompt surface includes tools or a skill layer, also read `references/tools.md` or `references/skills.md`. Reach for `references/transformed-examples.md` when the task is under-specified or the first draft is weak.
+
+Apply, in order:
+
+1. Layer the prompt — stable behavior in `system`/`developer`, task-local context in templated user sections, examples as a third layer.
+2. Place directives in canonical rules sections (``, ``, ``, ``), not buried inside descriptive markers.
+3. Keep one authoritative owner per rule. Collapse duplicates.
+4. Cross-check the symptom-to-fix table in `core-patterns.md` before adding new instructions.
 
 ## Step 4: Run the meta optimization loop
 
 Read `references/meta-optimization-loop.md`.
 
-1. Start with the current prompt or a simple first draft.
-2. Score it on a representative slice:
-- at least one happy-path case
-- at least one failure replay
-- at least one ambiguous case
-- at least one edge case
-- at least one "should refuse", "should ask", or "should defer" case when relevant
-
-3. Turn failures into explicit criticisms:
-- identify what the prompt under-specified, over-specified, or contradicted
-- write critiques as actionable edits, not vague complaints
-
-4. Generate a small beam of candidate prompts:
-- one minimal-diff repair
-- one structure-first rewrite
-- one example- or tool-rule-centered variant when that is the likely bottleneck
-- one provider-specific adapter when cross-model behavior is the issue
-
-5. Compare candidates on the same eval slice.
-6. Keep the best candidate and log what changed and why.
-7. Preserve the evidence for each round:
-- prompt version
-- eval case
-- model output
-- failure reason
-- relevant scores
-
-8. Test the winner on a holdout slice before finalizing.
-9. Stop when scores plateau, edits oscillate, cost rises without quality gain, or the remaining issue is outside prompt control.
-
-Keep edits minimal and causal. Record what you removed as well as what you added. If you change everything at once, you learn nothing about what actually helped.
+Baseline on a representative slice → cluster failures → write critiques as concrete edits → generate a small candidate beam (minimal-diff repair, structure-first rewrite, example-or-tool-rule variant) → compare on the same slice → keep the best → validate on a holdout → stop when scores plateau, edits oscillate, or cost rises without gain.
+
+Record what you remove as well as what you add.
 
 ## Step 5: Produce a reusable deliverable
 
@@ -139,24 +86,6 @@ Return:
 
 If the user supplied an existing prompt, include a concise diff-style explanation of the biggest behavioral changes.
 
-## Step 6: Guard against common failure modes
-
-Read `references/transformed-examples.md` when the task is ambiguous or the first draft is weak.
-
-Do not:
-
-- optimize wording before defining the eval target
-- mix instructions, examples, and raw context without boundaries
-- keep the same rule in multiple layers unless there is a proven reason
-- let stable rules drift into the user payload just because the current prompt template makes it convenient
-- ask reasoning models to reveal chain-of-thought just because the task is hard
-- keep contradictory legacy instructions in the same prompt
-- overfit to one or two examples
-- keep examples that do not improve measured behavior
-- solve tool-use failures only in the system prompt when the real problem is the tool description or schema
-- add markers everywhere and mistake structure for clarity
-- use a bloated persona as a substitute for concrete behavior rules
-
 ## Output standard
 
 The final prompt package should be reusable by another engineer without rediscovering:
diff --git a/skills/prompt-optimizer/SOURCES.md b/skills/prompt-optimizer/SOURCES.md
index 3bc7249..1d29139 100644
--- a/skills/prompt-optimizer/SOURCES.md
+++ b/skills/prompt-optimizer/SOURCES.md
@@ -121,6 +121,12 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco
 - "Port this prompt from GPT to Gemini."
 - "Make this tool-using prompt more reliable."
 - "Tune this prompt wording with a proper eval loop."
+- "How should I expose tools in my agent's system prompt?"
+- "Design tool policy for our harness prompt."
+- "Stop the model from narrating 'let me check' before tool calls."
+- "How should I disclose skills in the system prompt — eager or lazy?"
+- "Route between two adjacent skills that keep mis-matching."
+- "Split platform rules out of our customer-authored persona file."
 
 ### Should not trigger
 
@@ -130,6 +136,8 @@ Why: this skill is a repeatable prompt-optimization workflow with explicit preco
 - "Summarize this document."
 - "Design a new model architecture."
 - "Tune only the temperature and top-p settings."
+- "Implement a new MCP server." (this is a tool/server authoring task, not a prompt task)
+- "Write the SKILL.md body for a new skill." (this is a skill-authoring task — use `skill-writer`)
 
 ## Open gaps
 
diff --git a/skills/prompt-optimizer/references/core-patterns.md b/skills/prompt-optimizer/references/core-patterns.md
index 4e1606e..e71cc6d 100644
--- a/skills/prompt-optimizer/references/core-patterns.md
+++ b/skills/prompt-optimizer/references/core-patterns.md
@@ -2,6 +2,17 @@
 
 Use this file when creating a new prompt or restructuring a weak one.
 
+## Contents
+
+- When markers help
+- Where rules live
+- Layer the prompt correctly
+- Layered prompts with multiple owners
+- Portable agent prompt skeleton
+- High-value prompt moves
+- Examples
+- Symptom to fix mapping
+
 ## When markers help
 
 Use markers when the prompt mixes different content types:
diff --git a/skills/prompt-optimizer/references/meta-optimization-loop.md b/skills/prompt-optimizer/references/meta-optimization-loop.md
index 6aa685c..08c1059 100644
--- a/skills/prompt-optimizer/references/meta-optimization-loop.md
+++ b/skills/prompt-optimizer/references/meta-optimization-loop.md
@@ -2,6 +2,13 @@
 
 Use this file when refining an existing prompt or when a first draft needs disciplined iteration.
 
+## Contents
+
+- Inputs
+- Optimization loop (baseline, failure clustering, textual gradients, candidate beam, compare, reflective memory, holdout validation, stop conditions)
+- Practical defaults
+- What this loop is borrowing from
+
 ## Inputs
 
 Collect these before iterating:
diff --git a/skills/prompt-optimizer/references/transformed-examples.md b/skills/prompt-optimizer/references/transformed-examples.md
index 807efb9..1706ffc 100644
--- a/skills/prompt-optimizer/references/transformed-examples.md
+++ b/skills/prompt-optimizer/references/transformed-examples.md
@@ -2,6 +2,13 @@
 
 Use these examples when the task is under-specified or when you need a stronger default shape.
 
+## Contents
+
+- Example 1: Happy-path new agent prompt
+- Example 2: Robust variant for a weak existing prompt
+- Example 3: Anti-pattern and corrected version
+- Example 4: Directive placement — state marker vs. rules section
+
 ## Example 1: Happy-path new agent prompt
 
 ### Input brief
@@ -60,10 +67,10 @@ Default to implementation when the user's intent is execution rather than discus
 Use tools to discover missing facts instead of guessing.
 
 
-
+
 Use repository tools whenever correctness depends on current files, logs, or config.
 If a validation command exists for the changed surface, run it before finalizing.
-
+
 
 
 Send short progress updates during long tasks.
@@ -107,9 +114,9 @@ You are a reliable implementation agent.
 Complete the user's task accurately and efficiently.
 
 
-
+
 Use tools when current repository facts, logs, or external state are needed.
-
+
 
 
 Ask only when required information is missing or the action is risky.