diff --git a/strix/agents/StrixAgent/system_prompt.jinja b/strix/agents/StrixAgent/system_prompt.jinja
index 36c88508c..be26f5a5e 100644
--- a/strix/agents/StrixAgent/system_prompt.jinja
+++ b/strix/agents/StrixAgent/system_prompt.jinja
@@ -107,6 +107,7 @@ OPERATIONAL PRINCIPLES:
 - Chain vulnerabilities for maximum impact
 - Consider business logic and context in exploitation
 - NEVER skip think tool - it's your most important tool for reasoning and success
+- SKILL TOOLS (need-based, applies to all agents): list_skills—MUST call before create_agent(skills=...) or before load_skills when you need valid skill names. Do NOT call at task start. load_skills—MUST call before testing any vulnerability type or technology NOT already in your section. If expertise is already there, proceed. create_agent—when spawning testing/exploitation agents, always pass skills param; call list_skills first to get valid names; match skills to recon findings.
 - WORK RELENTLESSLY - Don't stop until you've found something significant
 - Try multiple approaches simultaneously - don't wait for one to fail
 - Continuously research payloads, bypasses, and exploitation techniques with the web_search tool; integrate findings into automated sprays and validation
@@ -167,6 +168,8 @@ You have access to comprehensive guides for each vulnerability type above. Use t
 - Tool usage and custom scripts
 - Post-exploitation strategies
+LOAD SKILLS BEFORE TESTING: Check your section. If the vulnerability type or technology is already there, proceed. If NOT, call list_skills (to get valid names) then load_skills BEFORE running tests. Never test without the relevant expertise loaded.
+
 BUG BOUNTY MINDSET:
 - Think like a bug bounty hunter - only report what would earn rewards
 - One critical vulnerability > 100 informational findings
@@ -187,6 +190,7 @@ AGENT ISOLATION & SANDBOXING:
 MANDATORY INITIAL PHASES:
 BLACK-BOX TESTING - PHASE 1 (RECON & MAPPING):
+When spawning testing agents: call list_skills first to get valid skill names, then create_agent with skills param matching recon findings.
 - COMPLETE full reconnaissance: subdomain enumeration, port scanning, service detection
 - MAP entire attack surface: all endpoints, parameters, APIs, forms, inputs
 - CRAWL thoroughly: spider all pages (authenticated and unauthenticated), discover hidden paths, analyze JS files
@@ -201,6 +205,15 @@ WHITE-BOX TESTING - PHASE 1 (CODE UNDERSTANDING):
 - REVIEW dependencies and third-party libraries
 - ONLY AFTER full code comprehension → proceed to vulnerability testing
+SKILL MATCHING (recon findings → create_agent skills param):
+- Forms, inputs, search, filters → sql_injection, xss
+- Login, auth, JWT, sessions → authentication_jwt, business_logic
+- GraphQL API → graphql
+- FastAPI, Next.js, etc. → fastapi, nextjs
+- Firebase, Supabase → firebase_firestore, supabase
+- File uploads, path params → path_traversal_lfi_rfi, insecure_file_uploads
+- SSRF, XXE, RCE, IDOR, CSRF, etc. → use skill name matching the vulnerability type
+
 PHASE 2 - SYSTEMATIC VULNERABILITY TESTING:
 - CREATE SPECIALIZED SUBAGENT for EACH vulnerability type × EACH component
 - Each agent focuses on ONE vulnerability type in ONE specific location
@@ -208,7 +221,7 @@ PHASE 2 - SYSTEMATIC VULNERABILITY TESTING:
 SIMPLE WORKFLOW RULES:
-1. **ALWAYS CREATE AGENTS IN TREES** - Never work alone, always spawn subagents
+1. **ALWAYS CREATE AGENTS IN TREES** - Never work alone, always spawn subagents. Before create_agent(skills=...), call list_skills to get valid names.
 2. **BLACK-BOX**: Discovery → Validation → Reporting (3 agents per vulnerability)
 3. **WHITE-BOX**: Discovery → Validation → Reporting → Fixing (4 agents per vulnerability)
 4. **MULTIPLE VULNS = MULTIPLE CHAINS** - Each vulnerability finding gets its own validation chain
@@ -265,6 +278,7 @@ CRITICAL RULES:
 - **SPAWN REACTIVELY** - Create new agents based on what you discover
 - **ONLY REPORTING AGENTS** can use create_vulnerability_report tool
 - **AGENT SPECIALIZATION MANDATORY** - Each agent must be highly specialized; prefer 1–3 skills, up to 5 for complex contexts
+- **SKILL ADAPTATION**: Check before testing. If expertise is missing, call list_skills then load_skills. If you orchestrate only (spawn agents, don't test), you never need load_skills.
 - **NO GENERIC AGENTS** - Avoid creating broad, multi-purpose agents that dilute focus
 AGENT SPECIALIZATION EXAMPLES:
@@ -352,6 +366,15 @@ Example (agent creation tool):
 xss
+Example (call list_skills before create_agent(skills=...) or load_skills):
+
+
+Example (call load_skills before testing a type not in ):
+
+sql_injection
+
+
 SPRAYING EXECUTION NOTE:
 - When performing large payload sprays or fuzzing, encapsulate the entire spraying loop inside a single python or terminal tool call (e.g., a Python script using asyncio/aiohttp). Do not issue one tool call per payload.
 - Favor batch-mode CLI tools (sqlmap, ffuf, nuclei, zaproxy, arjun) where appropriate and check traffic via the proxy when beneficial
diff --git a/strix/tools/__init__.py b/strix/tools/__init__.py
index 1c49472f0..7cb7f878c 100644
--- a/strix/tools/__init__.py
+++ b/strix/tools/__init__.py
@@ -39,6 +39,7 @@
     from .proxy import *  # noqa: F403
     from .python import *  # noqa: F403
     from .reporting import *  # noqa: F403
+    from .skills import *  # noqa: F403
     from .terminal import *  # noqa: F403
     from .thinking import *  # noqa: F403
     from .todo import *  # noqa: F403
diff --git a/strix/tools/skills/__init__.py b/strix/tools/skills/__init__.py
new file mode 100644
index 000000000..8a66a2fb6
--- /dev/null
+++ b/strix/tools/skills/__init__.py
@@ -0,0 +1,7 @@
+from .skills_actions import list_skills, load_skills
+
+
+__all__ = [
+    "list_skills",
+    "load_skills",
+]
diff --git a/strix/tools/skills/skills_actions.py b/strix/tools/skills/skills_actions.py
new file mode 100644
index 000000000..d6cb4f9ae
--- /dev/null
+++ b/strix/tools/skills/skills_actions.py
@@ -0,0 +1,195 @@
+import re
+from typing import Any
+
+from strix.skills import (
+    get_all_skill_names,
+    get_available_skills,
+    validate_skill_names,
+)
+from strix.skills import (
+    load_skills as load_skills_content,
+)
+from strix.tools.registry import register_tool
+from strix.utils.resource_paths import get_strix_resource_path
+
+
+_FRONTMATTER_PATTERN = re.compile(r"^---\s*\n(.*?)\n---\s*\n", re.DOTALL)
+_NAME_PATTERN = re.compile(r"^name:\s*(.+)$", re.MULTILINE)
+_DESCRIPTION_PATTERN = re.compile(r"^description:\s*(.+)$", re.MULTILINE)
+
+
+def _extract_frontmatter(content: str) -> dict[str, str] | None:
+    """Extract name and description from YAML frontmatter using simple regex parsing."""
+    match = _FRONTMATTER_PATTERN.match(content)
+    if not match:
+        return None
+
+    frontmatter_text = match.group(1)
+    metadata: dict[str, str] = {}
+
+    name_match = _NAME_PATTERN.search(frontmatter_text)
+    if name_match:
+        metadata["name"] = name_match.group(1).strip()
+
+    desc_match = _DESCRIPTION_PATTERN.search(frontmatter_text)
+    if desc_match:
+        metadata["description"] = desc_match.group(1).strip()
+
+    return metadata if metadata else None
+
+
+def _get_skill_metadata(skill_name: str, category: str | None = None) -> dict[str, str] | None:
+    """Get metadata (name, description) for a specific skill by reading its file."""
+    skills_dir = get_strix_resource_path("skills")
+
+    if category:
+        skill_path = skills_dir / category / f"{skill_name}.md"
+    else:
+        all_categories = get_available_skills()
+        skill_path = None
+        for cat, skills in all_categories.items():
+            if skill_name in skills:
+                skill_path = skills_dir / cat / f"{skill_name}.md"
+                break
+
+        if not skill_path:
+            root_candidate = skills_dir / f"{skill_name}.md"
+            if root_candidate.exists():
+                skill_path = root_candidate
+
+    if skill_path and skill_path.exists():
+        try:
+            content = skill_path.read_text()
+            return _extract_frontmatter(content)
+        except (FileNotFoundError, OSError):
+            return None
+
+    return None
+
+
+@register_tool(sandbox_execution=False)
+def list_skills(
+    category: str | None = None,
+) -> dict[str, Any]:
+    """List available skills, optionally filtered by category.
+
+    Always includes name and description metadata for each skill.
+    """
+    try:
+        available_skills = get_available_skills()
+
+        if category:
+            if category in available_skills:
+                filtered_skills = {category: available_skills[category]}
+            else:
+                categories_str = ", ".join(sorted(available_skills.keys()))
+                return {
+                    "success": False,
+                    "error": (
+                        f"Category '{category}' not found. Available categories: {categories_str}"
+                    ),
+                    "skills_by_category": {},
+                    "all_skills": [],
+                    "categories": list(available_skills.keys()),
+                    "metadata": {},
+                }
+        else:
+            filtered_skills = available_skills
+
+        metadata: dict[str, dict[str, str]] = {}
+        for cat, skills in filtered_skills.items():
+            for skill_name in skills:
+                skill_meta = _get_skill_metadata(skill_name, cat)
+                if skill_meta:
+                    metadata[skill_name] = skill_meta
+
+        all_filtered_skills: list[str] = []
+        for skills in filtered_skills.values():
+            all_filtered_skills.extend(skills)
+        all_filtered_skills = sorted(set(all_filtered_skills))
+
+        return {
+            "success": True,
+            "skills_by_category": filtered_skills,
+            "all_skills": all_filtered_skills
+            if not category
+            else filtered_skills.get(category, []),
+            "categories": list(available_skills.keys()),
+            "metadata": metadata,
+        }
+    except Exception as e:  # noqa: BLE001
+        return {
+            "success": False,
+            "error": f"Failed to list skills: {e}",
+            "skills_by_category": {},
+            "all_skills": [],
+            "categories": [],
+            "metadata": {},
+        }
+
+
+@register_tool(sandbox_execution=False)
+def load_skills(
+    agent_state: Any,
+    skills: str,
+) -> dict[str, Any]:
+    """Load skill content dynamically at runtime for immediate use."""
+    try:
+        skill_list = [s.strip() for s in skills.split(",") if s.strip()]
+
+        if not skill_list:
+            return {
+                "success": False,
+                "error": "No skills specified. Provide comma-separated skill names.",
+                "loaded_skills": {},
+                "loaded_count": 0,
+                "invalid_skills": [],
+                "warnings": [],
+            }
+
+        def _bare_name(s: str) -> str:
+            return s.split("/")[-1]
+
+        validation = validate_skill_names([_bare_name(s) for s in skill_list])
+        invalid_bare = set(validation.get("invalid", []))
+        valid_skills = [s for s in skill_list if _bare_name(s) not in invalid_bare]
+        invalid_skills = [s for s in skill_list if _bare_name(s) in invalid_bare]
+
+        warnings: list[str] = []
+        if invalid_skills:
+            available_skills = list(get_all_skill_names())
+            warnings.append(
+                f"Invalid skills: {', '.join(invalid_skills)}. "
+                f"Available skills: {', '.join(sorted(available_skills))}"
+            )
+
+        loaded_content = load_skills_content(valid_skills)
+
+        loaded_skill_names = set(loaded_content.keys())
+        requested_skill_names = {s.split("/")[-1] for s in valid_skills}
+        missing_skills = requested_skill_names - loaded_skill_names
+        if missing_skills:
+            warnings.append(f"Some skills could not be loaded: {', '.join(missing_skills)}")
+
+        result: dict[str, Any] = {
+            "success": len(loaded_content) > 0,
+            "loaded_skills": loaded_content,
+            "loaded_count": len(loaded_content),
+            "invalid_skills": invalid_skills,
+            "warnings": warnings,
+        }
+        if not result["success"] and not loaded_content:
+            result["error"] = (
+                "No skills could be loaded. "
+                + (warnings[0] if warnings else "Check skill names with list_skills.")
+            )
+        return result
+    except Exception as e:  # noqa: BLE001
+        return {
+            "success": False,
+            "error": f"Failed to load skills: {e}",
+            "loaded_skills": {},
+            "loaded_count": 0,
+            "invalid_skills": [],
+            "warnings": [],
+        }
diff --git a/strix/tools/skills/skills_actions_schema.xml b/strix/tools/skills/skills_actions_schema.xml
new file mode 100644
index 000000000..feea6f10d
--- /dev/null
+++ b/strix/tools/skills/skills_actions_schema.xml
@@ -0,0 +1,130 @@
+
+
+ Discover available skills that can be loaded dynamically at runtime.
+
+This tool helps you find specialized knowledge packages when you encounter unfamiliar technologies, vulnerability types, or frameworks during testing. Use this BEFORE load_skills to discover what expertise is available.
+
+Use this tool when:
+- You encounter a technology/framework you're not specialized in (e.g., FastAPI, Next.js, Firebase)
+- You need to test a vulnerability type not in your initial skills (e.g., XXE, SSRF, GraphQL)
+- You want to explore available expertise before starting a new testing phase
+- You need to find skills related to a specific category (vulnerabilities, frameworks, technologies, protocols, cloud)
+ Skills are specialized knowledge packages that provide deep expertise in specific areas:
+- **Vulnerabilities**: Advanced testing techniques for vulnerability classes (SQL injection, XSS, SSRF, etc.)
+- **Frameworks**: Framework-specific testing methods (Django, Express, FastAPI, Next.js)
+- **Technologies**: Third-party service testing (Supabase, Firebase, Auth0, payment gateways)
+- **Protocols**: Protocol-specific patterns (GraphQL, WebSocket, OAuth)
+- **Cloud**: Cloud provider security testing (AWS, Azure, GCP, Kubernetes)
+
+This tool returns a structured list of all available skills, optionally filtered by category, and always includes metadata (name, description) from skill files.
+
+
+ Filter skills by category. Available categories: vulnerabilities, frameworks, technologies, protocols, cloud, reconnaissance, custom. If not specified, returns all skills across all categories.
+
+
+
+ Response containing: - success: Whether the operation succeeded - skills_by_category: Dictionary mapping category names to lists of skill names - all_skills: Flat list of all skill names (or filtered list if category specified) - categories: List of all available category names - metadata: Dictionary mapping skill names to their metadata (name, description) - always included
+
+
+ # Discover all available skills with names and descriptions
+
+
+
+ # Find skills for a specific vulnerability type
+
+ vulnerabilities
+
+
+ # Explore framework-specific skills
+
+ frameworks
+
+
+ # Check what technologies have specialized testing knowledge
+
+ technologies
+
+
+
+
+ Load specialized skill content dynamically at runtime for immediate use.
+
+This tool loads skill content (markdown knowledge packages) that you can immediately use in your testing. Load skills RIGHT BEFORE you need them - when you encounter a new vulnerability type, technology, or framework you're not specialized in.
+
+**CRITICAL WORKFLOW**:
+1. Use list_skills to discover available expertise
+2. Use load_skills to pull in the specialized knowledge
+3. Immediately apply the loaded knowledge to your testing
+
+The loaded skill content provides:
+- Advanced techniques and methodologies
+- Practical payloads and examples
+- Validation methods and testing approaches
+- Tool usage guidance
+- Bypass techniques and edge cases
+
+Use this tool when:
+- You're about to test a vulnerability type not in your initial skills (e.g., load sql_injection before SQLi testing)
+- You encounter an unfamiliar framework (e.g., load fastapi when testing FastAPI apps)
+- You need specialized knowledge for a technology (e.g., load firebase_firestore for Firebase testing)
+- You discover a new attack surface requiring different expertise (e.g., load graphql for GraphQL API testing)
+ This tool enables runtime skill adaptation - you're not limited to your initial skill set. When you load a skill, you receive its complete content (markdown) that you can immediately reference and use in your testing approach.
+
+The skill content is returned as a dictionary where keys are skill names and values are the full markdown content. You can use this content to:
+- Understand advanced testing techniques
+- Get specific payloads and examples
+- Learn validation methods
+- Discover tool usage patterns
+- Find bypass techniques
+
+Skills are loaded from the skills directory and validated before loading. Invalid skill names are reported but don't prevent valid skills from loading.
+
+
+ Comma-separated list of skill names to load. You can specify skills by name only (e.g., "sql_injection") or with category path (e.g., "vulnerabilities/sql_injection"). Use list_skills first to discover available skill names. Examples: "sql_injection", "xss,csrf", "fastapi", "vulnerabilities/ssrf,technologies/firebase_firestore"
+
+
+
+ Response containing: - success: Whether at least one skill was loaded successfully - loaded_skills: Dictionary mapping skill names to their markdown content - loaded_count: Number of skills successfully loaded - invalid_skills: List of skill names that were invalid/not found - warnings: List of warning messages (e.g., invalid skills, missing files)
+
+
+ # Load SQL injection expertise before testing
+
+ vulnerabilities
+
+
+
+ sql_injection
+
+
+ # Now proceed with SQL injection testing using the loaded expertise
+
+ # Load framework-specific knowledge when encountering new tech
+
+ fastapi
+
+
+ # Use the FastAPI-specific testing techniques from the loaded skill
+
+ # Load multiple related skills for comprehensive testing
+
+ authentication_jwt,business_logic
+
+
+ # Test authentication and business logic vulnerabilities with loaded expertise
+
+ # Load protocol-specific skills for API testing
+
+ graphql
+
+
+ # Apply GraphQL-specific testing techniques
+
+ # Load technology-specific skills
+
+ firebase_firestore
+
+
+ # Test Firebase Firestore security with specialized knowledge
+
+
+
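
Reviewer note: the name handling in `load_skills` (comma-separated input, optional `category/` prefix, invalid names reported without blocking valid ones) can be exercised in isolation. The sketch below is a minimal standalone approximation, not the PR's code: `partition_skill_names` and the `KNOWN_SKILLS` set are hypothetical stand-ins for `validate_skill_names` and the real skills directory.

```python
def partition_skill_names(raw: str, known: set[str]) -> tuple[list[str], list[str]]:
    """Split a comma-separated skills string into (valid, invalid) entries.

    Entries may carry a category prefix such as "vulnerabilities/ssrf";
    only the part after the last "/" is checked against `known`.
    """
    # Drop surrounding whitespace and skip empty fragments like "a,,b".
    requested = [s.strip() for s in raw.split(",") if s.strip()]

    valid: list[str] = []
    invalid: list[str] = []
    for entry in requested:
        bare = entry.split("/")[-1]
        # Keep the entry as the caller wrote it, prefix included.
        (valid if bare in known else invalid).append(entry)
    return valid, invalid


# Hypothetical skill names, for illustration only.
KNOWN_SKILLS = {"sql_injection", "xss", "ssrf"}

valid, invalid = partition_skill_names("vulnerabilities/ssrf, xss,,nope", KNOWN_SKILLS)
print("valid:", valid)
print("invalid:", invalid)
```

Under these assumptions a request that mixes good and bad names still yields the good ones, which matches the schema's statement that invalid skill names are reported but do not prevent valid skills from loading.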