Fix payload delivery and chatbot reliability for demo #9
Merged
- Add `--narrate` flag to `c2 exploit` for step-by-step demo explanations using `[~]` prefix lines that explain the attack chain to an audience
- Auto-wait for the payload to call home after exploit delivery (default on, `--no-wait` to disable) with rolling progress dots and elapsed time
- Add `--wait-timeout` option (default 60s) for the session wait
- Add progress indicator to the `c2 attach` command-wait loop showing elapsed time and rolling dots instead of a silent 30s hang
- Enhance `status` command in the attach shell to show session liveness from `list_sessions()` (last seen, active/terminated)
- Convert `print()` to `click.echo()` in attack_client.py and `click.echo(err=True)` in session_manager.py for consistent output
- Add `make demo` target for narrated exploit + auto-attach flow
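The rolling-dots wait loop used by `c2 exploit` and `c2 attach` could look roughly like the sketch below. This is an illustration only: `wait_with_dots`, `check_fn`, and the output format are placeholder names, not the repo's actual API.

```python
import itertools
import sys
import time

def wait_with_dots(check_fn, timeout=60.0, interval=1.0):
    """Poll check_fn until it returns truthy or the timeout elapses,
    printing elapsed time and rolling dots instead of hanging silently."""
    start = time.monotonic()
    for i in itertools.count():
        if check_fn():
            sys.stdout.write("\n")
            return True
        elapsed = time.monotonic() - start
        if elapsed >= timeout:
            sys.stdout.write("\n")
            return False
        # Overwrite the same line: elapsed seconds plus 0-3 rolling dots.
        sys.stdout.write(f"\r[{elapsed:4.0f}s] waiting{'.' * (i % 4)}   ")
        sys.stdout.flush()
        time.sleep(interval)
```

The same helper can back both the post-exploit session wait (60s default) and the attach loop (30s), with `check_fn` polling the session store.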
The demo was broken by three compounding issues discovered during debugging:

1. **Shared boto3 sessions across background threads** — The chatbot's AgentCoreService was a singleton sharing boto3 clients across FastAPI BackgroundTask threads. Concurrent requests corrupted the underlying SSL connection pool, causing API calls to hang indefinitely after the first request completed. Fixed by creating a fresh `boto3.session.Session()` per `analyze_csv()` call.
2. **LLM corrupting base64 payloads** — The old injection strategy embedded the full base64 C2 payload (~1800 chars) in a CSV cell and relied on the LLM to faithfully reproduce it in generated code. Llama 4 Scout consistently truncated or mangled the base64 string, causing decode errors. Redesigned the payload delivery: the CSV now has a simple "Config" column with the exec line, the CSV is written to disk (not inlined in the prompt), and the user's message directs the LLM to read the cell from the file via `csv.reader` and exec it. The LLM only generates ~5 lines of `csv.reader` code — zero base64 reproduction needed.
3. **Overly prescriptive system prompt** — The chatbot prompt told the LLM to "read the file to understand its structure, then perform analysis", which made it explore the data for several iterations before following user instructions. Simplified to "Follow the user's instructions" so the LLM executes what's asked on the first tool-use iteration.

Also adds:

- Per-call timing logs for every boto3 API call (`>>` before, `<<` after)
- docs/DEMO_GUIDE.md with step-by-step web UI demo instructions
- Updated README quick start for the web UI workflow
- Reduced MAX_TOOL_ITERATIONS from 10 to 3
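The thread-safety fix in (1) follows a standard pattern: build a fresh client inside each call instead of sharing one instance across threads. A stdlib-only sketch under stated assumptions — `FakeClient` stands in for the boto3 client (the real fix constructs `boto3.session.Session()` inside `analyze_csv()`), and both function names are illustrative:

```python
import threading

class FakeClient:
    """Stand-in for a boto3 client. Real boto3 clients wrap a shared
    HTTP connection pool, which is why reusing one instance across
    FastAPI BackgroundTask threads corrupted in-flight requests."""
    def __init__(self):
        self.created_by = threading.get_ident()

# Buggy pattern: one module-level client reused by every thread.
_shared = FakeClient()

def analyze_csv_shared(path):
    return _shared  # all threads contend on the same connection pool

def analyze_csv_fresh(path):
    # Fixed pattern: a fresh client per call (boto3.session.Session()
    # in the real service), so concurrent requests never share state.
    return FakeClient()
```

Per-call construction trades a little setup cost for the guarantee that no two background tasks ever touch the same connection pool.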
Summary
The demo was completely broken — three compounding issues prevented the C2 payload from executing in the Code Interpreter sandbox:
1. **Shared boto3 sessions across BackgroundTask threads** caused API calls to hang after ~10 minutes. The `AgentCoreService` singleton shared boto3 clients across concurrent threads, corrupting the underlying SSL connection pool. Fixed by creating a fresh `boto3.session.Session()` per `analyze_csv()` call.
2. **LLM corrupting the base64 payload** — Llama 4 Scout consistently truncated or mangled the ~1800-char base64 string when reproducing it in generated code. Redesigned payload delivery: the CSV now has a "Config" column with the exec line, the file is written to disk (not inlined in the LLM prompt), and the user's message directs the LLM to `csv.reader` the file and exec the cell. Zero base64 reproduction needed.
3. **Overly prescriptive system prompt** told the LLM to "read the file to understand its structure" before acting, causing it to explore the data for multiple iterations instead of following the user's instructions. Simplified to "Follow the user's instructions" so the payload fires on iteration 1.
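A minimal, benign sketch of the redesigned delivery scheme. The "Config" column name comes from the PR; the file name and cell contents here are placeholders (a harmless `print` call stands in for the actual exec line), and the read-back half approximates the ~5 lines the LLM is asked to generate:

```python
import csv

# Attacker side: write a CSV whose "Config" column carries the line
# to execute. The cell here is a harmless placeholder.
with open("report.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Region", "Revenue", "Config"])
    writer.writerow(["EMEA", "1200", "print('config applied')"])

# Sandbox side: the short snippet the LLM generates reads the cell
# back from disk and execs it, so the model never has to reproduce
# a long base64 string verbatim.
with open("report.csv", newline="") as f:
    rows = list(csv.reader(f))
config_cell = rows[1][rows[0].index("Config")]
exec(config_cell)  # prints "config applied"
```

Because the payload travels on disk rather than through the prompt, the model's faithfulness only matters for a few lines of boilerplate `csv.reader` code.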
All three fixes were required — any one alone was insufficient. Verified end-to-end:
`whoami` returns `genesis1ptools` from the sandbox.

Changes
- `victim-infra/chatbot/app/services/agentcore.py` — Per-request boto3 sessions, detailed timing logs, simplified system prompt, MAX_TOOL_ITERATIONS reduced to 3
- `attacker-infra/c2/core/payload_generator.py` — Simplified CSV format with Config column, removed old injection styles
- `attacker-infra/c2/cli/generate.py` — Updated CLI to print the suggested prompt for web UI upload
- `attacker-infra/c2/core/attack_client.py` — Use new generator API
- `docs/DEMO_GUIDE.md` — New step-by-step demo guide for the web UI workflow
- `README.md` — Updated quick start section

Test plan
- `make generate-csv` produces clean CSV with payload in Config column
- `whoami` command returns `genesis1ptools` via DNS exfiltration
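The per-call timing logs listed under Changes (`>>` before each boto3 call, `<<` after) can be approximated with a small decorator. A sketch only — `timed` and the logger name are illustrative, not the repo's actual helpers:

```python
import functools
import logging
import time

log = logging.getLogger("agentcore")

def timed(fn):
    """Log '>>' before and '<<' (with elapsed seconds) after each call,
    making silent hangs visible as a '>>' line with no matching '<<'."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.info(">> %s", fn.__name__)
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        finally:
            log.info("<< %s (%.2fs)", fn.__name__, time.monotonic() - start)
    return wrapper
```

Wrapping each boto3 call site this way is what made the original connection-pool hang diagnosable: the log showed a `>>` line for the stuck call with no `<<` ever arriving.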