Fix payload delivery and chatbot reliability for demo #9
Merged
- Add `--narrate` flag to `c2 exploit` for step-by-step demo explanations using `[~]` prefix lines that explain the attack chain to an audience
- Auto-wait for the payload to call home after exploit delivery (default on, `--no-wait` to disable) with rolling progress dots and elapsed time
- Add `--wait-timeout` option (default 60s) for the session wait
- Add progress indicator to the `c2 attach` command-wait loop showing elapsed time and rolling dots instead of a silent 30s hang
- Enhance `status` command in the attach shell to show session liveness from `list_sessions()` (last seen, active/terminated)
- Convert `print()` to `click.echo()` in attack_client.py and `click.echo(err=True)` in session_manager.py for consistent output
- Add `make demo` target for narrated exploit + auto-attach flow
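The rolling-dots wait loop used by `c2 exploit` and `c2 attach` could look roughly like the sketch below. This is an illustration only: `wait_with_dots`, `check_fn`, and the output format are placeholder names, not the repo's actual API.

```python
import itertools
import sys
import time

def wait_with_dots(check_fn, timeout=60.0, interval=1.0):
    """Poll check_fn until it returns truthy or the timeout elapses,
    printing elapsed time and rolling dots instead of hanging silently."""
    start = time.monotonic()
    for i in itertools.count():
        if check_fn():
            sys.stdout.write("\n")
            return True
        elapsed = time.monotonic() - start
        if elapsed >= timeout:
            sys.stdout.write("\n")
            return False
        # Overwrite the same line: elapsed seconds plus 0-3 rolling dots.
        sys.stdout.write(f"\r[{elapsed:4.0f}s] waiting{'.' * (i % 4)}   ")
        sys.stdout.flush()
        time.sleep(interval)
```

The same helper can back both the post-exploit session wait (60s default) and the attach loop (30s), with `check_fn` polling the session store.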
The demo was broken by three compounding issues discovered during debugging:

1. **Shared boto3 sessions across background threads** — The chatbot's AgentCoreService was a singleton sharing boto3 clients across FastAPI BackgroundTask threads. Concurrent requests corrupted the underlying SSL connection pool, causing API calls to hang indefinitely after the first request completed. Fixed by creating a fresh `boto3.session.Session()` per `analyze_csv()` call.
2. **LLM corrupting base64 payloads** — The old injection strategy embedded the full base64 C2 payload (~1800 chars) in a CSV cell and relied on the LLM to faithfully reproduce it in generated code. Llama 4 Scout consistently truncated or mangled the base64 string, causing decode errors. Redesigned the payload delivery: the CSV now has a simple "Config" column with the exec line, the CSV is written to disk (not inlined in the prompt), and the user's message directs the LLM to read the cell from the file via `csv.reader` and exec it. The LLM only generates ~5 lines of `csv.reader` code — zero base64 reproduction needed.
3. **Overly prescriptive system prompt** — The chatbot prompt told the LLM to "read the file to understand its structure, then perform analysis", which made it explore the data for several iterations before following user instructions. Simplified to "Follow the user's instructions" so the LLM executes what's asked on the first tool-use iteration.

Also adds:

- Per-call timing logs for every boto3 API call (`>>` before, `<<` after)
- docs/DEMO_GUIDE.md with step-by-step web UI demo instructions
- Updated README quick start for the web UI workflow
- Reduced MAX_TOOL_ITERATIONS from 10 to 3
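The thread-safety fix in (1) follows a standard pattern: build a fresh client inside each call instead of sharing one instance across threads. A stdlib-only sketch under stated assumptions — `FakeClient` stands in for the boto3 client (the real fix constructs `boto3.session.Session()` inside `analyze_csv()`), and both function names are illustrative:

```python
import threading

class FakeClient:
    """Stand-in for a boto3 client. Real boto3 clients wrap a shared
    HTTP connection pool, which is why reusing one instance across
    FastAPI BackgroundTask threads corrupted in-flight requests."""
    def __init__(self):
        self.created_by = threading.get_ident()

# Buggy pattern: one module-level client reused by every thread.
_shared = FakeClient()

def analyze_csv_shared(path):
    return _shared  # all threads contend on the same connection pool

def analyze_csv_fresh(path):
    # Fixed pattern: a fresh client per call (boto3.session.Session()
    # in the real service), so concurrent requests never share state.
    return FakeClient()
```

Per-call construction trades a little setup cost for the guarantee that no two background tasks ever touch the same connection pool.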
Summary
The demo was completely broken — three compounding issues prevented the C2 payload from executing in the Code Interpreter sandbox:
1. **Shared boto3 sessions across BackgroundTask threads** caused API calls to hang after ~10 minutes. The `AgentCoreService` singleton shared boto3 clients across concurrent threads, corrupting the underlying SSL connection pool. Fixed by creating a fresh `boto3.session.Session()` per `analyze_csv()` call.
2. **LLM corrupting the base64 payload** — Llama 4 Scout consistently truncated or mangled the ~1800-char base64 string when reproducing it in generated code. Redesigned payload delivery: the CSV now has a "Config" column with the exec line, the file is written to disk (not inlined in the LLM prompt), and the user's message directs the LLM to `csv.reader` the file and exec the cell. Zero base64 reproduction needed.
3. **Overly prescriptive system prompt** told the LLM to "read the file to understand its structure" before acting, causing it to explore the data for multiple iterations instead of following the user's instructions. Simplified to "Follow the user's instructions" so the payload fires on iteration 1.
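A minimal, benign sketch of the redesigned delivery scheme. The "Config" column name comes from the PR; the file name and cell contents here are placeholders (a harmless `print` call stands in for the actual exec line), and the read-back half approximates the ~5 lines the LLM is asked to generate:

```python
import csv

# Attacker side: write a CSV whose "Config" column carries the line
# to execute. The cell here is a harmless placeholder.
with open("report.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Region", "Revenue", "Config"])
    writer.writerow(["EMEA", "1200", "print('config applied')"])

# Sandbox side: the short snippet the LLM generates reads the cell
# back from disk and execs it, so the model never has to reproduce
# a long base64 string verbatim.
with open("report.csv", newline="") as f:
    rows = list(csv.reader(f))
config_cell = rows[1][rows[0].index("Config")]
exec(config_cell)  # prints "config applied"
```

Because the payload travels on disk rather than through the prompt, the model's faithfulness only matters for a few lines of boilerplate `csv.reader` code.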
All three fixes were required — any one alone was insufficient. Verified end-to-end:
`whoami` returns `genesis1ptools` from the sandbox.

Changes
- `victim-infra/chatbot/app/services/agentcore.py` — Per-request boto3 sessions, detailed timing logs, simplified system prompt, MAX_TOOL_ITERATIONS reduced to 3
- `attacker-infra/c2/core/payload_generator.py` — Simplified CSV format with Config column, removed old injection styles
- `attacker-infra/c2/cli/generate.py` — Updated CLI to print the suggested prompt for web UI upload
- `attacker-infra/c2/core/attack_client.py` — Use new generator API
- `docs/DEMO_GUIDE.md` — New step-by-step demo guide for the web UI workflow
- `README.md` — Updated quick start section

Test plan
- `make generate-csv` produces clean CSV with payload in Config column
- `whoami` command returns `genesis1ptools` via DNS exfiltration
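The per-call timing logs listed under Changes (`>>` before each boto3 call, `<<` after) can be approximated with a small decorator. A sketch only — `timed` and the logger name are illustrative, not the repo's actual helpers:

```python
import functools
import logging
import time

log = logging.getLogger("agentcore")

def timed(fn):
    """Log '>>' before and '<<' (with elapsed seconds) after each call,
    making silent hangs visible as a '>>' line with no matching '<<'."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.info(">> %s", fn.__name__)
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        finally:
            log.info("<< %s (%.2fs)", fn.__name__, time.monotonic() - start)
    return wrapper
```

Wrapping each boto3 call site this way is what made the original connection-pool hang diagnosable: the log showed a `>>` line for the stuck call with no `<<` ever arriving.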