feat(sight): add Codex CLI adaptation and cross-chunk SSE recovery #1133
Codex CLI links aws-lc statically (BoringSSL-compatible) and is shipped as a stripped musl static-pie binary, so AgentSight could not attach any uprobe. Add a three-tier fallback (symbol / byte-pattern / offset table), handle the SSL_*_ex ABI variants, and recover token usage when codex's oversized response.completed event spans multiple TLS records. Also fix codex's session ID by extending ResponseSessionMapper to parse its rollout-<ts>-<UUID>.jsonl filenames (gated by BPF on tgid pid for cross-probe correlation), and classify Codex without listening ports as AgentRole::Client so the healthcheck dashboard hides idle instances. Includes the extract-codex-offsets.py helper plus a doc covering the official symbols package (0.140+), self-built non-stripped binary (0.139-), and fingerprint workflow. Closes alibaba#1036
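The cross-record recovery described in this commit message can be pictured with a small buffer that holds bytes until an SSE event is complete. This is only a sketch under stated assumptions: the `SseContinuationBuffer` type and its `push` method are hypothetical names, not the PR's actual code, and the example handles only `\n\n` event terminators.

```rust
/// Hypothetical continuation buffer (not the PR's real type): it collects
/// decrypted bytes across TLS records and yields only complete SSE events.
#[derive(Default)]
struct SseContinuationBuffer {
    pending: Vec<u8>,
}

impl SseContinuationBuffer {
    /// Feed one decrypted record; returns every event that it completes.
    fn push(&mut self, record: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(record);
        let mut events = Vec::new();
        // Assumption: events end with a blank line ("\n\n"); "\r\n\r\n" is ignored here.
        while let Some(end) = self.pending.windows(2).position(|w| w == b"\n\n") {
            let raw: Vec<u8> = self.pending.drain(..end + 2).collect();
            events.push(String::from_utf8_lossy(&raw[..end]).into_owned());
        }
        events
    }
}

fn main() {
    let mut buf = SseContinuationBuffer::default();
    // One oversized event split mid-JSON across two records.
    let first = buf.push(b"event: response.completed\ndata: {\"usage\":{\"input_to");
    assert!(first.is_empty()); // nothing complete yet, bytes are retained
    let second = buf.push(b"kens\":12,\"output_tokens\":34}}\n\n");
    assert_eq!(second.len(), 1);
    println!("{}", second[0]);
}
```

Buffering raw bytes rather than strings avoids breaking a multi-byte UTF-8 character that happens to straddle a record boundary.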
071056f to 8022b8a
Add tests for SSE continuation buffer, Responses API parsing, and role inference paths from the Codex CLI adaptation. Relates to alibaba#1036
74705c2 to 2412714
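AgentSight Code Review: PR #1133
Change size: +2285 / -488, 24 files, 3 commits
Reviewed along 6 dimensions (hard-rule compliance, eBPF safety, FFI boundary, Footprint Ladder, pipeline test coverage, documentation sync). The following issues were found:
1. [Hard rule] PR diff exceeds the total line limit
2773 changed lines in total, far above the 800-line cap set in AGENTS.md. Suggest splitting into 3-4 independent PRs:
After the split, each PR can be reviewed, reverted, and bisected independently.
2. [eBPF safety]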
Narrow path matching, revert log level to trace, chunked binary read, ELF64 doc, trailing newline, BPF comment, and continuation buffer tests. Relates to alibaba#1036
Add comment explaining /proc/pid/root uprobe safety, fix detach_process inode cleanup to avoid duplicate attach, and sync AGENTS.md with Codex adaptation docs. Relates to alibaba#1036
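Thanks for the detailed review! Point-by-point replies below:
#1 PR diff over the limit
Agreed that the line count is high, but this change forms one complete Codex CLI adaptation feature (offset table + BPF extension + SSE continuation buffer + genai integration). Split up, the parts could not be verified end to end on their own, so this stays a single PR.
#2 filewrite.bpf.c rollout matching too broad
Added a comment explaining the design intent: the BPF layer uses prefix matching (
#3 Removing canonicalize may affect container scenarios
#4 detach_process inode cleanup may cause duplicate attach
Fixed.
#5 Removing drained_sse_events may regress #973
Confirmed not a regression. The #973 fix logic (recovering a compressed SSE stream when the process dies) has been moved to the aggregator layer:
#6 extract-codex-offsets.py reads the whole binary into memory
Changed to 1 MiB chunked reads, so the 276 MB binary no longer risks an OOM.
#7 needs_sse_continuation_buffer path matching too broad
Narrowed: removed
#8 continuation buffer lacks boundary tests
Added 3 tests:
#9 Documentation sync
Updated AGENTS.md: Module Map now adds
#10 elf_buildid.rs supports ELF64 only
Added a doc comment explaining the ELF64 limitation and its reason (Codex CLI is an x86-64 musl static binary).
#11 Log level trace→debug
Reverted to
#12 agentsight.json missing trailing newline
Added.
All changes have been pushed (commit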
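On #6, one subtlety of chunked reading is that a byte pattern can straddle a chunk boundary. The sketch below shows one way to handle that by carrying over the last `needle.len() - 1` bytes. It is written in Rust only for consistency with the other code in this thread (the helper itself is Python), `find_chunked` is a hypothetical name, and whether the helper scans for byte patterns at all is an assumption.

```rust
use std::io::{self, Read};

const CHUNK: usize = 1 << 20; // 1 MiB, the chunk size named in the reply

/// Hypothetical helper: offset of the first occurrence of `needle`,
/// holding at most one chunk plus a small carried-over tail in memory.
fn find_chunked<R: Read>(mut reader: R, needle: &[u8]) -> io::Result<Option<u64>> {
    if needle.is_empty() {
        return Ok(None);
    }
    let keep = needle.len() - 1; // tail bytes that could start a match
    let mut window: Vec<u8> = Vec::with_capacity(CHUNK + keep);
    let mut base: u64 = 0; // stream offset of window[0]
    let mut chunk = vec![0u8; CHUNK];
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            return Ok(None); // end of input, no match
        }
        window.extend_from_slice(&chunk[..n]);
        if let Some(pos) = window.windows(needle.len()).position(|w| w == needle) {
            return Ok(Some(base + pos as u64));
        }
        // Drop everything except the tail that may begin a cross-chunk match.
        if window.len() > keep {
            let cut = window.len() - keep;
            window.drain(..cut);
            base += cut as u64;
        }
    }
}

fn main() -> io::Result<()> {
    // Place a 4-byte pattern so that it spans the first chunk boundary.
    let mut data = vec![0u8; CHUNK - 2];
    data.extend_from_slice(&[0xde, 0xad, 0xbe, 0xef]);
    data.extend_from_slice(&[0u8; 16]);
    let hit = find_chunked(io::Cursor::new(data), &[0xde, 0xad, 0xbe, 0xef])?;
    assert_eq!(hit, Some((CHUNK - 2) as u64));
    Ok(())
}
```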
Re-Review: PR #1133 (v2)
Verified the fixes for each of the 12 findings from the previous review round:
✅ Fixed (10/12)
Drain path now decodes compressed_buffer when sse_events is empty, preventing token usage loss on process exit. Relates to alibaba#1036
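Thanks for the re-review!
#5 Removing drained_sse_events may regress #973
On confirmation, this is indeed a potential regression. Fixed (commit
Root cause: the original
Fix: added a new match arm to the SseActive destructuring on the drain path; when
ConnectionState::SseActive {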
    request: Some(req),
    sse_events,
    compressed_buffer: Some(buf),
    content_encoding,
    response_headers,
} if sse_events.is_empty() && !buf.is_empty() => {
    // fix(#973): decode the unfinalized compressed buffer
    let is_chunked = HttpConnectionAggregator::is_chunked_response(&response_headers);
    let decoded = HttpConnectionAggregator::decode_compressed_sse(
        &buf, content_encoding.as_deref(), is_chunked, &response_headers.source_event,
    );
    ("SseActive", req, decoded)
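}
This matches the original
#13 still_used check complexity
Agreed that the current O(n×m) needs optimization at large scale. Currently
All 12 findings have been addressed; 941 tests pass, and the clippy/fmt/arch boundary checks all pass.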
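The guarded match arm in the reply above can be exercised in isolation with simplified stand-in types. In this sketch the `ConnectionState` fields are reduced and their types assumed, and `decode_stub` is a hypothetical placeholder that treats the buffer as plain SSE text instead of performing real decompression.

```rust
/// Simplified stand-in for the PR's connection state; field types are assumed.
enum ConnectionState {
    SseActive {
        request: Option<String>,
        sse_events: Vec<String>,
        compressed_buffer: Option<Vec<u8>>,
    },
    Idle,
}

/// Placeholder for real decompression: splits already-plain SSE text into events.
fn decode_stub(buf: &[u8]) -> Vec<String> {
    String::from_utf8_lossy(buf)
        .split("\n\n")
        .filter(|e| !e.trim().is_empty())
        .map(|e| e.trim().to_string())
        .collect()
}

/// Drain a connection when its process exits.
fn drain(state: ConnectionState) -> Option<(String, Vec<String>)> {
    match state {
        // Guarded arm: no parsed events yet, but undecoded bytes are buffered.
        ConnectionState::SseActive {
            request: Some(req),
            sse_events,
            compressed_buffer: Some(buf),
        } if sse_events.is_empty() && !buf.is_empty() => Some((req, decode_stub(&buf))),
        // Ordinary arm: events were already parsed, return them as-is.
        ConnectionState::SseActive {
            request: Some(req),
            sse_events,
            ..
        } => Some((req, sse_events)),
        _ => None,
    }
}

fn main() {
    let state = ConnectionState::SseActive {
        request: Some("POST /v1/responses".to_string()),
        sse_events: Vec::new(),
        compressed_buffer: Some(b"data: {\"usage\":1}\n\n".to_vec()),
    };
    let (req, events) = drain(state).expect("guarded arm should match");
    println!("{req}: {events:?}");
    assert!(drain(ConnectionState::Idle).is_none());
}
```

Because the guard only borrows the bindings, a state that fails the guard falls through to the ordinary arm with its fields still intact.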
Re-Review v3: PR #1133
Verified the fixes for the 2 items pending confirmation from the last round plus 1 newly found item:
✅ #5 drained_sse_events / #973: logic restored
commit
ConnectionState::SseActive {
    request: Some(req),
    sse_events,
    compressed_buffer: Some(buf),
    content_encoding,
    response_headers,
} if sse_events.is_empty() && !buf.is_empty() => {
    // fix(#973): decode the unfinalized compressed buffer
    let decoded = HttpConnectionAggregator::decode_compressed_sse(...);
    ("SseActive", req, decoded)
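}
The guard condition precisely matches the "events empty + buffer non-empty" case; the logic is equivalent to the original
Nit: the original 3 unit tests (
✅ #1 PR line count: acknowledged
+2476/-489; the author chose to submit it as one complete feature. Understood.
ℹ️ #13 detach_process still_used complexity: unchanged
Still an O(n×m) traversal, acceptable at the current scale; recorded as a follow-up optimization.
Final assessment
All 12 original findings plus 1 new finding have been properly addressed. The code quality is high and the review responses were quick. LGTM 🚀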
chengshuyi left a comment
LGTM. All 13 review findings addressed. #973 drain decode restored.
Description
Add Codex CLI adaptation and cross-chunk SSE recovery for AgentSight.
Codex CLI links aws-lc statically (BoringSSL-compatible) and is shipped as a stripped musl static-pie binary, so AgentSight could not attach any uprobe. This change adds a three-tier fallback (symbol / byte-pattern / offset table), handles the SSL_*_ex ABI variants, and recovers token usage when codex's oversized response.completed event spans multiple TLS records.
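The three-tier fallback can be sketched as a resolver that tries each tier in turn. This is an illustration under assumptions, not the PR's implementation: the `Binary` struct, `resolve`, `find_unique`, the requirement that a byte pattern match exactly once, and the offset table keyed by build ID and symbol name are all hypothetical, and the tiers are assumed to be tried in the order the description lists them.

```rust
use std::collections::HashMap;

/// Which tier produced the probe offset (tier names follow the PR text).
#[derive(Debug, PartialEq)]
enum Resolved {
    Symbol(u64),
    BytePattern(u64),
    OffsetTable(u64),
}

/// Hypothetical view of a target binary; real code would parse the ELF file.
struct Binary<'a> {
    symbols: HashMap<&'a str, u64>, // empty when the binary is stripped
    text: &'a [u8],                 // bytes to scan for a function prologue
    build_id: &'a str,
}

/// Offset of `needle` in `haystack`, but only if it occurs exactly once.
fn find_unique(haystack: &[u8], needle: &[u8]) -> Option<u64> {
    if needle.is_empty() {
        return None;
    }
    let mut hits = haystack
        .windows(needle.len())
        .enumerate()
        .filter(|(_, w)| *w == needle);
    match (hits.next(), hits.next()) {
        (Some((i, _)), None) => Some(i as u64),
        _ => None, // zero hits or ambiguous
    }
}

fn resolve(
    bin: &Binary,
    symbol: &str,
    pattern: &[u8],
    offset_table: &[(&str, &str, u64)], // (build_id, symbol, offset)
) -> Option<Resolved> {
    // Tier 1: symbol table lookup.
    if let Some(&addr) = bin.symbols.get(symbol) {
        return Some(Resolved::Symbol(addr));
    }
    // Tier 2: unambiguous byte-pattern match.
    if let Some(off) = find_unique(bin.text, pattern) {
        return Some(Resolved::BytePattern(off));
    }
    // Tier 3: precomputed offset for this exact build.
    offset_table
        .iter()
        .find(|(id, sym, _)| *id == bin.build_id && *sym == symbol)
        .map(|&(_, _, off)| Resolved::OffsetTable(off))
}

fn main() {
    let text = [0x90u8, 0x55, 0x48, 0x89, 0xe5, 0x90];
    let stripped = Binary { symbols: HashMap::new(), text: &text, build_id: "abc" };
    let table = [("abc", "SSL_read", 0x40u64)];
    println!("{:?}", resolve(&stripped, "SSL_read", &[0x55, 0x48], &table));
}
```

Requiring a unique pattern hit is a conservative choice in this sketch: an ambiguous match falls through to the offset table rather than risking a probe at the wrong address.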
Also fixes codex's session ID by extending ResponseSessionMapper to parse its rollout-<ts>-<UUID>.jsonl filenames (gated by BPF on tgid pid for cross-probe correlation), and classifies Codex without listening ports as AgentRole::Client so the healthcheck dashboard hides idle instances.
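Extracting the session ID from a rollout filename might look like the sketch below. The function name `session_id_from_rollout` is hypothetical, and the timestamp shape used in the example is an assumption; the code relies only on the UUID being the last 36 characters before `.jsonl`, so it does not depend on how `<ts>` is formatted.

```rust
/// Hypothetical helper: pull the trailing UUID out of a
/// `rollout-<ts>-<UUID>.jsonl` file name, or return None if the shape is wrong.
fn session_id_from_rollout(file_name: &str) -> Option<&str> {
    let stem = file_name.strip_prefix("rollout-")?.strip_suffix(".jsonl")?;
    // Need at least one timestamp char, a '-', and a 36-char UUID.
    if stem.len() < 38 || !stem.is_ascii() {
        return None;
    }
    let (ts_with_dash, uuid) = stem.split_at(stem.len() - 36);
    if !ts_with_dash.ends_with('-') {
        return None;
    }
    // Canonical UUID text: hex digits with '-' at positions 8, 13, 18, 23.
    let well_formed = uuid.bytes().enumerate().all(|(i, b)| match i {
        8 | 13 | 18 | 23 => b == b'-',
        _ => b.is_ascii_hexdigit(),
    });
    well_formed.then_some(uuid)
}

fn main() {
    // The timestamp format here is illustrative only.
    let name = "rollout-2025-01-02T03-04-05-0123abcd-4567-89ab-cdef-0123456789ab.jsonl";
    println!("{:?}", session_id_from_rollout(name));
}
```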
Includes the extract-codex-offsets.py helper plus a doc covering the official symbols package (0.140+), self-built non-stripped binary (0.139-), and fingerprint workflow.
Related Issue
closes #1036
Type of Change
Scope
sight (agentsight)
Checklist
sight: cargo clippy -- -D warnings and cargo fmt --check pass
Cargo.lock)
Testing
cargo fmt --check, cargo clippy -- -D warnings, and cargo test all pass in src/agentsight.