feat(sight): opt-in timer_slack optimization for agent wakeup latency#725

Open
jfeng18 wants to merge 2 commits into
alibaba:mainfrom
jfeng18:feat/timer-slack-optimize


@jfeng18 jfeng18 commented Jun 4, 2026

Collaborator

Summary

Adds an --optimize-timer-slack flag that writes 1 to /proc/<pid>/timerslack_ns for each traced agent process, reducing the kernel hrtimer coalescing window (timer_slack_ns) from the 50us default to 1ns.

Profiling evidence (ECS 2-core, bpftrace)

| Metric | Default (50us slack) | Optimized (1ns slack) |
| --- | --- | --- |
| p50 wakeup | ~1.5us | ~1.5us |
| Tail (>512us) | 36% of samples | 15% of samples |

The optimization also tightens select/poll/epoll_wait timeouts, because select_estimate_accuracy() in the kernel takes timer_slack_ns into account. This directly benefits agent network I/O.

Design

  • Opt-in only: default off, requires --optimize-timer-slack CLI flag
  • Mechanism: writes to /proc/<pid>/timerslack_ns (requires CAP_SYS_NICE, which root has)
  • Scope: applies to all attach paths (cmdline match, DNS domain, connection scan)
  • Safety: TOCTOU handled gracefully (fs::write fails with ENOENT or ESRCH if the process exits first), errors logged at debug level
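
A minimal sketch of the mechanism described above, not the code in this PR. The helper name `set_timer_slack`, the `SlackOutcome` enum, and the `proc_root` parameter are hypothetical; `proc_root` exists only so the function can be exercised against a scratch directory instead of the real /proc. It assumes that a process that has already exited surfaces as ENOENT at open time or ESRCH at write time, and treats both as a benign race rather than an error.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Linux errno for "no such process" (assumed value, hard-coded to avoid a libc dependency).
const ESRCH: i32 = 3;

/// Result of one attempt to change a process's timer slack (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum SlackOutcome {
    /// The value was written.
    Applied,
    /// The target exited before or during the write; nothing to do.
    ProcessGone,
}

/// Writes `slack_ns` to `<proc_root>/<pid>/timerslack_ns`.
/// On a real system `proc_root` would be `/proc` and the caller needs CAP_SYS_NICE.
pub fn set_timer_slack(proc_root: &Path, pid: u32, slack_ns: u64) -> io::Result<SlackOutcome> {
    let path = proc_root.join(pid.to_string()).join("timerslack_ns");
    match fs::write(&path, slack_ns.to_string()) {
        Ok(()) => Ok(SlackOutcome::Applied),
        // The process vanished between discovery and the write: not an error for the caller.
        Err(e) if e.kind() == io::ErrorKind::NotFound || e.raw_os_error() == Some(ESRCH) => {
            Ok(SlackOutcome::ProcessGone)
        }
        // Anything else (for example EPERM without CAP_SYS_NICE) is reported to the caller.
        Err(e) => Err(e),
    }
}
```

A caller on each attach path could log `ProcessGone` and any `Err` at debug level and continue, which matches the opt-in, best-effort behavior described in the design.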

Kernel cross-reference (cloud-kernel 6.6)

  • hrtimer.c:2338: hrtimer_set_expires_range_ns(timer, rqtp, current->timer_slack_ns)
  • proc/base.c:2642: writing 0 resets to the default; writing 1 is the minimum effective value
  • select.c:80: select_estimate_accuracy() uses timer_slack_ns for poll/select/epoll
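
To make the select/poll/epoll effect concrete, here is a simplified Rust model of how select_estimate_accuracy() appears to combine the timeout with timer_slack_ns. It is a reading of the upstream logic, not a copy of it: the function name `estimate_slack_ns` and its parameters are hypothetical, the special case for real-time tasks is omitted, and the constants (0.1% of the remaining timeout, 0.5% for niced tasks, capped at 100ms) should be checked against the kernel tree in use.

```rust
/// Cap the kernel applies to the timeout-derived estimate (assumed to be 100 ms), in nanoseconds.
const MAX_SLACK_NS: u64 = 100_000_000;

/// Simplified model: slack applied to a poll/select/epoll timeout.
/// `remaining_timeout_ns` is the time left until the timeout expires,
/// `timer_slack_ns` is the task's current slack, `niced` is true for nice > 0.
pub fn estimate_slack_ns(remaining_timeout_ns: u64, timer_slack_ns: u64, niced: bool) -> u64 {
    // 0.1% of the remaining timeout, or 0.5% for tasks with positive nice.
    let divisor = if niced { 200 } else { 1000 };
    let estimate = (remaining_timeout_ns / divisor).min(MAX_SLACK_NS);
    // The task's timer slack acts as a lower bound on the estimate.
    estimate.max(timer_slack_ns)
}
```

Under this model a 1ms timeout gets 50us of slack with the default setting and 1us with timer_slack_ns=1, while a 100ms timeout gets 100us either way. If the model is accurate, the benefit for poll-style calls is concentrated on timeouts shorter than about 50ms.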

Test plan

  • 461 unit tests pass
  • Workflow adversarial review: PASS (kernel code cross-referenced)
  • ECS E2E: verified timer_slack_ns changes from 50000 to 1 on traced process
  • ECS profiling: bpftrace confirms wakeup tail reduction (~2x)
  • Reviewer sign-off
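
The E2E check in the test plan (value changes from 50000 to 1) can be reproduced by reading the same proc file back. This is a sketch only: `read_timer_slack` and the `proc_root` parameter are hypothetical names, and it assumes the file contains a single decimal number followed by a newline.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads `<proc_root>/<pid>/timerslack_ns` and parses it as nanoseconds.
/// On a real system `proc_root` would be `/proc`.
pub fn read_timer_slack(proc_root: &Path, pid: u32) -> io::Result<u64> {
    let path = proc_root.join(pid.to_string()).join("timerslack_ns");
    let raw = fs::read_to_string(path)?;
    // Trim the trailing newline before parsing; map parse failures to an I/O error.
    raw.trim()
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Reading the value before and after attaching with --optimize-timer-slack should show the transition reported in the test plan.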

@jfeng18 jfeng18 requested a review from chengshuyi as a code owner June 4, 2026 22:37
@github-actions github-actions Bot added the component:sight src/agentsight/ label Jun 4, 2026
@jfeng18 jfeng18 force-pushed the feat/timer-slack-optimize branch 4 times, most recently from 73a94e9 to dee99de Compare June 6, 2026 13:24
jfeng18 and others added 2 commits June 10, 2026 11:10
Adds --optimize-timer-slack flag that writes timer_slack_ns=1 to
/proc/<pid>/timerslack_ns for traced agent processes. This reduces
the kernel's hrtimer coalescing window from 50us to 1ns.

Profiling data (ECS 2-core, bpftrace sched_wakeup→sched_switch):
- Default: p90 tail at 512-2K us (36% of samples >512us)
- With timer_slack=1: tail halved (15% of samples >512us)

The optimization also tightens select/poll/epoll_wait timeouts,
which benefits agents doing network I/O.

Opt-in only (default off) — this crosses the observation/actuation
boundary. Requires root (CAP_SYS_NICE for cross-process write).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jfeng18 jfeng18 force-pushed the feat/timer-slack-optimize branch from 5dc635e to 7e99112 Compare June 10, 2026 03:12