Problem
PilotSwarm's only durable-timer primitive is cron(seconds, reason), which is purely interval-based. Agents that need to fire at a specific wall-clock time (e.g., a nightly compliance auditor at 02:00 UTC, a weekly customer SLA report at Mon 09:00 customer-TZ, a monthly billing reconciliation on the 1st at 04:00) must work around the missing primitive with "wake every N minutes, check if it's HH:MM, sleep again."
The durable-timers SKILL.md teaches the interval pattern as the only recurring primitive, with no example of computing a variable interval for wall-clock targets. An agent that follows the skill literally for "every day at 02:00 UTC" picks the maximally literal interpretation: wake every ~15 min, time-check, sleep. At 900 seconds per wake that is 86,400 / 900 = 96 wakes, so ~96 LLM turns per day, for a job that fires once.
Concrete production cost shape
A nightly compliance/audit agent that runs once at 02:00 UTC daily:
| Pattern | Wakes/day | LLM turns/day | Annualized |
|---|---|---|---|
| Naive cron(900) + time-check guard | ~96 | ~96 (each wake = full turn for fact-read + clock-check + sleep) | ~35,000 turns/year for ONE daily run |
| Wall-clock anchored cron_at | 1 | 1 | 365 turns/year |
Per-tenant agents in a fleet multiply this directly. Token cost is dominated by skill + system prompt being re-served on every wake.
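The figures in the table follow from simple arithmetic. A minimal sketch, assuming one LLM turn per wake as stated above; `wakesPerDay` is a hypothetical helper for illustration, not an SDK function:

```typescript
// Hypothetical helper: how many times a fixed-interval cron wakes in 24 hours.
function wakesPerDay(intervalSeconds: number): number {
  const SECONDS_PER_DAY = 86400;
  return Math.floor(SECONDS_PER_DAY / intervalSeconds);
}

// Naive pattern: cron(900) plus a time-check guard, one LLM turn per wake.
const naiveTurnsPerDay = wakesPerDay(900);       // 96
const naiveTurnsPerYear = naiveTurnsPerDay * 365; // 35040

// Anchored pattern: exactly one wake (and one turn) per day.
const anchoredTurnsPerYear = 1 * 365;             // 365

// Ratio of naive to anchored turns for a single daily job.
const overheadFactor = naiveTurnsPerYear / anchoredTurnsPerYear; // 96

console.log({ naiveTurnsPerDay, naiveTurnsPerYear, anchoredTurnsPerYear, overheadFactor });
```

For a fleet, multiply both yearly figures by the number of per-tenant agents; the 96x ratio stays the same.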
Proposed fix
Add a sibling tool cron_at that accepts wall-clock anchor fields. The SDK orchestration computes the next-fire-ms (with TZ + DST handling) under the hood and uses the existing durable-timer machinery.
API
```ts
cron_at({
minute: 0-59, // required — the wall-clock anchor
hour?: 0-23, // omit ⇒ hourly recurrence
day_of_week?: 0-6, // 0 = Sunday — weekly (mutually exclusive with day_of_month)
day_of_month?: 1-31, // monthly (mutually exclusive with day_of_week)
tz: string, // IANA zone (e.g. "America/Los_Angeles") — mandatory
max_fires?: number, // optional cap on total fires; omit ⇒ fire forever
reason: string,
})
```
Recurrence inferred from anchor fields
| Set fields | Recurrence | Example |
|---|---|---|
| minute | hourly | {minute: 5} → every hour at HH:05 (anomaly detection sweeps) |
| minute + hour | daily | {minute: 0, hour: 2, tz: "UTC"} → 02:00 UTC nightly compliance audit |
| minute + hour + day_of_week | weekly | {minute: 0, hour: 9, day_of_week: 1, tz: "America/New_York"} → Mondays 09:00 ET SLA report |
| minute + hour + day_of_month | monthly | {minute: 0, hour: 4, day_of_month: 1, tz: "UTC"} → 1st of month 04:00 UTC billing reconciliation |
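The inference in the table can be sketched as a small pure function. This is an illustration only: `AnchorFields`, `Recurrence`, and `inferRecurrence` are hypothetical names, and the sketch assumes the input has already passed validation (so day_of_week and day_of_month are never both set, and neither appears without hour).

```typescript
type Recurrence = "hourly" | "daily" | "weekly" | "monthly";

// Hypothetical shape holding only the anchor fields of a cron_at call.
interface AnchorFields {
  minute: number;
  hour?: number;
  day_of_week?: number;
  day_of_month?: number;
}

// Map the set of provided anchor fields to a recurrence period.
// Assumes validation already rejected ambiguous combinations.
function inferRecurrence(fields: AnchorFields): Recurrence {
  if (fields.day_of_week !== undefined) return "weekly";
  if (fields.day_of_month !== undefined) return "monthly";
  if (fields.hour !== undefined) return "daily";
  return "hourly";
}

console.log(inferRecurrence({ minute: 5 }));                             // hourly
console.log(inferRecurrence({ minute: 0, hour: 2 }));                    // daily
console.log(inferRecurrence({ minute: 0, hour: 9, day_of_week: 1 }));    // weekly
console.log(inferRecurrence({ minute: 0, hour: 4, day_of_month: 1 }));   // monthly
```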
max_fires semantics
| Value | Meaning |
|---|---|
| omitted (default) | recurrent forever |
| 1 | one-shot scheduled action: fires at the next anchor time, then stops. Wall-clock-anchored counterpart to wait(seconds) |
| n > 1 | fires n times at consecutive anchor instants, then stops |
| 0 or negative | reject at validation |
After the last fire, the SDK stops scheduling new wakes and emits a cron.completed event so the agent can finalize.
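One way to picture the counter is as a pure state transition, which is also what makes it easy to keep deterministic under replay. A minimal sketch under stated assumptions: `FireBudget`, `newBudget`, and `recordFire` are hypothetical names, and the real orchestration state layout may differ.

```typescript
// Hypothetical state: remaining fires, or null when max_fires was omitted.
interface FireBudget {
  remaining: number | null;
}

interface FireOutcome {
  budget: FireBudget;     // state to persist after this fire
  scheduleNext: boolean;  // whether another wake should be scheduled
  events: string[];       // events to emit alongside this fire
}

// Build the initial budget from the optional max_fires argument.
function newBudget(maxFires?: number): FireBudget {
  return { remaining: maxFires === undefined ? null : maxFires };
}

// Pure transition applied once per fire: same input state, same output.
function recordFire(budget: FireBudget): FireOutcome {
  if (budget.remaining === null) {
    return { budget, scheduleNext: true, events: [] };
  }
  const remaining = budget.remaining - 1;
  if (remaining <= 0) {
    // Last fire: stop scheduling and signal completion.
    return { budget: { remaining: 0 }, scheduleNext: false, events: ["cron.completed"] };
  }
  return { budget: { remaining }, scheduleNext: true, events: [] };
}

// One-shot: max_fires = 1 fires once, then completes.
console.log(recordFire(newBudget(1)));
```

Because `recordFire` never mutates its input and has no hidden state, replaying the same sequence of fires yields the same sequence of outcomes.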
Edge cases (locked at design time)
- DOM=31 in short months: standard cron behavior, skip that month. ("Last day of month" is a future feature, not v1.)
- day_of_week + day_of_month both set: reject at validation.
- day_of_week or day_of_month set without hour: reject (period inference ambiguous).
- tz mandatory: no silent UTC default. Forces explicit choice for cross-tenant agents.
- DST: SDK recomputes next-fire on every wake from current UTC + IANA zone, which handles spring-forward / fall-back without agent involvement.
- Replay safety: max_fires counter is deterministic per orchestration replay; it is stored in orchestration state and decremented atomically with each fire.
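To make the DST bullet concrete, here is a rough sketch of recomputing the next fire from a UTC instant plus an IANA zone, using only the standard `Intl.DateTimeFormat` API. It is a brute-force minute scan chosen for clarity, not the proposed implementation; `AnchorSpec`, `localParts`, `nextFireMs`, and the 62-day search horizon are all assumptions of this sketch. How skipped or repeated local times should behave is not specified above; in this sketch a local time that does not exist on a spring-forward day is simply not matched that day, and a local time that occurs twice on a fall-back day matches at its first occurrence.

```typescript
// Hypothetical spec shape mirroring the cron_at anchor fields.
interface AnchorSpec {
  minute: number;
  hour?: number;
  day_of_week?: number;  // 0 = Sunday
  day_of_month?: number;
  tz: string;            // IANA zone
}

const WEEKDAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MINUTE_MS = 60000;

// Read the local weekday/day/hour/minute of a UTC instant in the formatter's zone.
function localParts(fmt: Intl.DateTimeFormat, utcMs: number) {
  const out: Record<string, string> = {};
  for (const part of fmt.formatToParts(new Date(utcMs))) out[part.type] = part.value;
  return {
    weekday: WEEKDAYS.indexOf(out["weekday"] || ""),
    day: Number(out["day"]),
    hour: Number(out["hour"]) % 24, // some engines print midnight as "24"
    minute: Number(out["minute"]),
  };
}

// Scan forward one minute at a time from the first whole minute strictly
// after nowMs and return the first UTC instant whose local fields match.
function nextFireMs(spec: AnchorSpec, nowMs: number, horizonDays = 62): number {
  const fmt = new Intl.DateTimeFormat("en-US", {
    timeZone: spec.tz, hour12: false,
    weekday: "short", day: "numeric", hour: "numeric", minute: "numeric",
  });
  let t = Math.floor(nowMs / MINUTE_MS) * MINUTE_MS + MINUTE_MS;
  const limit = t + horizonDays * 24 * 60 * MINUTE_MS;
  for (; t <= limit; t += MINUTE_MS) {
    const p = localParts(fmt, t);
    if (p.minute !== spec.minute) continue;
    if (spec.hour !== undefined && p.hour !== spec.hour) continue;
    if (spec.day_of_week !== undefined && p.weekday !== spec.day_of_week) continue;
    if (spec.day_of_month !== undefined && p.day !== spec.day_of_month) continue;
    return t;
  }
  throw new Error("no matching anchor instant within the search horizon");
}

// Mondays 09:00 New York, asked just after the Monday 2024-03-04 fire.
// US DST starts on 2024-03-10, so the next fire is 13:00Z rather than 14:00Z.
const next = nextFireMs(
  { minute: 0, hour: 9, day_of_week: 1, tz: "America/New_York" },
  Date.UTC(2024, 2, 4, 15, 0),
);
console.log(new Date(next).toISOString()); // 2024-03-11T13:00:00.000Z
```

Because the result depends only on the spec and the `nowMs` passed in, calling it on every wake picks up offset changes automatically. A production version would likely jump by field instead of scanning minutes, since a monthly anchor can require tens of thousands of iterations here.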
Existing cron(seconds, reason) stays untouched
Static-interval workloads (sweeper, resourcemgr) keep their current API. Only wall-clock-anchored use cases use cron_at.
Skill update
packages/sdk/plugins/system/skills/durable-timers/SKILL.md adds a "Wall-Clock Anchored Recurring" pattern that teaches agents to use cron_at and explicitly forbids the wake-and-check anti-pattern. It also adds a "One-Shot Scheduled at Wall-Clock" sub-pattern showing max_fires: 1.
Why field-named over cron-expression syntax
LLMs fumble cron-expression positional grammar (0 9 * * *). Named fields with Zod-validated ranges are easier to generate correctly and easier to read in skill docs.
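To show how little machinery named fields need, here is a dependency-free sketch of the range and rejection rules described in this proposal. The proposal itself calls for a Zod schema; hand-written checks are swapped in here only to keep the example self-contained. `CronAtArgs` and `validateCronAt` are hypothetical names, the error strings are invented, and requiring max_fires to be an integer is an assumption of this sketch.

```typescript
// Hypothetical argument shape mirroring the cron_at API block.
interface CronAtArgs {
  minute: number;
  hour?: number;
  day_of_week?: number;
  day_of_month?: number;
  tz: string;
  max_fires?: number;
  reason: string;
}

// Return a list of validation errors; an empty list means the call is accepted.
function validateCronAt(a: CronAtArgs): string[] {
  const errors: string[] = [];
  const inRange = (v: number, lo: number, hi: number) =>
    Number.isInteger(v) && v >= lo && v <= hi;

  if (!inRange(a.minute, 0, 59)) errors.push("minute must be an integer in 0-59");
  if (a.hour !== undefined && !inRange(a.hour, 0, 23)) errors.push("hour must be an integer in 0-23");
  if (a.day_of_week !== undefined && !inRange(a.day_of_week, 0, 6)) errors.push("day_of_week must be an integer in 0-6");
  if (a.day_of_month !== undefined && !inRange(a.day_of_month, 1, 31)) errors.push("day_of_month must be an integer in 1-31");

  const hasDay = a.day_of_week !== undefined || a.day_of_month !== undefined;
  if (a.day_of_week !== undefined && a.day_of_month !== undefined) {
    errors.push("day_of_week and day_of_month are mutually exclusive");
  }
  if (hasDay && a.hour === undefined) errors.push("day_of_week/day_of_month require hour");

  // tz is mandatory; Intl.DateTimeFormat throws RangeError for unknown zones.
  if (a.tz.length === 0) {
    errors.push("tz is required");
  } else {
    try {
      new Intl.DateTimeFormat("en-US", { timeZone: a.tz });
    } catch {
      errors.push("tz must be a valid IANA zone");
    }
  }

  // Assumption: max_fires, when present, must be a positive integer.
  if (a.max_fires !== undefined && !(Number.isInteger(a.max_fires) && a.max_fires >= 1)) {
    errors.push("max_fires must be a positive integer");
  }
  return errors;
}

console.log(validateCronAt({ minute: 0, hour: 2, tz: "UTC", reason: "nightly audit" })); // []
```

Each rule maps to one named field, so an error message can point at exactly the field the agent got wrong, which is harder to do with a positional expression like 0 9 * * *.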
Out of scope (for follow-ups)
- Full cron-expression support.
- Per-day distinct times (e.g., 09:00 weekdays + 11:00 weekends).
- "Last day of month" / "last weekday" semantics.
- start_at / end_at absolute-time bounds (current max_fires covers most "stop after N" cases).
Tracking
Will land as a separate PR against main. Skill update is part of the same PR. Estimated diff: ~250 lines (orchestration handler + tool registration + Zod schema + skill update + tests).