Cli rpm#8
Open
yangdao479 wants to merge 600 commits into
- Add missing agentsight.service and agentsight-start to tarball in the
  unified scripts/rpm-build.sh build_agentsight() function
- Regenerate dashboard/package-lock.json with public npm registry
  (replace internal registry.anpm.alibaba-inc.com URLs)
Fixes alibaba#817
Detect agent process crashes immediately via ProcMon::Exit eBPF tracepoint
instead of waiting for HealthChecker's 30s polling cycle. On exit, drain
in-flight HTTP connections for the dead PID, persist them as pending calls,
and emit an agent_crash interruption event with OOM attribution from dmesg.
- aggregator: add drain_connections_for_pid() to extract pending/SSE
  connections by PID, used by crash detection
- unified: handle_agent_crash_detection() called from ProcMon::Exit; groups
  pending calls by (session_id, conversation_id) and writes one interruption
  event per conversation with source="trace_procmon_exit"
- interruption store: agent_crash_exists_recent() for 1s dedup window between
  trace path and serve mode HealthChecker fallback
- health/checker: skip writing if trace path already recorded the crash
  within the dedup window
Adds integration-tests/interruption/ with reproducible scenario scripts and a
README documenting deployment, agent_crash / agent_crash_oom construction
procedures, and 11 lessons learned from real verification on the sysak
production deployment.
Signed-off-by: liyuqing <liyuqing@alibaba-inc.com>
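A minimal sketch of the grouping and dedup logic described in this commit. `PendingCall`, `group_by_conversation`, and `within_dedup_window` are hypothetical names for illustration, not the functions in the agentsight source; the real code reads the last crash timestamp from the interruption store.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one in-flight call drained from a dead PID.
#[derive(Debug, Clone, PartialEq)]
pub struct PendingCall {
    pub session_id: String,
    pub conversation_id: String,
    pub call_id: u64,
}

/// Groups drained calls by (session_id, conversation_id) so that one
/// interruption event can be written per conversation.
pub fn group_by_conversation(
    calls: Vec<PendingCall>,
) -> BTreeMap<(String, String), Vec<u64>> {
    let mut groups: BTreeMap<(String, String), Vec<u64>> = BTreeMap::new();
    for c in calls {
        groups
            .entry((c.session_id, c.conversation_id))
            .or_default()
            .push(c.call_id);
    }
    groups
}

/// Returns true when a crash was already recorded within `window_ms`
/// of `now_ms`, in which case the second writer should skip its write.
pub fn within_dedup_window(last_recorded_ms: Option<u64>, now_ms: u64, window_ms: u64) -> bool {
    match last_recorded_ms {
        Some(t) => now_ms.saturating_sub(t) <= window_ms,
        None => false,
    }
}
```

With a 1000 ms window, a HealthChecker fallback that fires 500 ms after the trace path recorded the crash would skip its write under this sketch.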
Replace the nested `anolisa subscription {register,unregister,status}`
sub-commands with top-level `anolisa register` / `anolisa unregister`,
and rename the corresponding CLI module from `subscription.rs` to
`register.rs`. `anolisa register status` reuses the existing dispatch
to avoid colliding with `anolisa status`.
Why: per the design note (anolisa-register-design §4.4), "subscription"
implies a paid/opt-out model, while the actual semantics are pure
consent-based registration for token-collection upload. Flattening the
verb also matches the most common user operations (register/unregister)
without an extra nesting level.
Drop the `InitLater` ("ask me later, expire in 30 days") state branch
together with `do_later()` and its tests. The first iteration only ships
the binary REGISTERED / UNREGISTERED states; the calendar-based 30-day
expiry was never a hard requirement and adds state-machine surface area
that is not needed for the token-collection MVP.
Rename the agentsight enablement marker file
`/etc/anolisa/enable_sls_log` → `/etc/anolisa/enable_token_collector`
so the on-disk artifact reflects what is actually being gated (token
collection upload), and update user-facing error strings and doc
comments from "subscription" to "register".
Known limitations:
- This is a breaking change for any caller still invoking
`anolisa subscription ...`; no compatibility alias is provided.
- The marker file rename is not migrated on upgrade; hosts that
registered before this change will need to re-run `anolisa register`
to recreate the new marker.
- SysOM-managed instance detection (sysak_meta / sysak_agentsight) is
unchanged and still relies on hard-coded service names.
Assisted-by: Qoder:latest
Signed-off-by: Kailong Zhou <zhoukailong.zkl@alibaba-inc.com>
Rename anolisa-core/src/subscription.rs → register.rs and update
the module declaration in lib.rs to match. This aligns the source
file name with the public-facing CLI verb (`anolisa register`).
Reword user-facing help descriptions from implementation details
("token collection", "stop token upload") to product-level framing
("Join/Leave the Agentic OS Co-Build Program"). Users should see
the value proposition, not the internal mechanism.
Why: the previous wording leaked internal terminology into the CLI
surface, making the commands sound like telemetry opt-in rather than
a co-build program enrollment. The file rename eliminates a naming
mismatch (module called "subscription" but CLI says "register").
Assisted-by: Qoder:latest
Signed-off-by: Kailong Zhou <zhoukailong.zkl@alibaba-inc.com>
- Add side-effect-free framework detection for adapter specs
- Expand adapter layout placeholders against active fs layout
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Implement adapter scan and dry-run install planning
- Add guarded adapter remove with state and central log updates
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Expand tar_gz source prefixes into concrete file mappings
- Validate expanded destinations before copying archive files
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
Replace the NOT_IMPLEMENTED stub with a full
download → stage → copy → state → log pipeline:
1. Resolve artifact from DistributionIndex via ResolveQuery (prefers tar_gz,
   falls back to binary).
2. Require sha256; refuse to install unverified artifacts.
3. Download through DownloadCache (file:// and HTTP(S), with retry).
4. Copy via InstallRunner::install_files using the adapter's source/dest
   mapping (directory-prefix expansion for tar_gz).
5. Write InstalledObject (ObjectKind::Adapter) + OperationRecord to
   installed.toml under the install lock.
6. Append central audit log record.
7. On state-save failure after file copy, roll back installed files so no
   phantom "installed" state remains.
Integration tests cover the happy path (install → verify files + state + log),
full lifecycle (install → remove), and failure modes (missing sha256,
checksum mismatch, dest-already-exists), each asserting that no state or
files leak on error.
Closes: alibaba#813
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Load installed state before copying adapter files under the lock
- Reject non-tar_gz adapter artifacts and pass pkg_base to resolution
- Clarify best-effort rollback wording on state save failure
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Return runtime error when adapter file deletion fails
- Preserve adapter state so failed removals can be retried
- Add regression coverage for partial remove failure
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
Add SKILL.md that teaches agents how to manage Agentic OS registration state
(query/register/unregister).
Key design choices:
- Intent mapping table for natural language -> CLI command translation
- Non-interactive session handling: agent must present the co-build plan
  explanation and get explicit user consent before using --yes flag
- Covers both register and unregister flows with safety confirmations
Signed-off-by: Kailong Zhou <zhoukailong.zkl@alibaba-inc.com>
Assisted-by: Qoder:latest
Implement artifact-centric distribution on the consumer side per:
- Add `registry/` submodule (no mod.rs): RegistryConfig parses `[registry]`
  with ANOLISA_REGISTRY_URL override (config.rs); RegistryClient fetches the
  distribution `index.toml` / `meta.toml` over HTTP with a TTL cache and
  offline fallback (client.rs); registry cache layering for index + meta
  (cache.rs); RegistryError (error.rs). registry.rs is promoted to a parent
  module that re-exports both the legacy `Registry` catalog facade and the
  new client.
- manifest.rs: add minimal-schema `[component.contract]` (ContractSpec) and
  `[component.artifact]` (ArtifactSpec) plus component display_name / owner /
  license / repository, with Raw nested sub-tables taking precedence over the
  legacy top-level sections.
- enable_plan.rs: add ArtifactPlan.meta_sha256. The planner stays IO-free and
  leaves it None; the CLI fills it after fetch_meta.
- enable_execute.rs: before installing, verify the artifact's embedded
  `.anolisa/component.toml` matches the planned meta sha256 and abort with no
  files written on mismatch.
- CLI: opt-in RegistryClient construction and meta fetch wired into the
  enable flow (common.rs, tier1/enable.rs).
Assisted-by: Claude Code:Opus 4.8
Signed-off-by: 爱鲲 <jiawa.syx@alibaba-inc.com>
Introduce a structured health-check engine and wire it into enable, plus
additive minimal-schema groundwork on the component manifest. Legacy
[install]/[environment] parsing and {etcdir}/{datadir} placeholders are
kept as fallbacks, so existing manifests keep working unchanged.
- health: new CheckSpec/CheckOutcome engine with owned-path, timeout, and
shell-metacharacter guards; binary/file/command probes plus all_of/any_of.
Remaining variants report Unsupported until their slice lands.
- manifest: add FileKind to install files; carry an optional health_check
and synthesize a binary_version probe from the first executable file.
Drop the adapters section (now ignored as unknown keys).
- enable: the plan carries each component's health probe; the executor runs
it after install, records HealthEntry rows on the installed object, and
degrades the component (and capability) to Partial with a warning on hard
failure without rolling back the install.
- layout: accept {sysconfdir}/{sharedir} aliases alongside the legacy names.
- contract_lint: enforce required fields only for manifests that opt into
the minimal schema, so legacy manifests are never newly blocked.
Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: 爱鲲 <jiawa.syx@alibaba-inc.com>
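A rough sketch of how `all_of` / `any_of` composition and the shell-metacharacter guard could look. The enum shapes, the `NotYetImplemented` variant, the character list, and the way `Unsupported` propagates through combinators are all assumptions for illustration; the commit does not specify them.

```rust
use std::path::PathBuf;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CheckOutcome {
    Pass,
    Fail,
    Unsupported,
}

/// Hypothetical, reduced probe set: one file probe plus the two combinators.
pub enum CheckSpec {
    File(PathBuf),
    AllOf(Vec<CheckSpec>),
    AnyOf(Vec<CheckSpec>),
    /// Placeholder for variants whose implementation has not landed yet.
    NotYetImplemented,
}

/// Assumed guard: reject command strings containing common shell metacharacters.
pub fn has_shell_metachar(cmd: &str) -> bool {
    cmd.chars()
        .any(|c| matches!(c, ';' | '|' | '&' | '$' | '`' | '>' | '<' | '\n'))
}

pub fn evaluate(spec: &CheckSpec) -> CheckOutcome {
    use CheckOutcome::*;
    match spec {
        CheckSpec::File(p) => {
            if p.exists() { Pass } else { Fail }
        }
        CheckSpec::NotYetImplemented => Unsupported,
        // all_of: any failure fails the group; otherwise Unsupported wins over Pass.
        CheckSpec::AllOf(children) => {
            let outcomes: Vec<CheckOutcome> = children.iter().map(evaluate).collect();
            if outcomes.contains(&Fail) {
                Fail
            } else if outcomes.contains(&Unsupported) {
                Unsupported
            } else {
                Pass
            }
        }
        // any_of: one pass is enough; otherwise Unsupported wins over Fail.
        CheckSpec::AnyOf(children) => {
            let outcomes: Vec<CheckOutcome> = children.iter().map(evaluate).collect();
            if outcomes.contains(&Pass) {
                Pass
            } else if outcomes.contains(&Unsupported) {
                Unsupported
            } else {
                Fail
            }
        }
    }
}
```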
…back
Make `enable` resolve the distribution index from the live registry by default
instead of being strictly opt-in, and degrade gracefully to the bundled local
index when the network is unreachable.
- config: point DEFAULT_INDEX_URL at the live public OSS mirror and document
  it as load-bearing: every enable now hits it unless the `[registry] url`
  config key or ANOLISA_REGISTRY_URL overrides. The opt-in
  `load_if_configured` API is retained for a future force-local switch.
- common: `registry_client_from` switches to `RegistryConfig::load`, so a
  client is always constructed (bundled < file < env). Add `ResolvedIndex` and
  `fetch_remote_index_or_local`: a successful fetch yields a freshness
  warning, `RegistryError::Offline` degrades to the bundled local index with a
  warning and `degraded_to_local = true`, and any other RegistryError surfaces
  as a CliError rather than silently masking a config/parse fault.
- enable: the `Some(client)` branch uses `fetch_remote_index_or_local` and
  skips the per-component meta overlay when degraded to local, since the
  network is already known unreachable and would only add failed fetch
  warnings.
- index: move the tokenless artifact URL to the per-component subdirectory
  layout `v1/tokenless/0.5.0/...` (extensible per component/version) and bind
  sha256/size to the full published artifact.
Assisted-by: Claude Code:claude-opus-4-8
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
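The offline-degrade rule above can be sketched as follows. The type and function names come from the commit message, but the signatures, the `Parse` variant, and the warning text are assumptions; the freshness warning on a successful fetch is left out of this sketch.

```rust
#[derive(Debug, PartialEq)]
pub enum RegistryError {
    Offline,
    /// Assumed stand-in for any non-network fault (config or parse errors).
    Parse(String),
}

#[derive(Debug, PartialEq)]
pub struct ResolvedIndex {
    pub index: String,
    pub degraded_to_local: bool,
    pub warnings: Vec<String>,
}

/// Tries the remote fetch; only `Offline` degrades to the bundled index.
/// Every other error is returned to the caller instead of being masked.
pub fn fetch_remote_index_or_local<F>(
    fetch: F,
    bundled_index: &str,
) -> Result<ResolvedIndex, RegistryError>
where
    F: FnOnce() -> Result<String, RegistryError>,
{
    match fetch() {
        Ok(index) => Ok(ResolvedIndex {
            index,
            degraded_to_local: false,
            warnings: vec![],
        }),
        Err(RegistryError::Offline) => Ok(ResolvedIndex {
            index: bundled_index.to_string(),
            degraded_to_local: true,
            warnings: vec!["registry unreachable; using bundled local index".to_string()],
        }),
        Err(other) => Err(other),
    }
}
```

A caller can then check `degraded_to_local` to decide whether to skip the per-component meta overlay.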
… on main
An earlier commit on this branch dropped the `[[adapters]]` section from the
component schema (T2.9: adapters were not yet part of the schema). main has
since reintroduced adapters as a first-class subsystem (adapter.rs detects
frameworks and installs from `manifest::AdapterSpec`), so the schema must
carry it again.
Re-add AdapterSpec / AdapterRaw, the `adapters` field on ComponentManifest and
its parsing, the re-export, and the round-trip test, on top of the
health-check additions kept from this branch.
Assisted-by: Claude Code:claude-opus-4-8
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolve the adapter from the published artifact (not the dev-tree catalog)
and register it into the framework using the framework's own CLI.
- Version-agnostic resolution: pick the highest published semver from the
distribution index (version = None) rather than a version read from the
bundled manifest.
- Take source/dest/version from the artifact's embedded
.anolisa/component.toml via the new
install_runner::read_embedded_component_manifest; the published toml is
authoritative.
- Extract the plugin into anolisa's {datadir} (owned roots), then drive
`openclaw plugins install <dest> --force --dangerously-force-unsafe-install`,
replicating the install.sh argv+env contract (unset OPENCLAW_HOME, set
OPENCLAW_STATE_DIR, prepend PATH) without executing the script. Fail fast
when the framework is absent; roll back extracted files if registration
fails.
- remove symmetrically runs `openclaw plugins uninstall <id> --force`
(--force skips the non-interactive [y/N] prompt) before deleting the owned
datadir copy; state is kept on failure so removal is retryable.
- Tests: hermetic cosh integration fixtures with an embedded manifest
(decoupled from manifests/runtime); pure argv/env unit tests for the
OpenClaw invocation builders.
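A small sketch of version-agnostic resolution: pick the highest published version when no version is requested. The helper names are hypothetical, and this simplified parser handles only plain `MAJOR.MINOR.PATCH` strings (no pre-release or build metadata), which the real resolver may treat differently.

```rust
/// Parses a plain `MAJOR.MINOR.PATCH` string into a comparable tuple.
pub fn parse_semver(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components: not handled in this sketch
    }
    Some((major, minor, patch))
}

/// Returns the highest parseable version, ignoring entries that do not parse.
pub fn highest_version<'a>(published: &[&'a str]) -> Option<&'a str> {
    published
        .iter()
        .copied()
        .filter_map(|v| parse_semver(v).map(|key| (key, v)))
        .max_by_key(|(key, _)| *key)
        .map(|(_, v)| v)
}
```

Comparing parsed tuples rather than raw strings matters here: as strings, "0.9.9" sorts after "0.10.1".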
index.toml: bump tokenless 0.5.0 sha256/size to match the republished
artifact that now carries the [[adapters]] declaration.
Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: 爱鲲 <jiawa.syx@alibaba-inc.com>
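The commit mentions pure argv/env builders for the OpenClaw invocation. A sketch of what such a builder could look like is below; the command line and environment variable names are taken from the message, while `Invocation`, `build_plugin_install`, and the parameter names are hypothetical.

```rust
/// Pure description of a process invocation; nothing is executed here.
#[derive(Debug, PartialEq)]
pub struct Invocation {
    pub argv: Vec<String>,
    pub env_set: Vec<(String, String)>,
    pub env_unset: Vec<String>,
}

/// Builds `openclaw plugins install <dest> --force --dangerously-force-unsafe-install`
/// with OPENCLAW_HOME unset, OPENCLAW_STATE_DIR set, and `bin_dir` prepended to PATH.
pub fn build_plugin_install(
    dest: &str,
    state_dir: &str,
    bin_dir: &str,
    inherited_path: &str,
) -> Invocation {
    let path = if inherited_path.is_empty() {
        bin_dir.to_string()
    } else {
        format!("{bin_dir}:{inherited_path}")
    };
    Invocation {
        argv: [
            "openclaw",
            "plugins",
            "install",
            dest,
            "--force",
            "--dangerously-force-unsafe-install",
        ]
        .iter()
        .map(|s| s.to_string())
        .collect(),
        env_set: vec![
            ("OPENCLAW_STATE_DIR".to_string(), state_dir.to_string()),
            ("PATH".to_string(), path),
        ],
        env_unset: vec!["OPENCLAW_HOME".to_string()],
    }
}
```

Keeping the builder free of side effects is what makes the argv/env contract unit-testable without an OpenClaw installation.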
Summarize anolisa CLI updates since 0.1.3 in the changelog.
Assisted-by: Codex:GPT-5
Use a temporary system prefix for the enable dry-run smoke test so registry
cache and config lookups stay under a tempdir instead of host system paths.
Rename the test and rustdoc-style test description to match the actual
contract: the handler renders a dry-run plan envelope, but the plan may still
be blocked or degraded depending on host prechecks.
Assisted-by: codex:gpt-5
Signed-off-by: 爱鲲 <jiawa.syx@alibaba-inc.com>
…ctor switch
Add a background watcher thread that polls
/etc/anolisa/enable_token_collector once per second. When the trigger file
exists, read SLS_LOG_PATH from /etc/anolisa/ilogtail.cfg (INI key=value,
supports single/double quotes) and write it to runtime.sls_logtail_path of
the agentsight config. When the trigger file is removed, clear
runtime.sls_logtail_path.
The existing config watcher detects the resulting CLOSE_WRITE and activates
the SLS LogtailExporter, so this commit only adds the bridging layer without
touching SLS activation itself.
Implementation notes:
- Enable serde_json 'preserve_order' feature so that
  runtime/deadloop/https/cmdline field order in agentsight.json stays stable
  across rewrites.
- State machine in the watcher avoids redundant disk writes by caching
  last_state (None / Some(None) / Some(Some(path))).
- write_runtime_sls_path() returns Ok(false) when the value is unchanged
  (idempotent), so reapplying the same path does not retrigger inotify.
- File paths and poll interval are kept as in-function constants to keep the
  production API surface minimal.
Tests:
- 13 unit tests in src/unified.rs cover read_logtail_sls_path (basic,
  single/double quotes, empty value, missing key, comments, file missing),
  write_runtime_sls_path (set / clear / idempotent / creates runtime section /
  invalid root errors) and an end-to-end logic simulation.
- scripts/int-test-token-collector.sh drives the real agentsight binary
  through 5 phases (enable, disable→clear, double-quoted value, missing
  SLS_LOG_PATH, field preservation). Passed 9/9 on a non-ECS host (phase 2
  auto-skipped when ECS metadata is unreachable) and 10/10 on a real ECS host
  where SLS uid validation succeeds.
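A minimal sketch of the INI lookup that `read_logtail_sls_path` is described as performing, working on file contents rather than a path. The function names here are hypothetical, and treating `#` and `;` as comment prefixes and an empty value as "not set" are assumptions, since the message only lists the covered cases.

```rust
/// Returns the value of `SLS_LOG_PATH` from INI-style `key=value` text,
/// stripping one pair of matching single or double quotes.
pub fn read_sls_log_path(contents: &str) -> Option<String> {
    for line in contents.lines() {
        let line = line.trim();
        // Assumed comment syntax: lines starting with '#' or ';'.
        if line.is_empty() || line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        let Some((key, value)) = line.split_once('=') else {
            continue;
        };
        if key.trim() != "SLS_LOG_PATH" {
            continue;
        }
        let unquoted = strip_matching_quotes(value.trim());
        if unquoted.is_empty() {
            return None; // empty value is treated as "not configured"
        }
        return Some(unquoted.to_string());
    }
    None
}

fn strip_matching_quotes(v: &str) -> &str {
    for q in ['"', '\''] {
        if v.len() >= 2 && v.starts_with(q) && v.ends_with(q) {
            return &v[1..v.len() - 1];
        }
    }
    v
}
```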
The token-collector bridge (alibaba#839) only wrote runtime.sls_logtail_path
into config.json on enable/disable, but disable did not actually pause SLS
uploads: sls_activated was a one-way AtomicBool, the LogtailExporter locked
self.path at construction, and the config-watcher treated the empty-string
value as a no-signal None. Removing the trigger file thus left SLS uploading
until the process restarted.
Make activation truly reversible without restart:
* config::parse_runtime_sls_path now returns Option<Option<String>>:
  - None: field absent / parse error
  - Some(None): empty string → deactivation signal
  - Some(Some(path)): non-empty → (re-)activation signal
* genai::logtail::set_dynamic_logtail_path treats an empty string as 'clear'
  (resets DYNAMIC_LOGTAIL_PATH to None) and logs the pause.
* LogtailExporter gains a 'dynamic' bool: instances created via new_with_path
  read logtail_path() each export() and skip the batch when it is None, so a
  cleared dynamic path silently pauses uploads; env-var instances keep their
  locked path (unchanged behavior).
* unified::start_config_watcher drops the one-way 'if activated skip' guard
  and dispatches on the tri-state: empty → swap(false) + clear; non-empty →
  uid check + set dynamic; first time also creates and posts an exporter to
  the mailbox; afterwards just swaps the path.
Tests:
* 5 parse_runtime_sls_path unit tests updated to the tri-state contract.
* All 554 lib tests pass: cargo test --lib.
* Integration script scripts/int-test-token-collector.sh adds Phase 6
  exercising activate → deactivate (pause) → re-activate-with-new-path inside
  one process lifetime, gated on ECS metadata.
End-to-end run on Anolis OS / kernel 5.10.134: 15 passed, 0 failed.
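The tri-state contract can be illustrated with a small sketch. `classify_sls_field` and `apply_signal` are hypothetical helpers operating on an already-extracted field value; the real `parse_runtime_sls_path` parses the JSON config itself.

```rust
/// Maps the raw `runtime.sls_logtail_path` field to the tri-state signal:
/// `None` = no signal, `Some(None)` = deactivate, `Some(Some(p))` = (re-)activate.
pub fn classify_sls_field(field: Option<&str>) -> Option<Option<String>> {
    match field {
        None => None,
        Some("") => Some(None),
        Some(path) => Some(Some(path.to_string())),
    }
}

/// Applies a signal to the current dynamic path and reports whether it changed.
/// A "no signal" input leaves the current state untouched.
pub fn apply_signal(current: &mut Option<String>, signal: Option<Option<String>>) -> bool {
    match signal {
        None => false,
        Some(next) => {
            let changed = *current != next;
            *current = next;
            changed
        }
    }
}
```

Distinguishing `None` from `Some(None)` is what lets an empty string act as an explicit pause instead of being ignored.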
…ent from SLS by default
This is a privacy-safe default-flip on top of the SLS Logtail reversible
activation work. Previously `AgentsightConfig::new()` set
`trace_enabled = true` and the default `agentsight.json` shipped without
the field, so any operator activating SLS upload (via token-collector
trigger or `SLS_LOGTAIL_FILE` env) would automatically upload full
conversation bodies (`gen_ai.input.messages`, `gen_ai.output.messages`,
`gen_ai.system_instructions`) — leaking sensitive prompt/response text
unless they had explicitly written `"traceEnabled": false`.
Flip the default: only token / model / provider / timing metadata leaves
the host on SLS uploads unless the operator explicitly opts in by
writing `"traceEnabled": true`.
Changes:
* `AgentsightConfig::new()` now defaults `trace_enabled = false`.
* Rewrote the doc-comment on `pub trace_enabled` to describe the actual
scope (SLS upload payload only) — it does NOT stop the agent, eBPF
probes, local SQLite persistence or token metering, all of which keep
running. Local SQLite always retains full content; this flag only
shapes what crosses the network to SLS.
* Added 3 unit tests that pin the new contract:
- `test_trace_enabled_default_is_false` — locks the default.
- `test_load_from_json_missing_trace_enabled_keeps_default_false` —
omitted field must NOT flip the default (Option<bool> + serde
default).
- `test_load_from_json_explicit_trace_enabled_true` — explicit opt-in
works.
Compatibility note: this is a breaking change for any deployment that
relies on the implicit default to stream conversation bodies. Operators
who want the previous behavior must add `"traceEnabled": true` to their
config file.
Tests: cargo test --lib → 557 passed (was 554; +3 new locks).
…raceEnabled=false
Tightens the privacy-safe-by-default contract introduced in the previous
commit. Previously, even with `traceEnabled=false`, the SLS upload still
carried `gen_ai.system_instructions` — which usually contains the agent's
system prompt (product business logic, tool descriptions, role
instructions, sometimes embedded credentials). This contradicted the
field's docstring promise that all conversation content fields are
dropped when the flag is off.
Changes:
* `events_to_flat_records`: wrap the system_instructions emission in
`if trace_enabled { ... }` (parallel to the existing input.messages /
output.messages gating).
* Updated the `pub trace_enabled` field doc and the
`events_to_flat_records` function doc to enumerate all three guarded
fields.
* Updated both `new()` and `new_with_path()` activation logs to list all
three field names.
* Strengthened the regression tests:
- `test_trace_enabled_true_includes_messages` now asserts
`gen_ai.system_instructions` IS present.
- `test_trace_enabled_false_drops_messages_keeps_token_metadata` now
asserts it is ABSENT, and removes the prior leak-check exemption
(which was effectively a no-op anyway since the field name does not
end with `.messages`).
Compatibility note: this is a breaking change for any operator who
implicitly relied on system_instructions reaching SLS while keeping
traceEnabled=false. Such operators must now set `"traceEnabled": true`
to opt back in.
E2E verified on production-like ECS (101.37.234.43):
* Activation log emits the new wording listing all three fields.
* After triggering a real LLM call via cosh, the resulting SLS record
contains 32 fields (down from 33 in the previous build); all three of
`gen_ai.system_instructions`, `gen_ai.input.messages`,
`gen_ai.output.messages` are confirmed ABSENT.
Tests: cargo test --lib -> 557 passed.
Refs: alibaba#841
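A minimal sketch of the gating contract described in the two commits above: with `trace_enabled` off, only metadata leaves the host and all three content fields are dropped. The three `gen_ai.*` content keys are from the commit text; `GenAiEvent`, `to_flat_record`, and the two metadata keys are assumptions for illustration and do not mirror the real `events_to_flat_records` signature.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced event shape.
pub struct GenAiEvent {
    pub model: String,
    pub input_tokens: u64,
    pub system_instructions: String,
    pub input_messages: String,
    pub output_messages: String,
}

/// Flattens one event; content fields are emitted only when `trace_enabled` is true.
pub fn to_flat_record(ev: &GenAiEvent, trace_enabled: bool) -> BTreeMap<&'static str, String> {
    let mut rec = BTreeMap::new();
    // Metadata is always emitted (key names assumed for this sketch).
    rec.insert("gen_ai.request.model", ev.model.clone());
    rec.insert("gen_ai.usage.input_tokens", ev.input_tokens.to_string());
    if trace_enabled {
        rec.insert("gen_ai.system_instructions", ev.system_instructions.clone());
        rec.insert("gen_ai.input.messages", ev.input_messages.clone());
        rec.insert("gen_ai.output.messages", ev.output_messages.clone());
    }
    rec
}
```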
Upgrade the vendored rtk from v0.36.0 to v0.42.3. v0.42.3 includes upstream
fixes for grep filename preservation (-H flag, NUL separator),
--no-ignore-vcs, and exec_capture execution model.
Upgrade toon-format from 0.4.6 to 0.5.0 (released 2026-05-22):
- Added: (layout) expose decoder layout metadata behind cargo feature
- Fixed: deserialization failure for u64 values larger than i64::MAX
- No breaking changes
The tokenless stats patch is updated for v0.42.3:
- Reduced from 218 to 185 lines
- Removed v0.36.0-specific clippy suppressions
- Only includes tracking.rs and hook_check.rs changes
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
When rtk grep falls back from rg to grep (when rg is unavailable), the
fallback uses the original BRE pattern instead of the PCRE-converted pattern
and lacks -E (extended regex) support. This causes patterns with alternation
(e.g. 'fn foo\|pub.*bar') to fail silently:
- BRE pattern 'fn foo\|pub.*bar' is converted to PCRE 'fn foo|pub.*bar'
- When rg is unavailable, grep receives the original BRE pattern without -E
  flag, so grep interprets it literally (not as alternation)
- Result: zero matches returned, misleading AI agents
The patch fixes the grep fallback to use the PCRE-converted pattern with -E
flag so alternation works correctly in both rg and grep.
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
parse_summary_line did not recognize 'error'/'errors' in pytest output
summaries (e.g. '1 error in 0.10s' from collection errors, or
'5 passed, 2 errors in 0.50s'), returning all-zero counts. This triggered the
misleading 'Pytest: No tests collected' output, causing LLMs to retry with
different parameters.
The patch adds:
- errors field in PytestCounts struct with Debug+PartialEq derives
- error detection in ===-wrapped and quiet summary lines
- error parsing in parse_summary_line (singular and plural)
- error count display in build_pytest_summary output
- Include errors in extras_present check to prevent early return
- 3 new tests: collection-error-only, mixed-errors, summary-parsing
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
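To illustrate the summary-line fix, here is a simplified stand-alone parser that recognizes both `error` and `errors`. It borrows the names `PytestCounts` and `parse_summary_line` from the commit message, but the field set and parsing strategy are assumptions and not the rtk implementation.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct PytestCounts {
    pub passed: usize,
    pub failed: usize,
    pub skipped: usize,
    pub errors: usize,
}

/// Parses lines such as `5 passed, 2 errors in 0.50s`, with or without
/// the surrounding `=====` decoration.
pub fn parse_summary_line(line: &str) -> PytestCounts {
    let mut counts = PytestCounts::default();
    let body = line.trim().trim_matches('=').trim();
    // Drop the trailing " in <duration>" part, if present.
    let body = match body.rfind(" in ") {
        Some(i) => &body[..i],
        None => body,
    };
    for part in body.split(',') {
        let mut words = part.split_whitespace();
        let (Some(n), Some(label)) = (words.next(), words.next()) else {
            continue;
        };
        let Ok(n) = n.parse::<usize>() else {
            continue;
        };
        match label {
            "passed" => counts.passed = n,
            "failed" => counts.failed = n,
            "skipped" => counts.skipped = n,
            // Singular and plural forms both count as errors.
            "error" | "errors" => counts.errors = n,
            _ => {}
        }
    }
    counts
}
```

With the `errors` field populated, a collection-error-only run no longer looks identical to "no tests collected".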
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
- remove deprecated capability command modules
- add install stub and regroup top-level help
- drop enable-only execution policy wiring
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
Stop retrying after 3 consecutive failures to avoid repeated 1s timeouts in non-ECS environments. Addresses review on alibaba#783.
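One way to express "stop after 3 consecutive failures" is a small counter gate, sketched below. `RetryGate` is a hypothetical name, and resetting the counter on success is an assumption drawn from the word "consecutive".

```rust
/// Tracks consecutive failures and stops further attempts once a limit is hit.
pub struct RetryGate {
    consecutive_failures: u32,
    max_failures: u32,
}

impl RetryGate {
    pub fn new(max_failures: u32) -> Self {
        Self { consecutive_failures: 0, max_failures }
    }

    /// Whether another attempt (for example an ECS metadata request) should be made.
    pub fn should_attempt(&self) -> bool {
        self.consecutive_failures < self.max_failures
    }

    /// Records the outcome of an attempt; a success resets the streak.
    pub fn record(&mut self, success: bool) {
        if success {
            self.consecutive_failures = 0;
        } else {
            self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        }
    }
}
```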
When a process execves multiple times (e.g., bash exec-ing into sleep), the
previous implementation only kept the last exec's args, losing the original
user-initiated command. This causes correlation issues for tools like
alibaba#1025 that need to match tool_call commands against execve events.
Changes:
- AggregatedProcess::add_exec now preserves the first exec's args (the
  complete user command) and appends " ..." to mark subsequent execs occurred
- filename is still updated to reflect final exec state (backward compat)
- Added 4 unit tests covering single/multiple exec scenarios
Example:
- Before: python subprocess.run(['bash','-lc','echo X && sleep 1'])
  → audit args = "sleep 1" (last exec only)
- After: audit args = "bash -lc echo X && sleep 1 ..." (first + marker)
The " ..." marker indicates exec chain truncation and leaves room for future
enhancement (full exec chain tracking, deferred until needed).
Context: alibaba#1025 Phase 0 spike discovered this issue during
tool_call↔execve correlation research. ECS testing confirmed the fix
preserves complete args for Bash tool_call matching.
Related: alibaba#1025
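A sketch of the first-exec-wins behaviour described above. The struct fields and the choice to append the " ..." marker only once, however many later execs occur, are assumptions; the commit message does not say how repeated execs beyond the second are marked.

```rust
#[derive(Debug, Default)]
pub struct AggregatedProcess {
    pub filename: String,
    pub args: String,
    exec_count: u32,
}

impl AggregatedProcess {
    /// Keeps the first exec's args, marks later execs with " ...",
    /// and always tracks the most recent filename.
    pub fn add_exec(&mut self, filename: &str, args: &str) {
        self.filename = filename.to_string();
        if self.exec_count == 0 {
            self.args = args.to_string();
        } else if self.exec_count == 1 {
            // Assumed: the marker is appended once, on the second exec.
            self.args.push_str(" ...");
        }
        self.exec_count += 1;
    }
}
```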
Add ParsedApiMessage::request_metadata_session_id() and shared session_id_from_metadata() helper to read session info from Anthropic metadata. Use as highest-priority source for session_id in both call_builder (normal) and builder (pending/crash) paths. conversation_id remains hash-based (unchanged). part of alibaba#1014
decide_sls_config_change called set_dynamic_logtail_path, a process-global
side-effect, contradicting its own enum doc ("side effects are carried out by
the thread shell so the decision logic stays pure"). Move the dynamic-path
update into handle_config_event's match arms (Deactivated/Activate/Reactivated);
decide now only performs the caller-owned sls_activated test-and-set.
Behavior-preserving: every SlsConfigAction variant leaves the global
DYNAMIC_LOGTAIL_PATH in the same final state, and sls_activated has no
production reader correlating it with the path, so the swap-vs-set reorder is
unobservable.
Tests: add a discriminating test that decide leaves the global path untouched
while handle sets it on activation and clears it on deactivation, plus assert
the overwrite on reactivation; all three handler set-sites and decide-purity are
mutation-covered. A mutex serializes the global-path tests.
Gracefully handle None/Err in unified.rs non-test code with log::warn instead of panicking. Follow-up from alibaba#937.
- Update SkillFS core, CLI, and FUSE code to the POSIX baseline.
- Keep ANOLISA workspace metadata while importing checkpoint support.
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Add focused FUSE integration tests for the POSIX baseline.
- Add the pjdfstest wrapper and manifests for external validation.
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Document POSIX passthrough scope and external harness usage.
- Record mandatory fmt and clippy checks for SkillFS changes.
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
Mirror the test-agentsight CI gates locally so PRs pass CI on the first
attempt instead of cycling through failures (the motivating case,
alibaba#973, failed 3 rounds: coverage, commit-lint, then fmt).
Part 1: enhance the agentsight-pr-body skill preflight (Step 1.6) with the
incremental-coverage gate and conventional-commit lint, alongside fmt/clippy.
Part 2: an opt-in, agent-agnostic git pre-push hook (make install-hooks) that
mirrors the CI hard gates; no-ops unless the branch touches src/agentsight/,
and gates coverage only under PREPUSH_COVERAGE=1.
Closes alibaba#974.
Prevent repair and user-layout tests from writing state through the process
HOME or XDG roots. Guard OpenClaw-related environment mutations and avoid
changing PATH in adapter manager tests.
Assisted-by: OpenAI Codex:gpt-5
Signed-off-by: 爱鲲 <jiawa.syx@alibaba-inc.com>
Harden the compressed-SSE decode path against three issues:
1. Decompression bomb: the decoders (gzip/deflate/zstd/brotli) had no output
   cap, so a crafted bomb from an observed, untrusted process could OOM the
   single privileged observer. Add MAX_DECOMPRESSED_LEN (32 MiB) via
   Read::take with a raw fallback; zstd moves off decode_all to a streaming
   Decoder. Also cap the in-flight compressed buffer (8 MiB) against a
   never-terminating stream.
2. Premature completion: scanning the compressed buffer for the chunk
   terminator could match by chance inside a compressed payload and finish
   early, truncating the body so decompression fails and the call is dropped.
   Detect completion via chunk framing (chunked_stream_complete) instead.
3. Drain data loss: drain_and_persist_dead_connections dropped
   compressed_buffer, so a compressed stream that died before completing
   (e.g. HTTP/2) lost its whole body. Decode it on drain via the shared
   decode_compressed_sse / drained_sse_events.
Tests: discriminating unit tests for the output cap, embedded-terminator
resistance, the compressed-buffer cap, the shared decoder, and the drain
decode decision.
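Two of the ideas above can be sketched with the standard library alone: capping decoder output with `Read::take`, and detecting completion by walking chunk framing instead of scanning for the terminator bytes. `read_capped` is a hypothetical helper that accepts any `Read` (a real decoder would be passed in its place), and this `chunked_stream_complete` is an illustrative reimplementation that ignores trailers, not the agentsight function.

```rust
use std::io::{self, Read};

pub const MAX_DECOMPRESSED_LEN: u64 = 32 * 1024 * 1024;

/// Reads at most `cap` bytes from `decoder`. The second tuple element is true
/// when the source had more data than the cap allowed.
pub fn read_capped<R: Read>(decoder: R, cap: u64) -> io::Result<(Vec<u8>, bool)> {
    let mut out = Vec::new();
    // Read one byte past the cap so "exactly cap" and "over cap" can be told apart.
    let mut limited = decoder.take(cap.saturating_add(1));
    limited.read_to_end(&mut out)?;
    let exceeded = out.len() as u64 > cap;
    if exceeded {
        out.truncate(cap as usize);
    }
    Ok((out, exceeded))
}

/// Walks HTTP/1.1 chunk framing and returns true only when the zero-size
/// chunk is reached at a frame boundary.
pub fn chunked_stream_complete(buf: &[u8]) -> bool {
    let mut pos = 0;
    loop {
        // Locate the CRLF that ends the chunk-size line.
        let Some(rel) = buf[pos..].windows(2).position(|w| w == b"\r\n") else {
            return false;
        };
        // Ignore any chunk extension after ';'.
        let size_field = buf[pos..pos + rel]
            .split(|b| *b == b';')
            .next()
            .unwrap_or_default();
        let Ok(text) = std::str::from_utf8(size_field) else {
            return false;
        };
        let Ok(size) = usize::from_str_radix(text.trim(), 16) else {
            return false;
        };
        pos += rel + 2;
        if size == 0 {
            return true;
        }
        // Skip the chunk data plus its trailing CRLF.
        let Some(next) = pos.checked_add(size).and_then(|p| p.checked_add(2)) else {
            return false;
        };
        if next > buf.len() {
            return false;
        }
        pos = next;
    }
}
```

Because the walker skips chunk data by length, bytes that happen to look like `0\r\n\r\n` inside a compressed payload cannot end the stream early.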
- Surface skip-bootstrap guidance after install completes
- Document identity files can be adjusted later
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
Extract L1 atomic facts from session audit logs via heuristic rules when an
MCP session ends (SIGTERM/ctrl_c). Zero LLM calls, pure pattern matching on
tool-call sequences.
Rules:
- Working context: same-directory write patterns
- Interest: search query extraction
- Change: repeated edit / edit-then-read verification
- Lesson: error pattern classification
- Promoted: promote events as importance signals
- Summary: session activity statistics
Storage:
- facts/<ulid>.md: markdown with YAML frontmatter
- facts/facts.jsonl: structured index for search
- mem_consolidate MCP tool for manual trigger
- ConsolidationConfig with env overrides
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
- daemon writes per-operation JSONL to /var/log/anolisa/sls/ops/ws-ckpt.jsonl
with fields: component.name/version/agent_name, ops_id, ops_name,
ckpt_time/roll_time/diff_time/list_time/ops_time, err_reason, supply
- detect caller identity via /proc/{peer_pid}/environ WS_CKPT_AGENT_NAME,
whitelist to known agents (user/hermes/openclaw); env unset falls back
to "user" (direct CLI), unknown values fall back to "unknown"
- ops_id uses timestamp_ms-pid-seq (AtomicU64)
- gate seccompiler behind cfg(target_os = "linux") so daemon crate compiles
on non-Linux targets
- hermes/openclaw plugins set WS_CKPT_AGENT_NAME env when spawning CLI
Signed-off-by: Ziqi Huang <ziqi02@alibaba-inc.com>
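The caller-identity whitelist and the `ops_id` format can be sketched as below. The whitelist values and the `timestamp_ms-pid-seq` layout are from the commit message; `normalize_agent_name`, `next_ops_id`, and the `Relaxed` ordering are assumptions, and reading `/proc/{peer_pid}/environ` is left out.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide sequence counter used as the last `ops_id` component.
static OPS_SEQ: AtomicU64 = AtomicU64::new(0);

/// Maps the raw WS_CKPT_AGENT_NAME value to a whitelisted agent name:
/// unset falls back to "user", unrecognized values fall back to "unknown".
pub fn normalize_agent_name(env_value: Option<&str>) -> &'static str {
    match env_value {
        None => "user",
        Some("user") => "user",
        Some("hermes") => "hermes",
        Some("openclaw") => "openclaw",
        Some(_) => "unknown",
    }
}

/// Builds an id of the form `<timestamp_ms>-<pid>-<seq>`.
pub fn next_ops_id(timestamp_ms: u128, pid: u32) -> String {
    let seq = OPS_SEQ.fetch_add(1, Ordering::Relaxed);
    format!("{timestamp_ms}-{pid}-{seq}")
}
```

Returning `&'static str` from the whitelist keeps arbitrary environment content out of the JSONL records.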
Add local JSONL-based session telemetry for SLS agent collection.
At session end, logSessionSummary writes a comprehensive record to
/var/log/anolisa/sls/ops/cosh.jsonl including:
- Component identification (name, version, agent_name)
- Session config (model, auth_type, approval_mode)
- Audit decision counts (approve/deny/modify)
- Tool call counts (total/success/fail) with duration
- Tool error classification (model_error/execution_error/denied)
- File operation stats (lines added/removed)
- Sandbox stats (runs/blocked)
- Token usage (input/output/cached/total)
- API stats (requests/errors/latency)
- Environment info (os.type, os.arch)
- add hermes plugin tests: config, checkpoint_manager, tools, __init__
  (161 tests, 91% coverage)
- add openclaw plugin tests: btrfs-manager, commands, config, handlers,
  environment-check, snapshot-store, state, whitelist
  (200 tests, 98% coverage)
- add rust unit tests for migration, lockfile, fs_watcher, state
  (17 tests covering previously-untested modules)
- exclude tests/ and .coverage from RPM in ws-ckpt.spec.in
- add coverage artifacts to .gitignore
- fix package-lock.json to use public npm registry
Signed-off-by: Ziqi Huang <ziqi02@alibaba-inc.com>
- add cargo-tarpaulin rust coverage gate (>=45%) to test-ws-ckpt job
- add openclaw vitest coverage gate (>=90%) with vitest.config.ts
- add hermes pytest-cov coverage gate (>=90%)
- add btrfs loop e2e integration test exercising full CLI flow
Signed-off-by: Ziqi Huang <ziqi02@alibaba-inc.com>
Move the /var/lib/ws-ckpt backup before registering the cleanup trap so early
failures cannot remove a real ws-ckpt state directory.
Signed-off-by: Ziqi Huang <ziqi02@alibaba-inc.com>
- Resolve adapter resources from component contract dest for adopt flows
- Add structured skill sources with scoped datadir validation
- Keep convention discovery as fallback when no dest is declared
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Use Hermes plain plugin list output for status checks
- Parse OpenClaw rich tables with ANSI stripping and wrapped cells
- Cover false-negative and false-positive plugin detection cases
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
Signed-off-by: yizheng <YiZheng.Yang@linux.alibaba.com>
Description
Related Issue
closes #
Type of Change
Scope
- cosh (copilot-shell)
- sec-core (agent-sec-core)
- skill (os-skills)
- sight (agentsight)
- tokenless (tokenless)
Checklist
- cosh: Lint passes, type check passes, and tests pass
- sec-core (Rust): `cargo clippy -- -D warnings` and `cargo fmt --check` pass
- sec-core (Python): Ruff format and pytest pass
- skill: Skill directory structure is valid and shell scripts pass syntax check
- sight: `cargo clippy -- -D warnings` and `cargo fmt --check` pass
- tokenless: `cargo clippy -- -D warnings` and `cargo fmt --check` pass
- package-lock.json/Cargo.lock)
Testing
Additional Notes