
Cli rpm#8

Open
yangdao479 wants to merge 600 commits into main from cli_rpm

Conversation

@yangdao479

Owner

Description

Related Issue

closes #

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional change)
  • Performance improvement
  • CI/CD or build changes

Scope

  • cosh (copilot-shell)
  • sec-core (agent-sec-core)
  • skill (os-skills)
  • sight (agentsight)
  • tokenless (tokenless)
  • Multiple / Project-wide

Checklist

  • I have read the Contributing Guide
  • My code follows the project's code style
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the documentation accordingly
  • For cosh: Lint passes, type check passes, and tests pass
  • For sec-core (Rust): cargo clippy -- -D warnings and cargo fmt --check pass
  • For sec-core (Python): Ruff format and pytest pass
  • For skill: Skill directory structure is valid and shell scripts pass syntax check
  • For sight: cargo clippy -- -D warnings and cargo fmt --check pass
  • For tokenless: cargo clippy -- -D warnings and cargo fmt --check pass
  • Lock files are up to date (package-lock.json / Cargo.lock)

Testing

Additional Notes

RemindD and others added 30 commits June 10, 2026 18:53
- Add missing agentsight.service and agentsight-start to tarball in
  the unified scripts/rpm-build.sh build_agentsight() function
- Regenerate dashboard/package-lock.json with public npm registry
  (replace internal registry.anpm.alibaba-inc.com URLs)

Fixes alibaba#817
Detect agent process crashes immediately via ProcMon::Exit eBPF tracepoint
instead of waiting for HealthChecker's 30s polling cycle. On exit, drain
in-flight HTTP connections for the dead PID, persist them as pending calls,
and emit an agent_crash interruption event with OOM attribution from dmesg.

- aggregator: add drain_connections_for_pid() to extract pending/SSE
  connections by PID, used by crash detection
- unified: handle_agent_crash_detection() called from ProcMon::Exit;
  groups pending calls by (session_id, conversation_id) and writes one
  interruption event per conversation with source="trace_procmon_exit"
- interruption store: agent_crash_exists_recent() for 1s dedup window
  between trace path and serve mode HealthChecker fallback
- health/checker: skip writing if trace path already recorded the crash
  within the dedup window
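
The dedup between the trace path and the HealthChecker fallback can be pictured with a minimal sketch. It assumes an in-memory map keyed by PID; `CrashLog`, `exists_recent`, and `record_if_new` are hypothetical names, and the real store (and its key) may differ.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory stand-in for the interruption store.
pub struct CrashLog {
    window: Duration,
    // pid -> time at which an agent_crash event was last written
    last_seen: HashMap<u32, Instant>,
}

impl CrashLog {
    pub fn new(window: Duration) -> Self {
        Self { window, last_seen: HashMap::new() }
    }

    /// True if a crash for `pid` was written within the dedup window ending at `now`.
    pub fn exists_recent(&self, pid: u32, now: Instant) -> bool {
        self.last_seen
            .get(&pid)
            .map_or(false, |t| now.saturating_duration_since(*t) <= self.window)
    }

    /// Writes the crash unless another path already recorded it recently.
    /// Returns whether this call actually wrote the event.
    pub fn record_if_new(&mut self, pid: u32, now: Instant) -> bool {
        if self.exists_recent(pid, now) {
            return false;
        }
        self.last_seen.insert(pid, now);
        true
    }
}
```

With a 1s window, whichever path calls `record_if_new` second for the same PID inside that second becomes a no-op.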

Adds integration-tests/interruption/ with reproducible scenario scripts
and a README documenting deployment, agent_crash / agent_crash_oom
construction procedures, and 11 lessons learned from real verification
on the sysak production deployment.

Signed-off-by: liyuqing <liyuqing@alibaba-inc.com>
Replace the nested `anolisa subscription {register,unregister,status}`
sub-commands with top-level `anolisa register` / `anolisa unregister`,
and rename the corresponding CLI module from `subscription.rs` to
`register.rs`. `anolisa register status` reuses the existing dispatch
to avoid colliding with `anolisa status`.

Why: per the design note (anolisa-register-design §4.4), "subscription"
implies a paid/opt-out model, while the actual semantics are pure
consent-based registration for token-collection upload. Flattening the
verb also matches the most common user operations (register/unregister)
without an extra nesting level.

Drop the `InitLater` ("ask me later, expire in 30 days") state branch
together with `do_later()` and its tests. The first iteration only ships
the binary REGISTERED / UNREGISTERED states; the calendar-based 30-day
expiry was never a hard requirement and adds state-machine surface area
that is not needed for the token-collection MVP.

Rename the agentsight enablement marker file
`/etc/anolisa/enable_sls_log` → `/etc/anolisa/enable_token_collector`
so the on-disk artifact reflects what is actually being gated (token
collection upload), and update user-facing error strings and doc
comments from "subscription" to "register".

Known limitations:
- This is a breaking change for any caller still invoking
  `anolisa subscription ...`; no compatibility alias is provided.
- The marker file rename is not migrated on upgrade; hosts that
  registered before this change will need to re-run `anolisa register`
  to recreate the new marker.
- SysOM-managed instance detection (sysak_meta / sysak_agentsight) is
  unchanged and still relies on hard-coded service names.

Assisted-by: Qoder:latest

Signed-off-by: Kailong Zhou <zhoukailong.zkl@alibaba-inc.com>
Rename anolisa-core/src/subscription.rs → register.rs and update
the module declaration in lib.rs to match. This aligns the source
file name with the public-facing CLI verb (`anolisa register`).

Reword user-facing help descriptions from implementation details
("token collection", "stop token upload") to product-level framing
("Join/Leave the Agentic OS Co-Build Program"). Users should see
the value proposition, not the internal mechanism.

Why: the previous wording leaked internal terminology into the CLI
surface, making the commands sound like telemetry opt-in rather than
a co-build program enrollment. The file rename eliminates a naming
mismatch (module called "subscription" but CLI says "register").

Assisted-by: Qoder:latest

Signed-off-by: Kailong Zhou <zhoukailong.zkl@alibaba-inc.com>
- Add side-effect-free framework detection for adapter specs

- Expand adapter layout placeholders against active fs layout

Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Implement adapter scan and dry-run install planning

- Add guarded adapter remove with state and central log updates

Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Expand tar_gz source prefixes into concrete file mappings

- Validate expanded destinations before copying archive files

Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
Replace the NOT_IMPLEMENTED stub with a full download → stage → copy →
state → log pipeline:

1. Resolve artifact from DistributionIndex via ResolveQuery (prefers
   tar_gz, falls back to binary).
2. Require sha256 — refuse to install unverified artifacts.
3. Download through DownloadCache (file:// and HTTP(S), with retry).
4. Copy via InstallRunner::install_files using the adapter's
   source/dest mapping (directory-prefix expansion for tar_gz).
5. Write InstalledObject (ObjectKind::Adapter) + OperationRecord to
   installed.toml under the install lock.
6. Append central audit log record.
7. On state-save failure after file copy, roll back installed files so
   no phantom "installed" state remains.

Integration tests cover the happy path (install → verify files + state +
log), full lifecycle (install → remove), and failure modes (missing
sha256, checksum mismatch, dest-already-exists) — each asserting that
no state or files leak on error.
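
Step 7 (roll back copied files when the state save fails) can be sketched as follows. This is an illustration under assumptions, not the InstallRunner code: `install_with_rollback` and the `save_state` closure are hypothetical, and the rollback is best effort.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Copies each (source, dest) pair, then runs `save_state`.
/// On any failure, files copied so far are removed (best effort).
pub fn install_with_rollback<F>(
    mappings: &[(PathBuf, PathBuf)],
    save_state: F,
) -> io::Result<()>
where
    F: FnOnce() -> io::Result<()>,
{
    let mut copied: Vec<&Path> = Vec::new();
    for (src, dest) in mappings {
        // Refuse to overwrite: mirrors the dest-already-exists failure mode.
        if dest.exists() {
            rollback(&copied);
            return Err(io::Error::new(
                io::ErrorKind::AlreadyExists,
                "dest already exists",
            ));
        }
        if let Err(e) = fs::copy(src, dest) {
            rollback(&copied);
            return Err(e);
        }
        copied.push(dest.as_path());
    }
    // Files are in place; a failed state save must not leave them behind.
    if let Err(e) = save_state() {
        rollback(&copied);
        return Err(e);
    }
    Ok(())
}

/// Removes copied files, ignoring individual deletion errors.
fn rollback(copied: &[&Path]) {
    for path in copied {
        let _ = fs::remove_file(path);
    }
}
```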

Closes: alibaba#813
Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Load installed state before copying adapter files under the lock

- Reject non-tar_gz adapter artifacts and pass pkg_base to resolution

- Clarify best-effort rollback wording on state save failure

Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Return runtime error when adapter file deletion fails

- Preserve adapter state so failed removals can be retried

- Add regression coverage for partial remove failure

Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
Add SKILL.md that teaches agents how to manage Agentic OS registration
state (query/register/unregister).

Key design choices:
- Intent mapping table for natural language -> CLI command translation
- Non-interactive session handling: agent must present the co-build plan
  explanation and get explicit user consent before using --yes flag
- Covers both register and unregister flows with safety confirmations

Signed-off-by: Kailong Zhou <zhoukailong.zkl@alibaba-inc.com>
Assisted-by: Qoder:latest
Implement artifact-centric distribution on the consumer side per:

- Add `registry/` submodule (no mod.rs): RegistryConfig parses `[registry]`
  with ANOLISA_REGISTRY_URL override (config.rs); RegistryClient fetches the
  distribution `index.toml` / `meta.toml` over HTTP with a TTL cache and
  offline fallback (client.rs); registry cache layering for index + meta
  (cache.rs); RegistryError (error.rs). registry.rs is promoted to a parent
  module that re-exports both the legacy `Registry` catalog facade and the
  new client.
- manifest.rs: add minimal-schema `[component.contract]` (ContractSpec) and
  `[component.artifact]` (ArtifactSpec) plus component display_name / owner /
  license / repository, with Raw nested sub-tables taking precedence over the
  legacy top-level sections.
- enable_plan.rs: add ArtifactPlan.meta_sha256. The planner stays IO-free and
  leaves it None; the CLI fills it after fetch_meta.
- enable_execute.rs: before installing, verify the artifact's embedded
  `.anolisa/component.toml` matches the planned meta sha256 and abort with no
  files written on mismatch.
- CLI: opt-in RegistryClient construction and meta fetch wired into the enable
  flow (common.rs, tier1/enable.rs).

Assisted-by: Claude Code:Opus 4.8
Signed-off-by: 爱鲲 <jiawa.syx@alibaba-inc.com>
Introduce a structured health-check engine and wire it into enable, plus
additive minimal-schema groundwork on the component manifest. Legacy
[install]/[environment] parsing and {etcdir}/{datadir} placeholders are
kept as fallbacks, so existing manifests keep working unchanged.

- health: new CheckSpec/CheckOutcome engine with owned-path, timeout, and
  shell-metacharacter guards; binary/file/command probes plus all_of/any_of.
  Remaining variants report Unsupported until their slice lands.
- manifest: add FileKind to install files; carry an optional health_check
  and synthesize a binary_version probe from the first executable file.
  Drop the adapters section (now ignored as unknown keys).
- enable: the plan carries each component's health probe; the executor runs
  it after install, records HealthEntry rows on the installed object, and
  degrades the component (and capability) to Partial with a warning on hard
  failure without rolling back the install.
- layout: accept {sysconfdir}/{sharedir} aliases alongside the legacy names.
- contract_lint: enforce required fields only for manifests that opt into
  the minimal schema, so legacy manifests are never newly blocked.

Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: 爱鲲 <jiawa.syx@alibaba-inc.com>
…back

Make `enable` resolve the distribution index from the live registry by
default instead of being strictly opt-in, and degrade gracefully to the
bundled local index when the network is unreachable.

- config: point DEFAULT_INDEX_URL at the live public OSS mirror and
  document it as load-bearing — every enable now hits it unless the
  `[registry] url` config key or ANOLISA_REGISTRY_URL overrides. The
  opt-in `load_if_configured` API is retained for a future force-local
  switch.
- common: `registry_client_from` switches to `RegistryConfig::load`, so a
  client is always constructed (bundled < file < env). Add `ResolvedIndex`
  and `fetch_remote_index_or_local`: a successful fetch yields a freshness
  warning, `RegistryError::Offline` degrades to the bundled local index
  with a warning and `degraded_to_local = true`, and any other
  RegistryError surfaces as a CliError rather than silently masking a
  config/parse fault.
- enable: the `Some(client)` branch uses `fetch_remote_index_or_local` and
  skips the per-component meta overlay when degraded to local, since
  the network is already known unreachable and would only add failed
  fetch warnings.
- index: move the tokenless artifact URL to the per-component subdirectory
  layout `v1/tokenless/0.5.0/...` (extensible per component/version) and
  bind sha256/size to the full published artifact.
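
The degrade-on-offline rule can be sketched roughly as below. The type and function names echo the commit message, but their shapes are assumptions: the index is a plain `String`, the error is reduced to two variants, the CLI error is a `String`, and the freshness warning on success is left out.

```rust
/// Simplified stand-in for the registry error type (assumed variants).
#[derive(Debug, PartialEq)]
pub enum RegistryError {
    Offline,
    Parse(String),
}

/// Simplified stand-in for the resolved index.
#[derive(Debug, PartialEq)]
pub struct ResolvedIndex {
    pub index: String,
    pub degraded_to_local: bool,
    pub warning: Option<String>,
}

/// Tries the remote fetch; only `Offline` falls back to the bundled index.
/// Any other error is surfaced so config or parse faults are not masked.
pub fn fetch_remote_index_or_local<F>(
    fetch: F,
    bundled: &str,
) -> Result<ResolvedIndex, String>
where
    F: FnOnce() -> Result<String, RegistryError>,
{
    match fetch() {
        Ok(index) => Ok(ResolvedIndex {
            index,
            degraded_to_local: false,
            warning: None,
        }),
        Err(RegistryError::Offline) => Ok(ResolvedIndex {
            index: bundled.to_string(),
            degraded_to_local: true,
            warning: Some("registry unreachable; using bundled index".to_string()),
        }),
        Err(other) => Err(format!("registry error: {:?}", other)),
    }
}
```

A caller can then skip the per-component meta overlay whenever `degraded_to_local` is true.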

Assisted-by: Claude Code:claude-opus-4-8
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… on main

An earlier commit on this branch dropped the `[[adapters]]` section from the
component schema (T2.9: adapters were not yet part of the schema). main has
since reintroduced adapters as a first-class subsystem — adapter.rs detects
frameworks and installs from `manifest::AdapterSpec` — so the schema must
carry it again. Re-add AdapterSpec / AdapterRaw, the `adapters` field on
ComponentManifest and its parsing, the re-export, and the round-trip test,
on top of the health-check additions kept from this branch.

Assisted-by: Claude Code:claude-opus-4-8
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolve the adapter from the published artifact (not the dev-tree catalog)
and register it into the framework using the framework's own CLI.

- Version-agnostic resolution: pick the highest published semver from the
  distribution index (version = None) rather than a version read from the
  bundled manifest.
- Take source/dest/version from the artifact's embedded
  .anolisa/component.toml via the new
  install_runner::read_embedded_component_manifest; the published toml is
  authoritative.
- Extract the plugin into anolisa's {datadir} (owned roots), then drive
  `openclaw plugins install <dest> --force --dangerously-force-unsafe-install`,
  replicating the install.sh argv+env contract (unset OPENCLAW_HOME, set
  OPENCLAW_STATE_DIR, prepend PATH) without executing the script. Fail fast
  when the framework is absent; roll back extracted files if registration
  fails.
- remove symmetrically runs `openclaw plugins uninstall <id> --force`
  (--force skips the non-interactive [y/N] prompt) before deleting the owned
  datadir copy; state is kept on failure so removal is retryable.
- Tests: hermetic cosh integration fixtures with an embedded manifest
  (decoupled from manifests/runtime); pure argv/env unit tests for the
  OpenClaw invocation builders.

index.toml: bump tokenless 0.5.0 sha256/size to match the republished
artifact that now carries the [[adapters]] declaration.

Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: 爱鲲 <jiawa.syx@alibaba-inc.com>
Summarize anolisa CLI updates since 0.1.3 in the changelog.

Assisted-by: Codex:GPT-5
Use a temporary system prefix for the enable dry-run smoke test so
registry cache and config lookups stay under a tempdir instead of host
system paths.

Rename the test and rustdoc-style test description to match the actual
contract: the handler renders a dry-run plan envelope, but the plan may
still be blocked or degraded depending on host prechecks.

Assisted-by: codex:gpt-5
Signed-off-by: 爱鲲 <jiawa.syx@alibaba-inc.com>
…ctor switch

Add a background watcher thread that polls /etc/anolisa/enable_token_collector
once per second. When the trigger file exists, read SLS_LOG_PATH from
/etc/anolisa/ilogtail.cfg (INI key=value, supports single/double quotes) and
write it to runtime.sls_logtail_path of the agentsight config. When the
trigger file is removed, clear runtime.sls_logtail_path. The existing config
watcher detects the resulting CLOSE_WRITE and activates the SLS LogtailExporter,
so this commit only adds the bridging layer without touching SLS activation
itself.

Implementation notes:
- Enable serde_json 'preserve_order' feature so that runtime/deadloop/https/
  cmdline field order in agentsight.json stays stable across rewrites.
- State machine in the watcher avoids redundant disk writes by caching
  last_state (None / Some(None) / Some(Some(path))).
- write_runtime_sls_path() returns Ok(false) when the value is unchanged
  (idempotent), so reapplying the same path does not retrigger inotify.
- File paths and poll interval are kept as in-function constants to keep the
  production API surface minimal.
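
A minimal sketch of the INI lookup, assuming the file contents are already in memory. The comment markers (`#` and `;`) and the choice to treat an empty value as "no path" are assumptions; the real `read_logtail_sls_path` may behave differently.

```rust
/// Returns the SLS_LOG_PATH value from INI-style `key=value` text,
/// accepting bare, single-quoted, or double-quoted values.
pub fn read_logtail_sls_path(contents: &str) -> Option<String> {
    for line in contents.lines() {
        let line = line.trim();
        // Assumed comment syntax: lines starting with '#' or ';'.
        if line.is_empty() || line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            if key.trim() != "SLS_LOG_PATH" {
                continue;
            }
            let unquoted = strip_matching_quotes(value.trim());
            if unquoted.is_empty() {
                return None;
            }
            return Some(unquoted.to_string());
        }
    }
    None
}

/// Strips one pair of matching single or double quotes, if present.
fn strip_matching_quotes(s: &str) -> &str {
    for q in ['"', '\''] {
        if s.len() >= 2 && s.starts_with(q) && s.ends_with(q) {
            return &s[1..s.len() - 1];
        }
    }
    s
}
```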

Tests:
- 13 unit tests in src/unified.rs cover read_logtail_sls_path (basic,
  single/double quotes, empty value, missing key, comments, file missing),
  write_runtime_sls_path (set / clear / idempotent / creates runtime section /
  invalid root errors) and an end-to-end logic simulation.
- scripts/int-test-token-collector.sh drives the real agentsight binary
  through 5 phases (enable, disable→clear, double-quoted value, missing
  SLS_LOG_PATH, field preservation). Passed 9/9 on a non-ECS host (phase 2
  auto-skipped when ECS metadata is unreachable) and 10/10 on a real ECS
  host where SLS uid validation succeeds.
The token-collector bridge (alibaba#839) only wrote runtime.sls_logtail_path
into config.json on enable/disable, but disable did not actually pause
SLS uploads: sls_activated was a one-way AtomicBool, the LogtailExporter
locked self.path at construction, and the config-watcher treated the
empty-string value as a no-signal None. Removing the trigger file thus
left SLS uploading until the process restarted.

Make activation truly reversible without restart:

* config::parse_runtime_sls_path now returns Option<Option<String>>:
  - None             — field absent / parse error
  - Some(None)       — empty string  → deactivation signal
  - Some(Some(path)) — non-empty     → (re-)activation signal
* genai::logtail::set_dynamic_logtail_path treats an empty string as
  'clear' (resets DYNAMIC_LOGTAIL_PATH to None) and logs the pause.
* LogtailExporter gains a 'dynamic' bool: instances created via
  new_with_path read logtail_path() each export() and skip the batch
  when it is None, so a cleared dynamic path silently pauses uploads;
  env-var instances keep their locked path (unchanged behavior).
* unified::start_config_watcher drops the one-way 'if activated skip'
  guard and dispatches on the tri-state: empty → swap(false) + clear;
  non-empty → uid check + set dynamic; first time also creates and
  posts an exporter to the mailbox; afterwards just swaps the path.
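
The tri-state contract can be sketched as below. To stay self-contained, the parser takes the already-extracted field as `Option<&str>` instead of reading JSON, and `SlsSignal` / `dispatch` are hypothetical names for the watcher's decision.

```rust
/// Hypothetical decision derived from the tri-state value.
#[derive(Debug, PartialEq)]
pub enum SlsSignal {
    NoSignal,
    Deactivate,
    Activate(String),
}

/// None: field absent. Some(None): empty string. Some(Some(path)): non-empty.
pub fn parse_runtime_sls_path(field: Option<&str>) -> Option<Option<String>> {
    match field {
        None => None,
        Some("") => Some(None),
        Some(path) => Some(Some(path.to_string())),
    }
}

/// Maps the tri-state onto the action the config watcher would take.
pub fn dispatch(parsed: Option<Option<String>>) -> SlsSignal {
    match parsed {
        None => SlsSignal::NoSignal,
        Some(None) => SlsSignal::Deactivate,
        Some(Some(path)) => SlsSignal::Activate(path),
    }
}
```

Keeping "absent" and "empty" as distinct values is what lets removal of the trigger file act as an explicit pause instead of being ignored.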

Tests:

* 5 parse_runtime_sls_path unit tests updated to the tri-state contract.
* All 554 lib tests pass: cargo test --lib.
* Integration script scripts/int-test-token-collector.sh adds Phase 6
  exercising activate → deactivate (pause) → re-activate-with-new-path
  inside one process lifetime, gated on ECS metadata. End-to-end run on
  Anolis OS / kernel 5.10.134: 15 passed, 0 failed.
…ent from SLS by default

This is a privacy-safe default-flip on top of the SLS Logtail reversible
activation work. Previously `AgentsightConfig::new()` set
`trace_enabled = true` and the default `agentsight.json` shipped without
the field, so any operator activating SLS upload (via token-collector
trigger or `SLS_LOGTAIL_FILE` env) would automatically upload full
conversation bodies (`gen_ai.input.messages`, `gen_ai.output.messages`,
`gen_ai.system_instructions`) — leaking sensitive prompt/response text
unless they had explicitly written `"traceEnabled": false`.

Flip the default: only token / model / provider / timing metadata leaves
the host on SLS uploads unless the operator explicitly opts in by
writing `"traceEnabled": true`.

Changes:

* `AgentsightConfig::new()` now defaults `trace_enabled = false`.
* Rewrote the doc-comment on `pub trace_enabled` to describe the actual
  scope (SLS upload payload only) — it does NOT stop the agent, eBPF
  probes, local SQLite persistence or token metering, all of which keep
  running. Local SQLite always retains full content; this flag only
  shapes what crosses the network to SLS.
* Added 3 unit tests that pin the new contract:
  - `test_trace_enabled_default_is_false` — locks the default.
  - `test_load_from_json_missing_trace_enabled_keeps_default_false` —
    omitted field must NOT flip the default (Option<bool> + serde
    default).
  - `test_load_from_json_explicit_trace_enabled_true` — explicit opt-in
    works.

Compatibility note: this is a breaking change for any deployment that
relies on the implicit default to stream conversation bodies. Operators
who want the previous behavior must add `"traceEnabled": true` to their
config file.

Tests: cargo test --lib → 557 passed (was 554; +3 new locks).
…raceEnabled=false

Tightens the privacy-safe-by-default contract introduced in the previous
commit. Previously, even with `traceEnabled=false`, the SLS upload still
carried `gen_ai.system_instructions` — which usually contains the agent's
system prompt (product business logic, tool descriptions, role
instructions, sometimes embedded credentials). This contradicted the
field's docstring promise that all conversation content fields are
dropped when the flag is off.

Changes:

* `events_to_flat_records`: wrap the system_instructions emission in
  `if trace_enabled { ... }` (parallel to the existing input.messages /
  output.messages gating).
* Updated the `pub trace_enabled` field doc and the
  `events_to_flat_records` function doc to enumerate all three guarded
  fields.
* Updated both `new()` and `new_with_path()` activation logs to list all
  three field names.
* Strengthened the regression tests:
  - `test_trace_enabled_true_includes_messages` now asserts
    `gen_ai.system_instructions` IS present.
  - `test_trace_enabled_false_drops_messages_keeps_token_metadata` now
    asserts it is ABSENT, and removes the prior leak-check exemption
    (which was effectively a no-op anyway since the field name does not
    end with `.messages`).
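
The gating described above can be illustrated with a small sketch. The three content field names come from this commit; `GenAiEvent`, `event_to_flat_record`, and the metadata keys (`gen_ai.request.model`, `gen_ai.usage.*`) are assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down event used only for this sketch.
pub struct GenAiEvent {
    pub model: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub system_instructions: String,
    pub input_messages: String,
    pub output_messages: String,
}

/// Builds the flat record sent to SLS. Metadata is always included;
/// the three content fields are emitted only when `trace_enabled` is true.
pub fn event_to_flat_record(
    event: &GenAiEvent,
    trace_enabled: bool,
) -> BTreeMap<&'static str, String> {
    let mut record = BTreeMap::new();
    record.insert("gen_ai.request.model", event.model.clone());
    record.insert("gen_ai.usage.input_tokens", event.input_tokens.to_string());
    record.insert("gen_ai.usage.output_tokens", event.output_tokens.to_string());
    if trace_enabled {
        record.insert("gen_ai.system_instructions", event.system_instructions.clone());
        record.insert("gen_ai.input.messages", event.input_messages.clone());
        record.insert("gen_ai.output.messages", event.output_messages.clone());
    }
    record
}
```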

Compatibility note: this is a breaking change for any operator who
implicitly relied on system_instructions reaching SLS while keeping
traceEnabled=false. Such operators must now set `"traceEnabled": true`
to opt back in.

E2E verified on production-like ECS (101.37.234.43):
* Activation log emits the new wording listing all three fields.
* After triggering a real LLM call via cosh, the resulting SLS record
  contains 32 fields (down from 33 in the previous build); all three of
  `gen_ai.system_instructions`, `gen_ai.input.messages`,
  `gen_ai.output.messages` are confirmed ABSENT.

Tests: cargo test --lib -> 557 passed.

Refs: alibaba#841
Upgrade the vendored rtk from v0.36.0 to v0.42.3. v0.42.3 includes
upstream fixes for grep filename preservation (-H flag, NUL separator),
--no-ignore-vcs, and exec_capture execution model.

Upgrade toon-format from 0.4.6 to 0.5.0 (released 2026-05-22):
- Added: (layout) expose decoder layout metadata behind cargo feature
- Fixed: deserialization failure for u64 values larger than i64::MAX
- No breaking changes

The tokenless stats patch is updated for v0.42.3:
- Reduced from 218 to 185 lines
- Removed v0.36.0-specific clippy suppressions
- Only includes tracking.rs and hook_check.rs changes

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
When rtk grep falls back from rg to grep (when rg is unavailable), the
fallback uses the original BRE pattern instead of the PCRE-converted
pattern and lacks -E (extended regex) support. This causes patterns with
alternation (e.g. 'fn foo\|pub.*bar') to fail silently:

- BRE pattern 'fn foo\|pub.*bar' is converted to PCRE 'fn foo|pub.*bar'
- When rg is unavailable, grep receives the original BRE pattern without
  -E flag, so grep interprets it literally (not as alternation)
- Result: zero matches returned, misleading AI agents

The patch fixes the grep fallback to use the PCRE-converted pattern with
-E flag so alternation works correctly in both rg and grep.
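
As a rough illustration of the alternation handling only, the sketch below rewrites BRE `\|` into ERE `|` and escapes a bare `|` so it stays literal. `bre_alternation_to_ere` is a hypothetical helper, not rtk's converter, and other BRE/ERE differences (`\(`, `\{`, `+`, `?`) are deliberately not handled.

```rust
/// Converts BRE-style alternation to ERE-style alternation.
/// `\|` becomes `|`; an unescaped `|` (literal in BRE) becomes `\|`.
/// All other escape sequences are passed through unchanged.
pub fn bre_alternation_to_ere(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.next() {
                Some('|') => out.push('|'),
                Some(next) => {
                    out.push('\\');
                    out.push(next);
                }
                // Trailing backslash: keep it as-is.
                None => out.push('\\'),
            },
            '|' => out.push_str("\\|"),
            other => out.push(other),
        }
    }
    out
}
```

The converted pattern is the one that would be passed to `grep -E` in the fallback path.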

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
parse_summary_line did not recognize 'error'/'errors' in pytest output
summaries (e.g. '1 error in 0.10s' from collection errors, or '5 passed,
2 errors in 0.50s'), returning all-zero counts. This triggered the
misleading 'Pytest: No tests collected' output, causing LLMs to retry
with different parameters.

The patch adds:
- errors field in PytestCounts struct with Debug+PartialEq derives
- error detection in ===-wrapped and quiet summary lines
- error parsing in parse_summary_line (singular and plural)
- error count display in build_pytest_summary output
- Include errors in extras_present check to prevent early return
- 3 new tests: collection-error-only, mixed-errors, summary-parsing
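
A simplified sketch of summary parsing with the error count included. The struct and function names match the commit message, but the field set and parsing strategy here are assumptions and cover only the `N label, N label in T` shape.

```rust
/// Counts extracted from a pytest summary line (assumed field set).
#[derive(Debug, Default, PartialEq)]
pub struct PytestCounts {
    pub passed: usize,
    pub failed: usize,
    pub skipped: usize,
    pub errors: usize,
}

/// Parses lines such as "1 error in 0.10s" or
/// "===== 5 passed, 2 errors in 0.50s =====".
pub fn parse_summary_line(line: &str) -> PytestCounts {
    let mut counts = PytestCounts::default();
    // Drop the "=====" decoration, then the trailing " in 0.10s" timing part.
    let body = line.trim().trim_matches('=').trim();
    let body = body.split(" in ").next().unwrap_or(body);
    for part in body.split(',') {
        let mut words = part.split_whitespace();
        if let (Some(n), Some(label)) = (words.next(), words.next()) {
            if let Ok(n) = n.parse::<usize>() {
                match label {
                    "passed" => counts.passed = n,
                    "failed" => counts.failed = n,
                    "skipped" => counts.skipped = n,
                    // Accept both singular and plural forms.
                    "error" | "errors" => counts.errors = n,
                    _ => {}
                }
            }
        }
    }
    counts
}
```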

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
- remove deprecated capability command modules

- add install stub and regroup top-level help

- drop enable-only execution policy wiring

Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
jfeng18 and others added 23 commits June 23, 2026 11:37
Stop retrying after 3 consecutive failures to avoid repeated
1s timeouts in non-ECS environments. Addresses review on alibaba#783.
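
One way to picture the retry cutoff is a small counter that trips after a run of failures. `RetryGate` is a hypothetical name, and resetting the counter on success is an assumption drawn from the word "consecutive".

```rust
/// Stops further attempts once `max_failures` consecutive failures occur.
pub struct RetryGate {
    consecutive_failures: u32,
    max_failures: u32,
}

impl RetryGate {
    pub fn new(max_failures: u32) -> Self {
        Self { consecutive_failures: 0, max_failures }
    }

    /// True while another attempt is still allowed.
    pub fn should_attempt(&self) -> bool {
        self.consecutive_failures < self.max_failures
    }

    /// Records an attempt's outcome; a success resets the failure streak.
    pub fn record(&mut self, succeeded: bool) {
        if succeeded {
            self.consecutive_failures = 0;
        } else {
            self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        }
    }
}
```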
When a process execves multiple times (e.g., bash exec-ing into sleep),
the previous implementation only kept the last exec's args, losing the
original user-initiated command. This causes correlation issues for
tools like alibaba#1025 that need to match tool_call commands against execve
events.

Changes:
- AggregatedProcess::add_exec now preserves the first exec's args
  (the complete user command) and appends " ..." to mark that
  subsequent execs occurred
- filename is still updated to reflect final exec state (backward compat)
- Added 4 unit tests covering single/multiple exec scenarios

Example:
- Before: python subprocess.run(['bash','-lc','echo X && sleep 1'])
  → audit args = "sleep 1" (last exec only)
- After: audit args = "bash -lc echo X && sleep 1 ..." (first + marker)

The " ..." marker indicates exec chain truncation and leaves room for
future enhancement (full exec chain tracking, deferred until needed).
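
A minimal sketch of the first-exec-wins behavior follows. The struct here is a reduced stand-in for `AggregatedProcess`; the `exec_count` field and the choice to append the marker only once are assumptions.

```rust
/// Reduced stand-in for the aggregated process record.
#[derive(Debug, Default)]
pub struct AggregatedProcess {
    pub filename: String,
    pub args: String,
    exec_count: u32,
}

impl AggregatedProcess {
    /// Keeps the first exec's args, tracks the latest filename, and marks
    /// the args with " ..." once a later exec has been seen.
    pub fn add_exec(&mut self, filename: &str, args: &str) {
        // filename always reflects the final exec state.
        self.filename = filename.to_string();
        match self.exec_count {
            0 => self.args = args.to_string(),
            1 => self.args.push_str(" ..."),
            _ => {} // marker already present; later args are not recorded
        }
        self.exec_count += 1;
    }
}
```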

Context: alibaba#1025 Phase 0 spike discovered this issue during tool_call↔execve
correlation research. ECS testing confirmed the fix preserves complete
args for Bash tool_call matching.

Related: alibaba#1025
Add ParsedApiMessage::request_metadata_session_id() and shared
session_id_from_metadata() helper to read session info from
Anthropic metadata. Use as highest-priority source for session_id
in both call_builder (normal) and builder (pending/crash) paths.
conversation_id remains hash-based (unchanged).

part of alibaba#1014
decide_sls_config_change called set_dynamic_logtail_path, a process-global
side-effect, contradicting its own enum doc ("side effects are carried out by
the thread shell so the decision logic stays pure"). Move the dynamic-path
update into handle_config_event's match arms (Deactivated/Activate/Reactivated);
decide now only performs the caller-owned sls_activated test-and-set.

Behavior-preserving: every SlsConfigAction variant leaves the global
DYNAMIC_LOGTAIL_PATH in the same final state, and sls_activated has no
production reader correlating it with the path, so the swap-vs-set reorder is
unobservable.

Tests: add a discriminating test that decide leaves the global path untouched
while handle sets it on activation and clears it on deactivation, plus assert
the overwrite on reactivation; all three handler set-sites and decide-purity are
mutation-covered. A mutex serializes the global-path tests.
Gracefully handle None/Err in unified.rs non-test code
with log::warn instead of panicking. Follow-up from alibaba#937.
- Update SkillFS core, CLI, and FUSE code to the POSIX baseline.
- Keep ANOLISA workspace metadata while importing checkpoint support.

Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Add focused FUSE integration tests for the POSIX baseline.
- Add the pjdfstest wrapper and manifests for external validation.

Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Document POSIX passthrough scope and external harness usage.
- Record mandatory fmt and clippy checks for SkillFS changes.

Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
Mirror the test-agentsight CI gates locally so PRs pass CI on the first attempt
instead of cycling through failures (the motivating case, alibaba#973, failed 3 rounds:
coverage, commit-lint, then fmt).

Part 1: enhance the agentsight-pr-body skill preflight (Step 1.6) with the
incremental-coverage gate and conventional-commit lint, alongside fmt/clippy.

Part 2: an opt-in, agent-agnostic git pre-push hook (make install-hooks) that
mirrors the CI hard gates; no-ops unless the branch touches src/agentsight/, and
gates coverage only under PREPUSH_COVERAGE=1.

Closes alibaba#974.
Prevent repair and user-layout tests from writing state through the
process HOME or XDG roots. Guard OpenClaw-related environment
mutations and avoid changing PATH in adapter manager tests.

Assisted-by: OpenAI Codex:gpt-5
Signed-off-by: 爱鲲 <jiawa.syx@alibaba-inc.com>
Harden the compressed-SSE decode path against three issues:

1. Decompression bomb: the decoders (gzip/deflate/zstd/brotli) had no output
   cap, so a crafted bomb from an observed, untrusted process could OOM the
   single privileged observer. Add MAX_DECOMPRESSED_LEN (32 MiB) via Read::take
   with a raw fallback; zstd moves off decode_all to a streaming Decoder. Also
   cap the in-flight compressed buffer (8 MiB) against a never-terminating stream.

2. Premature completion: scanning the compressed buffer for the chunk terminator
   could match by chance inside a compressed payload and finish early, truncating
   the body so decompression fails and the call is dropped. Detect completion via
   chunk framing (chunked_stream_complete) instead.

3. Drain data loss: drain_and_persist_dead_connections dropped compressed_buffer,
   so a compressed stream that died before completing (e.g. HTTP/2) lost its whole
   body. Decode it on drain via the shared decode_compressed_sse / drained_sse_events.

Tests: discriminating unit tests for the output cap, embedded-terminator
resistance, the compressed-buffer cap, the shared decoder, and the drain decode
decision.
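
The output cap from item 1 can be sketched with `Read::take` over any reader. `read_capped` and `decode_or_raw` are hypothetical helpers; treating the "raw fallback" as returning the undecoded bytes when the cap is exceeded is an assumption, and a real decoder (gzip, zstd, and so on) would stand in for the generic `R`.

```rust
use std::io::{self, Read};

/// Cap on decompressed output, matching the 32 MiB stated in the commit.
pub const MAX_DECOMPRESSED_LEN: u64 = 32 * 1024 * 1024;

/// Reads at most `cap` bytes from `decoder`; errors if more are available.
pub fn read_capped<R: Read>(decoder: R, cap: u64) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    // Allow one extra byte so "exactly cap" and "over cap" are distinguishable.
    decoder.take(cap.saturating_add(1)).read_to_end(&mut out)?;
    if out.len() as u64 > cap {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "decompressed output exceeds cap",
        ));
    }
    Ok(out)
}

/// Returns the decoded bytes, or the raw input if decoding fails or overflows.
pub fn decode_or_raw<R: Read>(decoder: R, raw: &[u8], cap: u64) -> Vec<u8> {
    read_capped(decoder, cap).unwrap_or_else(|_| raw.to_vec())
}
```

Because `take` bounds how much is ever pulled from the decoder, an endless stream costs at most `cap + 1` bytes of memory before it is rejected.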
- Surface skip-bootstrap guidance after install completes

- Document identity files can be adjusted later

Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
Extract L1 atomic facts from session audit logs via heuristic rules
when an MCP session ends (SIGTERM/ctrl_c). Zero LLM calls, pure
pattern matching on tool-call sequences.

Rules:
  - Working context: same-directory write patterns
  - Interest: search query extraction
  - Change: repeated edit / edit-then-read verification
  - Lesson: error pattern classification
  - Promoted: promote events as importance signals
  - Summary: session activity statistics

Storage:
  - facts/<ulid>.md: markdown with YAML frontmatter
  - facts/facts.jsonl: structured index for search
  - mem_consolidate MCP tool for manual trigger
  - ConsolidationConfig with env overrides

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
- daemon writes per-operation JSONL to /var/log/anolisa/sls/ops/ws-ckpt.jsonl
  with fields: component.name/version/agent_name, ops_id, ops_name,
  ckpt_time/roll_time/diff_time/list_time/ops_time, err_reason, supply
- detect caller identity via /proc/{peer_pid}/environ WS_CKPT_AGENT_NAME,
  whitelist to known agents (user/hermes/openclaw); env unset falls back
  to "user" (direct CLI), unknown values fall back to "unknown"
- ops_id uses timestamp_ms-pid-seq (AtomicU64)
- gate seccompiler behind cfg(target_os = "linux") so daemon crate compiles
  on non-Linux targets
- hermes/openclaw plugins set WS_CKPT_AGENT_NAME env when spawning CLI
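
The agent-name whitelist and the `timestamp_ms-pid-seq` id can be sketched as below. `resolve_agent_name` and `next_ops_id` are hypothetical names; the timestamp and PID are passed in so the example stays deterministic.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide sequence counter for ops ids.
static OPS_SEQ: AtomicU64 = AtomicU64::new(0);

/// Builds an id of the form `<timestamp_ms>-<pid>-<seq>`.
pub fn next_ops_id(timestamp_ms: u128, pid: u32) -> String {
    let seq = OPS_SEQ.fetch_add(1, Ordering::Relaxed);
    format!("{}-{}-{}", timestamp_ms, pid, seq)
}

/// Maps the WS_CKPT_AGENT_NAME value onto the known-agent whitelist.
/// Unset means a direct CLI call ("user"); unrecognized values are "unknown".
pub fn resolve_agent_name(env_value: Option<&str>) -> &'static str {
    match env_value {
        None => "user",
        Some("user") => "user",
        Some("hermes") => "hermes",
        Some("openclaw") => "openclaw",
        Some(_) => "unknown",
    }
}
```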

Signed-off-by: Ziqi Huang <ziqi02@alibaba-inc.com>
Add local JSONL-based session telemetry for SLS agent collection.
At session end, logSessionSummary writes a comprehensive record to
/var/log/anolisa/sls/ops/cosh.jsonl including:

- Component identification (name, version, agent_name)
- Session config (model, auth_type, approval_mode)
- Audit decision counts (approve/deny/modify)
- Tool call counts (total/success/fail) with duration
- Tool error classification (model_error/execution_error/denied)
- File operation stats (lines added/removed)
- Sandbox stats (runs/blocked)
- Token usage (input/output/cached/total)
- API stats (requests/errors/latency)
- Environment info (os.type, os.arch)
- add hermes plugin tests: config, checkpoint_manager, tools, __init__
  (161 tests, 91% coverage)
- add openclaw plugin tests: btrfs-manager, commands, config, handlers,
  environment-check, snapshot-store, state, whitelist
  (200 tests, 98% coverage)
- add rust unit tests for migration, lockfile, fs_watcher, state
  (17 tests covering previously-untested modules)
- exclude tests/ and .coverage from RPM in ws-ckpt.spec.in
- add coverage artifacts to .gitignore
- fix package-lock.json to use public npm registry

Signed-off-by: Ziqi Huang <ziqi02@alibaba-inc.com>
- add cargo-tarpaulin rust coverage gate (>=45%) to test-ws-ckpt job
- add openclaw vitest coverage gate (>=90%) with vitest.config.ts
- add hermes pytest-cov coverage gate (>=90%)
- add btrfs loop e2e integration test exercising full CLI flow

Signed-off-by: Ziqi Huang <ziqi02@alibaba-inc.com>
Move the /var/lib/ws-ckpt backup before registering the cleanup trap so early
failures cannot remove a real ws-ckpt state directory.

Signed-off-by: Ziqi Huang <ziqi02@alibaba-inc.com>
- Resolve adapter resources from component contract dest for adopt flows
- Add structured skill sources with scoped datadir validation
- Keep convention discovery as fallback when no dest is declared

Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
- Use Hermes plain plugin list output for status checks
- Parse OpenClaw rich tables with ANSI stripping and wrapped cells
- Cover false-negative and false-positive plugin detection cases

Signed-off-by: 空澈 <kongche.jbw@alibaba-inc.com>
Signed-off-by: yizheng <YiZheng.Yang@linux.alibaba.com>
Signed-off-by: yizheng <YiZheng.Yang@linux.alibaba.com>
Signed-off-by: yizheng <YiZheng.Yang@linux.alibaba.com>