feat(osbase): sandbox install runc — zero-parameter full-stack install#1143
feat(osbase): sandbox install runc — zero-parameter full-stack install#1143WeissonHan wants to merge 1 commit into
Conversation
| } | ||
|
|
||
| // ─── Phase 5: State (stub) ─────────────────────────────────────────── | ||
| eprintln!( |
There was a problem hiding this comment.
[P1] The state phase is marked successful and prints that the sandbox was recorded as installed, but the implementation is still a stub and does not write any anolisa state. That is misleading for users and automation that expect status/list/uninstall to see this install. Please either implement the real state write here or mark this phase as skipped/stub with an honest message.
There was a problem hiding this comment.
This is still present in the latest revision. The code still emits state: recorded sandbox-... as installed and records a successful state phase while the phase is explicitly a stub. Please either implement the real state write, or change the user-visible output/phase status so it does not claim that anolisa state was recorded.
There was a problem hiding this comment.
Installation result will be updated in /var/lib/anolisa/installed.toml.
| @@ -268,7 +268,10 @@ fn handle_sandbox_install( | |||
| if outcome.exit_code != 0 { | |||
There was a problem hiding this comment.
[P1] This still returns CliError::Runtime for any non-zero outcome, including the new degraded verify path. That makes the command fail while the message says "install completed with warnings" and the core code treats verify failure as non-fatal. Please align the semantics either way: return success for degraded warnings, or make verify failure a real failed phase instead of PhaseStatus::Degraded.
There was a problem hiding this comment.
| Phase | Core behavior on failure | CLI behavior |
|---|---|---|
| Preflight | return Err(PhaseFailed{phase:"preflight", …}) |
Err(CliError) → non-zero exit |
| Packages (dnf) | return Err(PhaseFailed{phase:"packages", …}) |
Err(CliError) → non-zero exit |
| Services (systemctl) | return Err(PhaseFailed{phase:"services", …}) |
Err(CliError) → non-zero exit |
| Verify (docker info) | warnings stored in outcome, Ok(outcome) |
Ok(()) → exit 0 + print warning count |
new behaviors are as above.
|
|
||
| // L4: smoke test | ||
| if !skip_verify { | ||
| eprintln!("[osbase] verify: docker run --rm alpine echo hello"); |
There was a problem hiding this comment.
[P2] This smoke test may require pulling alpine from the network on a fresh machine, so a mirror/network/rate-limit issue can look like an install failure even when runc/containerd/docker are installed correctly. If this remains part of default verification, it should stay warning-only and must not cause the CLI to report the install command as failed.
There was a problem hiding this comment.
updated as daemon health check
974124d to
90f9396
Compare
| let output = Command::new(cmd) | ||
| .args(args) | ||
| .output() | ||
| .map_err(|e| { |
There was a problem hiding this comment.
[P1] CI is currently failing at cargo fmt --all --check on this block. Running cargo fmt --all should collapse this map_err into the single-line form expected by rustfmt.
| let stderr = String::from_utf8_lossy(&output.stderr); | ||
| let hint = stderr.lines().next().unwrap_or("").trim(); | ||
| Err(format!( | ||
| "{label} failed (exit {}). Try: systemctl status {cmd}. {hint}", |
There was a problem hiding this comment.
[P2] This generic hint is inaccurate for several verification commands. For example, a runc --version failure suggests systemctl status runc, and a systemctl is-active containerd failure suggests systemctl status systemctl. Please either make the hint command-specific or drop the fixed systemctl status {cmd} hint and surface stderr only.
There was a problem hiding this comment.
hints appears as:
- runc --version failed (exit 127): No such file or directory
- containerd active failed (exit 3): inactive
- docker info failed (exit 1): Cannot connect to the Docker daemon
non-sense messages like systemctl status runc are removed.
|
One PR-description note: the implementation now verifies Docker with |
| let layout = FsLayout::system(None); | ||
| let state_path = layout.state_dir.join("installed.toml"); | ||
| let state_result = (|| -> Result<String, String> { | ||
| let mut state = |
There was a problem hiding this comment.
[P1] This state write path loads and saves /var/lib/anolisa/installed.toml without holding InstallLock. The repository lock contract says every writer touching installed.toml must hold the install lock; otherwise concurrent install/update/adapter operations can lose each other’s state updates. Please acquire layout.lock_file before loading state and keep the load/upsert/save sequence inside the lock.
There was a problem hiding this comment.
acquires InstallLock before touching installed.toml, wrapping the entire load/upsert/save sequence inside the lock:
- Lock acquired before state load — mirrors the pattern in lifecycle.rs and install.rs
- Entire load → upsert → save runs under the lock — _lock drops (releases) when the closure returns
- Lock contention degrades gracefully — returns a clear message ("install lock at … is held by another process") and the phase reports Degraded (non-fatal, exit 2)
- Non-blocking semantics preserved — try_lock_exclusive fails immediately, never blocks the pipeline
| }) | ||
| } else { | ||
| Ok(()) | ||
| // exit_code 2 = degraded (non-fatal warnings already |
There was a problem hiding this comment.
[P1] This degraded-success behavior is only applied on the direct execution path. The helper path still goes through send_helper_request, which treats any HelperResponse::Success with exit_code != 0 as CliError::Runtime; meanwhile the helper server returns outcome.exit_code unchanged. As a result, the same degraded verify/state outcome succeeds when run directly as root but fails when routed through system-helper. Please apply the same degraded-success handling to the helper response path.
There was a problem hiding this comment.
match resp {
HelperResponse::Success { message, exit_code } => {
- if exit_code == 0 {
+ if exit_code == 0 || exit_code == 2 {
+ // exit_code 0 = full success, 2 = degraded (non-fatal
+ // verify/state warnings). Both are reported as success;
+ // the core already printed diagnostics to stderr.
for line in message.lines() {
eprintln!("[osbase] {line}");
}
+ if exit_code == 2 {
+ eprintln!("[osbase] install completed with degraded status (non-fatal)");
+ }
Ok(())
} else {
Err(CliError::Runtime {
The fix treats exit_code == 2 (degraded) the same as exit_code == 0 in the helper response path:
- Both print the helper's message lines to stderr and return Ok(())
- Degraded additionally prints a "non-fatal" notice so the user knows warnings occurred
- Only exit_code >= 3 (or 1) is now treated as a hard failure via CliError::Runtime
This makes the helper path consistent with the direct execution path where verify/state degradation never fails the CLI.
fixed |
d3ce5a2 to
39a4ba2
Compare
|
Thanks for the latest round of fixes. I did one more full pass on
|
1. State metadata is now system-scopedInstalledState::load() defaults to install_mode = User and an empty prefix. The previous code never overrode these fields, so the osbase state record was incorrectly tagged as user-scope.Fix: Before calling upsert_object, explicitly set system scope — matching the pattern used in tier1/install.rs:
2. InstallLock now covers the full mutation windowPreviously, the lock was acquired only inside the Phase 5 closure (state write). This meant concurrent anolisa processes could overlap on dnf install and systemctl enable --now, racing on package/service mutations.Fix: Moved InstallLock::acquire() to immediately after Preflight (which is read-only) and before Phase 2 (Packages). The _lock binding lives until run_manifest_install() returns, so the lock spans packages → services → verify → state: Phase 5 now simply says // Lock is already held (acquired before Phase 2).
3. State persistence failure is a hard errorIf state save fails after packages are installed and services started, the machine is modified with no anolisa record — an unrecoverable inconsistency. The previous code treated this as Degraded (exit_code = 2 = success to the CLI).Fix: Changed from PhaseStatus::Degraded (non-fatal warning) to PhaseStatus::Failed + return Err(...):
4. "Smoke test" wording replaced with "daemon health verification"The implementation runs docker info (daemon reachability check), not docker run (container lifecycle). Updated all references: |
|
Latest pass on One remaining UX/consistency point: system-helper output still does not surface the full five-phase result. The non-privileged path confirmed in the screenshot goes through Since |
|
Latest pass on
|
1. Verify Hard-Coded to runc/Docker Stack
Approach: Verification commands should be declared by the scenario itself, not hard-coded in Rust.
The result: each scenario only verifies its own components, eliminating false degraded reports for unrelated binaries. 2. Helper Response Does Not Reflect Actual Five-Phase Outcome
Approach: Since outcome.phases already captures every phase's real execution status, the helper should serialize it directly rather than fabricate a parallel narrative.
3. Optional Package Hints Counted as Warnings
Approach: Introduce a semantic separation between warnings (real anomalies that may warrant attention) and hints (purely informational messages that do not indicate any degradation).
The effect: a fully successful runc install no longer displays any warning count — the user only sees an informational hint about nerdctl at the end of the output. |
this was already addressed in the initial commit (859c253) where dispatch_osbase_install now renders from outcome.phases rather than reconstructing from manifest metadata. The follow-up here extracts the formatting into a standalone testable function and adds the two helper-path unit tests you requested. 1. Extract format_outcome_response() as a standalone functionThe inline formatting block is now a named function, making the logic testable without requiring root (the daemon's execute_install validates uid=0):2. Test: successful runc install surfaces all five phases + hints3. Test: degraded verify produces warning + exit_code 2Test results: all 6 daemon_server tests pass (including the 2 new ones), full anolisa-core + anolisa-cli test suites green. |
ikunkun-sys
left a comment
There was a problem hiding this comment.
Latest pass on 0684284b, two remaining issues:
-
Dry-run is still reported as a warning.
build_dry_run_outcome()returnswarnings: ["dry-run mode: no changes made"], so both direct execution and the system-helper path report dry-run as a warning. This keeps producinginstall completed with 1 warning(s)/warning: dry-run mode...even though dry-run is an intentional mode, not a degraded or anomalous outcome. -
A skipped verify path is recorded as verify success.
For scenarios with no
verify_commandsand noservices,run_post_verify()returns the messageno verify commands defined; skipped, but the caller records the phase asPhaseStatus::Success. For example,landlocknow reports a successful verify phase while the phase message says it was skipped, so the structured phase status and user-facing message disagree.
1. Dry-run no longer a warning
Dry-run is an intentional mode, not an anomaly. Moved to hints so the CLI won't print "install completed with 1 warning(s)" for dry-run invocations. 2. Verify "nothing to verify" records as Skipped, not SuccessFor scenarios like landlock where neither verify_commands nor services are defined, run_post_verify() returns "no verify commands defined; skipped". The phase is now recorded as PhaseStatus::Skipped — making the structured status consistent with the user-facing message. |
After running:
anolisa osbase sandbox install runc
the user gets a working Docker environment end-to-end:
docker info ✓
What this delivers:
- Full-stack install: runc + containerd + docker + docker-client installed
as a unit; services automatically enabled and started.
- Post-install verification: confirms runc binary, containerd service,
docker daemon, and a live container health check.
- Optional package hints: informs user about available extras (e.g. nerdctl)
without forcing installation.
- Manifest-driven: scenario definitions live in sandbox.toml; adding a new
scenario requires only a TOML stanza, no code changes.
Signed-off-by: Weisson <Weisson@linux.alibaba.com>
After running:
anolisa osbase sandbox install runc
the user gets a working Docker environment end-to-end:
docker info ✓
What this delivers:
Description
Implement end-to-end
anolisa osbase sandbox install runccommand thatdelivers a working Docker environment with zero parameters. The install
pipeline reads scenario definitions from
sandbox.tomland executes fivephases: preflight → packages → services → verify → state.
Previously the install path was a 3-step stub (preflight → packages → hint).
This PR upgrades it to a full pipeline that actually enables services and
runs a health check to confirm the environment works.
Related Issue
closes #
Type of Change
Scope
cosh(copilot-shell)sec-core(agent-sec-core)skill(os-skills)sight(agentsight)tokenless(tokenless)ckpt(ws-ckpt)memory(agent-memory)anolisa(anolisa-cli)skillfs(SkillFS)Checklist
cosh: Lint passes, type check passes, and tests passsec-core(Rust):cargo clippy -- -D warningsandcargo fmt --checkpasssec-core(Python): Ruff format and pytest passskill: Skill directory structure is valid and shell scripts pass syntax checksight:cargo clippy -- -D warningsandcargo fmt --checkpasstokenless:cargo clippy -- -D warningsandcargo fmt --checkpassmemory(Linux only):cargo clippy --all-targets -- -D warnings,cargo fmt --check, andcargo testpassanolisa:cargo clippy --all-targets --locked -- -D warnings,cargo fmt --all --check, andcargo test --lockedpassskillfs:cargo fmt --all --check,cargo clippy --workspace --all-targets -- -D warnings, andcargo test --workspacepasspackage-lock.json/Cargo.lock)Testing
cargo test --locked— 948 tests passed, 0 failedservicesfield), preflightchecks, request validation, and phase result aggregation
anolisa osbase sandbox install runcfollowed by
docker infosucceedsAdditional Notes
The
servicesfield is a new addition tosandbox.tomlscenario schema.Existing scenarios without
servicesdefault to an empty list (phase isskipped), so this is fully backward-compatible.