
Conversation

@zhencliu (Contributor) commented Oct 5, 2025

Depends on #4242

YongxueHong and others added 11 commits September 24, 2025 16:57
Introduces VT Agent, a new XML-RPC based agent for Avocado VT.

This agent allows for remote execution of commands and services on a
target system. It features a core RPC server, dynamic service loading,
and basic API functions for agent control and logging.

Key components include:
- Main application entry point and argument parsing (`__main__.py`, `app/`).
- Core server logic, service discovery, data/logging utilities (`core/`).
- Example services (`services/examples/`) to demonstrate extensibility.

This provides a foundation for more advanced remote interactions
and test automation capabilities within Avocado VT.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
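To illustrate the core idea, here is a minimal sketch of exposing a service over XML-RPC in the spirit of the agent's RPC server; the module, port, and service names are illustrative only and are not the actual vt_agent API:

    # Minimal sketch of an XML-RPC agent exposing a "hello" example service
    # and a control API; names are placeholders, not vt_agent's real ones.
    from xmlrpc.server import SimpleXMLRPCServer


    def hello(name):
        """Example service function, in the spirit of services/examples/hello.py."""
        return "Hello, %s" % name


    def quit_agent():
        """Placeholder for an agent-control API (e.g. stopping the server)."""
        return True


    if __name__ == "__main__":
        server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
        # Services are registered under dotted names so the controller can
        # call e.g. proxy.examples.hello("world").
        server.register_function(hello, "examples.hello")
        server.register_function(quit_agent, "api.quit")
        server.serve_forever()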
This commit introduces a new `vt_cluster` module to enable distributed,
multi-host testing within the `virttest` framework. It provides the
foundation for orchestrating complex test scenarios that span across
multiple machines (nodes).

The new module implements a controller-agent architecture:
- A central controller orchestrates tests and manages the state of the cluster.
- Remote agents run on test nodes, executing commands received via XML-RPC.

Key components of this new feature include:

- **`Cluster` (`__init__.py`):** A singleton object that manages the
  state of all nodes and partitions in the cluster. It persists the
  cluster state to a file.

- **`Node` (`node.py`):** Represents a single machine (remote or controller).
  It handles agent setup, lifecycle management (start/stop), and log
  collection on remote nodes using SSH and SCP.

- **`Partition` (`__init__.py`):** A logical grouping of nodes allocated
  for a specific test run, allowing for resource isolation.

- **`_ClientProxy` (`proxy.py`):** An XML-RPC client proxy for
  communication between the controller and agents.

This framework allows tests to request a partition of nodes, execute
commands on them, and collect logs centrally, which is essential for
multi-host migration tests and other distributed scenarios.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
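A rough sketch of the controller side of this architecture, approximating the role of `_ClientProxy` in `proxy.py`; the URL, port, and service name are assumptions, not the module's real defaults:

    # The controller talks to a remote agent over XML-RPC; a dotted call is
    # executed on the node and the result is returned to the controller.
    from xmlrpc.client import ServerProxy

    agent = ServerProxy("http://host1:8000", allow_none=True)
    print(agent.examples.hello("controller"))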
This commit introduces a centralized logging server for the `vt_cluster`
framework. The new `logger.py` module provides a `LoggerServer` that
listens for log records from remote agents, enabling a unified view of
events across the entire distributed environment.

Key features:
- A `LoggerServer` that runs on the controller node and collects logs from
  all registered nodes.
- Log records are serialized to JSON for secure transmission over the
  network.
- Each log message is tagged with its originating node name and IP address,
  providing a clear, chronological stream of logs from the entire cluster.

This new logging mechanism simplifies debugging and monitoring of distributed
tests by consolidating all log output into a single location.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
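An illustrative sketch of a log-collecting server that receives JSON-encoded records and tags them with the sending node, loosely mirroring what `LoggerServer` is described to do; the port and the one-record-per-line framing are assumptions:

    # Toy log server: each connection streams JSON log records, one per line,
    # which are tagged with the node name and the sender's IP address.
    import json
    import logging
    import socketserver

    logging.basicConfig(level=logging.INFO)
    LOG = logging.getLogger("cluster")


    class _LogHandler(socketserver.StreamRequestHandler):
        def handle(self):
            for line in self.rfile:
                record = json.loads(line)
                node_ip = self.client_address[0]
                LOG.info("[%s %s] %s", record.get("node", "?"), node_ip,
                         record.get("msg", ""))


    if __name__ == "__main__":
        with socketserver.ThreadingTCPServer(("0.0.0.0", 9020), _LogHandler) as srv:
            srv.serve_forever()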
This commit introduces a framework for collecting and managing properties
from all nodes within a virt test cluster. This provides a centralized
and persistent way for tests to access node-specific hardware and software
configurations.

The new module handles the collection, caching, and retrieval of this data.
On initialization, it gathers information from each node and stores it in
a JSON file.

As an initial implementation, the following metadata is collected:
- Hostname
- CPU vendor and model name

To support this, a new agent service has been added to expose CPU details
from the worker nodes. This framework is designed to be extensible for
gathering more properties in the future.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
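A minimal sketch of collecting per-node properties and caching them to a JSON file; the property names follow the commit message (hostname, CPU vendor and model), while the /proc/cpuinfo parsing, file path, and helper names are illustrative assumptions for a Linux worker node:

    import json
    import platform


    def collect_node_properties():
        # Gather hostname plus CPU vendor/model from /proc/cpuinfo (Linux only).
        info = {"hostname": platform.node()}
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("vendor_id"):
                    info["cpu_vendor"] = line.split(":", 1)[1].strip()
                elif line.startswith("model name"):
                    info["cpu_model"] = line.split(":", 1)[1].strip()
        return info


    def cache_properties(nodes_info, path="cluster_metadata.json"):
        # Persist the collected properties so later tests can read them back.
        with open(path, "w") as f:
            json.dump(nodes_info, f, indent=2)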
This commit introduces a new node selection mechanism to the vt_cluster
framework.

The selector allows tests to dynamically request nodes based on specific
attributes, rather than relying on hardcoded node names. This enables more
flexible and resource-aware test scheduling.

Key components:
 - selector.py: A new module containing the core selection logic.
 - select_node(): The main function that filters a list of candidate nodes
                  based on a set of selector rules.
 - _NodeSelector: A class that matches rules against node metadata.
 - _MatchExpression & _Operator: Helper classes for parsing and executing
                                 selection rules.

The selector works by querying properties from active agents on remote nodes.
It uses a simple expression format (key, operator, values) to define
requirements such as CPU vendor, memory size, or other custom metadata.

This change empowers tests to specify their hardware or software needs,
for example: "select a node where memory_gb >= 32"

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
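A sketch of the (key, operator, values) expression idea behind the selector; the operator set and rule syntax below are assumed for illustration and are not taken from selector.py itself:

    # Each rule is (key, operator, values); a node matches if every rule holds
    # against its metadata.
    _OPERATORS = {
        "==": lambda actual, expected: str(actual) == expected[0],
        ">=": lambda actual, expected: float(actual) >= float(expected[0]),
        "in": lambda actual, expected: str(actual) in expected,
    }


    def select_node(candidates, rules):
        """Return the first node whose metadata satisfies every rule."""
        for node in candidates:
            if all(_OPERATORS[op](node.get(key), values)
                   for key, op, values in rules):
                return node
        return None


    # e.g. "select a node where memory_gb >= 32 and the CPU vendor is AMD"
    nodes = [{"name": "host1", "memory_gb": 16, "cpu_vendor": "GenuineIntel"},
             {"name": "host2", "memory_gb": 64, "cpu_vendor": "AuthenticAMD"}]
    print(select_node(nodes, [("memory_gb", ">=", ["32"]),
                              ("cpu_vendor", "in", ["AuthenticAMD"])]))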
This change introduces support for setting up a multi-host testing
environment directly through the vt-bootstrap command.

A new command-line option, --vt-cluster-config, has been added. This
option accepts a path to a JSON file that defines the cluster topology,
including the hosts and a central controller.

The bootstrap process now includes steps to:
 - Parse the provided cluster configuration file.
 - Register each host as a node in the cluster, preparing its agent
   environment.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
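The shape of such a configuration file and how bootstrap might consume it could look like the sketch below; the key names ("controller", "hosts", "address") are assumptions based on the description above, not the exact schema:

    import json


    def register_cluster(config_path):
        # Parse the cluster topology and register each host as a node,
        # preparing its agent environment.
        with open(config_path) as f:
            topology = json.load(f)
        controller = topology.get("controller", {})
        for name, host in topology.get("hosts", {}).items():
            print("registering node %s at %s" % (name, host.get("address")))
        return controller, topology.get("hosts", {})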
This commit introduces the `VTCluster` plugin, designed to manage the
lifecycle of a multi-node test environment within Avocado-VT.

The plugin hooks into the job's pre-test and post-test phases to
automate cluster management:

- In `pre_tests`, it initializes the test environment by starting agent
  servers on all configured cluster nodes and loading necessary metadata.
  Setup failures are treated as fatal, terminating the job to prevent
  tests from running in a misconfigured environment.

- In `post_tests`, it handles the teardown process. This includes
  stopping the agent servers, collecting their logs into a dedicated
  `cluster/` directory within the job results, and unloading metadata.
  The cleanup process is designed to be robust, logging failures without
  halting execution to ensure as much cleanup and log collection as
  possible is performed.

Custom exceptions are included for clear error reporting, and the
structure provides placeholders for future cluster manager logic.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
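A skeleton of a job pre-/post-tests plugin in the style described above; the real plugin lives in avocado_vt/plugins/vt_cluster.py and will differ, so treat this as an outline only:

    from avocado.core.plugin_interfaces import JobPostTests, JobPreTests


    class VTCluster(JobPreTests, JobPostTests):

        name = "vt-cluster"
        description = "Manages the multi-node cluster lifecycle for a job"

        def pre_tests(self, job):
            # Start agent servers on all configured nodes and load metadata;
            # a failure here is fatal so tests never run against a
            # misconfigured cluster.
            pass

        def post_tests(self, job):
            # Stop agents and collect their logs into <job results>/cluster/,
            # logging (not raising) any cleanup failure so teardown continues.
            pass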
This commit introduces the core capability to run tests across multiple hosts.
It extends the testing framework to manage a cluster of nodes, allowing tests
to orchestrate and validate complex scenarios that involve distributed
systems.

Key features include:
- A cluster management system for allocating and releasing nodes.
- A mechanism for tests to request and interact with multiple remote machines.
- Centralized log collection from all participating nodes to simplify debugging.
- Robust setup and teardown logic to ensure a clean test environment.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>

coderabbitai bot commented Oct 5, 2025

Warning

Rate limit exceeded

@zhencliu has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 21 minutes and 58 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 72943c1 and 58b6738.

📒 Files selected for processing (82)
  • avocado_vt/plugins/vt_bootstrap.py (1 hunks)
  • avocado_vt/plugins/vt_cluster.py (1 hunks)
  • avocado_vt/test.py (6 hunks)
  • avocado_vt/vt_agent/pyproject.toml (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/README.md (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/__main__.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/app/__init__.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/app/args.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/app/cmd.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/core/__init__.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/core/data_dir.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/core/logger.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/core/rpc/__init__.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/core/rpc/server.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/core/rpc/service.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/__init__.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/connect.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/console.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/image_manager.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/images/__init__.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/images/qemu/__init__.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/images/qemu/qemu_image_handlers.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backing_manager.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backings/__init__.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backings/backing.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backings/pool_connection.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backings/storage/__init__.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backings/storage/dir/__init__.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backings/storage/dir/dir_pool_connection.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backings/storage/dir/dir_volume_backing.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backings/storage/file_volume_backing.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backings/storage/nfs/__init__.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backings/storage/nfs/nfs_pool_connection.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backings/storage/nfs/nfs_volume_backing.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/managers/resource_backings/storage/volume_backing.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/services/core.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/services/examples/hello.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/services/host/cpu.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/services/host/platform.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/services/image.py (1 hunks)
  • avocado_vt/vt_agent/src/avocado_vt/agent/services/resource.py (1 hunks)
  • setup.py (1 hunks)
  • spell.ignore (8 hunks)
  • virttest/bootstrap.py (5 hunks)
  • virttest/env_process.py (15 hunks)
  • virttest/vt_cluster/README.md (1 hunks)
  • virttest/vt_cluster/__init__.py (1 hunks)
  • virttest/vt_cluster/logger.py (1 hunks)
  • virttest/vt_cluster/node.py (1 hunks)
  • virttest/vt_cluster/node_properties.py (1 hunks)
  • virttest/vt_cluster/proxy.py (1 hunks)
  • virttest/vt_cluster/selector.py (1 hunks)
  • virttest/vt_imgr/__init__.py (1 hunks)
  • virttest/vt_imgr/logical_image_manager.py (1 hunks)
  • virttest/vt_imgr/logical_images/__init__.py (1 hunks)
  • virttest/vt_imgr/logical_images/layer_image.py (1 hunks)
  • virttest/vt_imgr/logical_images/logical_image.py (1 hunks)
  • virttest/vt_imgr/logical_images/qemu/__init__.py (1 hunks)
  • virttest/vt_imgr/logical_images/qemu/images/__init__.py (1 hunks)
  • virttest/vt_imgr/logical_images/qemu/images/luks_qemu_layer_image.py (1 hunks)
  • virttest/vt_imgr/logical_images/qemu/images/qcow2_qemu_layer_image.py (1 hunks)
  • virttest/vt_imgr/logical_images/qemu/images/raw_qemu_layer_image.py (1 hunks)
  • virttest/vt_imgr/logical_images/qemu/qemu_layer_image.py (1 hunks)
  • virttest/vt_imgr/logical_images/qemu/qemu_logical_image.py (1 hunks)
  • virttest/vt_resmgr/__init__.py (1 hunks)
  • virttest/vt_resmgr/resource_manager.py (1 hunks)
  • virttest/vt_resmgr/resources/__init__.py (1 hunks)
  • virttest/vt_resmgr/resources/pool.py (1 hunks)
  • virttest/vt_resmgr/resources/pool_selector.py (1 hunks)
  • virttest/vt_resmgr/resources/resource.py (1 hunks)
  • virttest/vt_resmgr/resources/storage/__init__.py (1 hunks)
  • virttest/vt_resmgr/resources/storage/block_volume.py (1 hunks)
  • virttest/vt_resmgr/resources/storage/dir/__init__.py (1 hunks)
  • virttest/vt_resmgr/resources/storage/dir/dir_pool.py (1 hunks)
  • virttest/vt_resmgr/resources/storage/dir/dir_volume.py (1 hunks)
  • virttest/vt_resmgr/resources/storage/file_volume.py (1 hunks)
  • virttest/vt_resmgr/resources/storage/net_volume.py (1 hunks)
  • virttest/vt_resmgr/resources/storage/nfs/__init__.py (1 hunks)
  • virttest/vt_resmgr/resources/storage/nfs/nfs_pool.py (1 hunks)
  • virttest/vt_resmgr/resources/storage/nfs/nfs_volume.py (1 hunks)
  • virttest/vt_resmgr/resources/storage/volume.py (1 hunks)
  • virttest/vt_utils/image/qemu.py (1 hunks)


@zhencliu force-pushed the env_setup branch 13 times, most recently from 234ffa5 to 9c7352a on October 6, 2025 at 16:12
zhencliu and others added 5 commits October 7, 2025 17:15
Introduced a comprehensive unified resource management system,
establishing centralized coordination of test resources across
cluster worker nodes.

Master Node Resource Management:
  - PoolSelector with configurable criteria-based matching
  - ResourceManager: Central coordinator managing all pools and resources
  - ResourcePool: Collections of resources accessible by specific nodes
  - Resource: Individual assets (volumes, ports, etc.) within pools

Worker Node Resource Backing management:
  - Resource service API: exposing operations via cluster node proxy
  - ResourceBackingManager: handling node-local resource implementations
  - ResourcePoolConnection: handling pool connectivity on worker nodes
  - ResourceBacking: Node-specific implementations of resources

Signed-off-by: Zhenchao Liu <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Yongxue Hong <[email protected]>
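A toy illustration of criteria-based pool selection as described above; the attribute names and rules are invented for the example and do not reflect the real PoolSelector interface:

    def select_pool(pools, node, pool_type):
        """Pick the first pool of the given type that the node can access."""
        for pool in pools:
            if pool["type"] == pool_type and node in pool["access"]["nodes"]:
                return pool
        return None


    pools = [{"name": "dir_pool1", "type": "filesystem",
              "access": {"nodes": ["host1"]}},
             {"name": "nfs_pool", "type": "nfs",
              "access": {"nodes": ["host1", "host2"]}}]
    print(select_pool(pools, "host2", "nfs"))  # -> the nfs_pool entry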
bootstrap: Set up/clean up the resource manager. Register all resource pools
configured by the user at bootstrap. As avocado-vt doesn't support an
environment-level cleanup, we need to do the cleanup before the setup.
vt_cluster: Start up/tear down the resource manager. Attach/detach all
configured resource pools to/from their accessing nodes.

Signed-off-by: Zhenchao Liu <[email protected]>
Signed-off-by: Zhenchao Liu <[email protected]>
A local filesystem storage pool can supply file-based volumes.
A filesystem pool can be attached (connected) from only one worker
node. Users can configure a filesystem pool in cluster.json:
      "dir_pool1": {
        "type": "filesystem",
        "path": "/home/dirpool1",
        "access": {
          "nodes": ["host1"]
        }
      },

Required:
  - type: filesystem
Optional:
  - path: Use get_data_dir()/root_dir by default
  - access: Use all worker nodes of the cluster by default; note that
            access.nodes must be set if there is more than one node
            in the cluster.

If no filesystem pool is configured in cluster.json, one is created
for each worker node by default.

Signed-off-by: Zhenchao Liu <[email protected]>
An NFS storage pool can supply file-based volume resources.
An NFS pool can be attached (connected) from more than one worker
node. Users can configure an NFS pool in cluster.json:
      "nfs_pool": {
        "type": "nfs",
	"server": "nfs-server-host",
        "export": "/nfs/exported/dir",
	"mount_point": {
		"host1": "/var/tmp/mnt",
		"host2": "/tmp/mnt"
	},
	"mount_options": {"*": "rw"},
        "access": {
          "nodes": ["host1", "host2"]
        }
      }
Required:
  - type: nfs
  - server: The NFS server hostname or IP
  - export: The exported directory
Optional:
  - mount_point: Use get_data_dir()/nfs_mnt/{server} by default
  - mount_options: Use NFS's default options by default
  - access: Use all worker nodes of the cluster by default

Signed-off-by: Zhenchao Liu <[email protected]>
zhencliu and others added 4 commits October 7, 2025 17:20
Introduced the foundational image management infrastructure that
enables hierarchical image handling with distributed storage coordination.

Master Node Components:
- LogicalImageManager: Coordinates complex image topologies and delegates
  storage operations to the unified resource management system
- LogicalImage/Image abstractions: Define hierarchical image structures
  where each layer can be stored across different resource pools

Worker Node Components:
- ImageHandlerManager: Executes image operations (clone, update) on worker nodes
- Image service interface: Provides RPC endpoints for distributed image operations

Key Features:
- Hierarchical Image Support: Enables complex image topologies, e.g.
  backing chains for QEMU image snapshots
- Resource Integration: Images backed by volume resources managed through
  the unified resource system
- Distributed Operations: Clone and update operations coordinated across
  cluster nodes
- Extensible Architecture: Plugin-based design for different image types
  and formats

Architecture Flow:
  LogicalImageManager → LogicalImage → Image → Volume Resource → Storage Backing

The infrastructure provides the foundation for advanced image operations like
snapshot management, image cloning, and distributed image access while
maintaining tight integration with the cluster resource management system.

Signed-off-by: Zhenchao Liu <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Yongxue Hong <[email protected]>
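A very small model of the hierarchy named in the "Architecture Flow" above; the class and attribute names are simplified stand-ins for the vt_imgr classes, not their real definitions:

    class Image:
        """One layer of a logical image, backed by a volume resource."""
        def __init__(self, name, volume_id, backing=None):
            self.name = name
            self.volume_id = volume_id   # handle into the resource manager
            self.backing = backing       # previous layer in the backing chain


    class LogicalImage:
        """An ordered chain of layers, e.g. a base qcow2 plus snapshots."""
        def __init__(self, name):
            self.name = name
            self.layers = []

        def add_layer(self, image):
            if self.layers:
                image.backing = self.layers[-1]
            self.layers.append(image)


    guest = LogicalImage("guest-image")
    guest.add_layer(Image("base", volume_id="vol-1"))
    guest.add_layer(Image("snap1", volume_id="vol-2"))  # backed by "base"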
When the cluster feature is enabled (checked by whether nodes is set), use
the image management system to handle the images defined in the 'images'
param in the pre-/post-processing.

Signed-off-by: Zhenchao Liu <[email protected]>