Added image management support for the env_process #4245
base: master
Conversation
Introduces VT Agent, a new XML-RPC based agent for Avocado VT. This agent allows for remote execution of commands and services on a target system. It features a core RPC server, dynamic service loading, and basic API functions for agent control and logging.

Key components include:
- Main application entry point and argument parsing (`__main__.py`, `app/`).
- Core server logic, service discovery, data/logging utilities (`core/`).
- Example services (`services/examples/`) to demonstrate extensibility.

This provides a foundation for more advanced remote interactions and test automation capabilities within Avocado VT.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
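To illustrate the idea of an XML-RPC server with dynamically registered service functions, here is a minimal sketch. The port, the `ExampleService` class, and the `namespace.function` naming are assumptions for illustration only; they do not mirror the actual `core/` implementation.

```python
# Minimal sketch of an XML-RPC agent server with dynamically registered
# service functions. The port and the example service are assumptions,
# not the actual core/ implementation.
import inspect
from xmlrpc.server import SimpleXMLRPCServer


class ExampleService:
    """Stand-in for a service module discovered under services/."""

    @staticmethod
    def hello(name):
        return "hello %s" % name

    @staticmethod
    def is_alive():
        return True


def register_service(server, service, namespace):
    """Expose every public function of a service object as 'namespace.func'."""
    for name, func in inspect.getmembers(service, inspect.isfunction):
        if not name.startswith("_"):
            server.register_function(func, "%s.%s" % (namespace, name))


if __name__ == "__main__":
    server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
    server.register_introspection_functions()
    register_service(server, ExampleService, "example")
    server.serve_forever()
```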
Signed-off-by: Yongxue Hong <[email protected]>
This commit introduces a new `vt_cluster` module to enable distributed, multi-host testing within the `virttest` framework. It provides the foundation for orchestrating complex test scenarios that span multiple machines (nodes).

The new module implements a controller-agent architecture:
- A central controller orchestrates tests and manages the state of the cluster.
- Remote agents run on test nodes, executing commands received via XML-RPC.

Key components of this new feature include:
- **`Cluster` (`__init__.py`):** A singleton object that manages the state of all nodes and partitions in the cluster. It persists the cluster state to a file.
- **`Node` (`node.py`):** Represents a single machine (remote or controller). It handles agent setup, lifecycle management (start/stop), and log collection on remote nodes using SSH and SCP.
- **`Partition` (`__init__.py`):** A logical grouping of nodes allocated for a specific test run, allowing for resource isolation.
- **`_ClientProxy` (`proxy.py`):** An XML-RPC client proxy for communication between the controller and agents.

This framework allows tests to request a partition of nodes, execute commands on them, and collect logs centrally, which is essential for multi-host migration tests and other distributed scenarios.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
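For context, a controller-to-agent call over XML-RPC can look roughly like the sketch below. The agent address, port, and the `cpu.get_vendor` service method are illustrative assumptions, not the real `proxy.py` API.

```python
# Rough sketch of a controller-side XML-RPC client proxy. The agent
# address/port and the "cpu.get_vendor" service method are hypothetical;
# the real _ClientProxy in proxy.py may differ.
import xmlrpc.client


class ClientProxy:
    def __init__(self, host, port=8000):
        self._server = xmlrpc.client.ServerProxy(
            "http://%s:%d" % (host, port), allow_none=True
        )

    def __getattr__(self, name):
        # Delegate attribute access so that calls like proxy.cpu.get_vendor()
        # are forwarded to the remote agent service as "cpu.get_vendor".
        return getattr(self._server, name)


if __name__ == "__main__":
    proxy = ClientProxy("host1")
    print(proxy.cpu.get_vendor())
```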
Signed-off-by: Yongxue Hong <[email protected]>
This commit introduces a centralized logging server for the `vt_cluster` framework. The new `logger.py` module provides a `LoggerServer` that listens for log records from remote agents, enabling a unified view of events across the entire distributed environment.

Key features:
- A `LoggerServer` that runs on the controller node and collects logs from all registered nodes.
- Log records are serialized to JSON for secure transmission over the network.
- Each log message is tagged with its originating node name and IP address, providing a clear, chronological stream of logs from the entire cluster.

This new logging mechanism simplifies debugging and monitoring of distributed tests by consolidating all log output into a single location.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
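A minimal sketch of such a JSON-based log receiver is shown below, assuming newline-delimited JSON records with hypothetical `name`, `levelno`, and `msg` fields; the actual wire format used by `logger.py` may differ.

```python
# Minimal sketch of a controller-side log server that receives
# newline-delimited JSON log records from agents and re-emits them through
# the standard logging module, tagged with the sender's node name and IP.
# The record fields ("name", "levelno", "msg") are assumptions.
import json
import logging
import socketserver

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
LOG = logging.getLogger("cluster")


class LogRecordHandler(socketserver.StreamRequestHandler):
    def handle(self):
        node_ip = self.client_address[0]
        for line in self.rfile:
            record = json.loads(line.decode("utf-8"))
            LOG.log(
                record.get("levelno", logging.INFO),
                "[%s|%s] %s",
                record.get("name", "node"),
                node_ip,
                record.get("msg", ""),
            )


if __name__ == "__main__":
    with socketserver.ThreadingTCPServer(("0.0.0.0", 9999), LogRecordHandler) as srv:
        srv.serve_forever()
```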
This commit introduces a framework for collecting and managing properties from all nodes within a virt test cluster. This provides a centralized and persistent way for tests to access node-specific hardware and software configurations.

The new module handles the collection, caching, and retrieval of this data. On initialization, it gathers information from each node and stores it in a JSON file. As an initial implementation, the following metadata is collected:
- Hostname
- CPU vendor and model name

To support this, a new agent service has been added to expose CPU details from the worker nodes. This framework is designed to be extensible for gathering more properties in the future.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
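For illustration, gathering the initial metadata on a Linux worker node and caching it to a JSON file could look like this sketch; the cache path and key names are assumptions, not the framework's actual layout.

```python
# Sketch of collecting node properties (hostname, CPU vendor/model) and
# caching them to a JSON file. Key names and the cache path are assumptions.
import json
import platform


def _read_cpuinfo():
    """Read CPU vendor and model name from /proc/cpuinfo (Linux only)."""
    vendor, model = "", ""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("vendor_id") and not vendor:
                vendor = line.split(":", 1)[1].strip()
            elif line.startswith("model name") and not model:
                model = line.split(":", 1)[1].strip()
    return vendor, model


def collect_properties(cache_path="/tmp/node_metadata.json"):
    vendor, model = _read_cpuinfo()
    props = {
        "hostname": platform.node(),
        "cpu_vendor": vendor,
        "cpu_model": model,
    }
    with open(cache_path, "w") as f:
        json.dump(props, f, indent=4)
    return props


if __name__ == "__main__":
    print(collect_properties())
```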
This commit introduces a new node selection mechanism to the vt_cluster
framework.
The selector allows tests to dynamically request nodes based on specific
attributes, rather than relying on hardcoded node names. This enables more
flexible and resource-aware test scheduling.
Key components:
- selector.py: A new module containing the core selection logic.
- select_node(): The main function that filters a list of candidate nodes
based on a set of selector rules.
- _NodeSelector: A class that matches rules against node metadata.
- _MatchExpression & _Operator: Helper classes for parsing and executing
selection rules.
The selector works by querying properties from active agents on remote nodes.
It uses a simple expression format (key, operator, values) to define
requirements such as CPU vendor, memory size, or other custom metadata.
This change empowers tests to specify their hardware or software needs,
for example: "select a node where memory_gb >= 32"
Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
Signed-off-by: Yongxue Hong <[email protected]>
This change introduces support for setting up a multi-host testing environment directly through the vt-bootstrap command.

A new command-line option, --vt-cluster-config, has been added. This option accepts a path to a JSON file that defines the cluster topology, including the hosts and a central controller.

The bootstrap process now includes steps to:
- Parse the provided cluster configuration file.
- Register each host as a node in the cluster, preparing its agent environment.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
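For reference, a cluster topology file passed via --vt-cluster-config could look roughly like the sketch below. The keys shown (`controller`, `hosts`, `hostname`, `username`, `password`) are assumptions about the schema, not the authoritative format.

```python
# Illustrative sketch of a cluster topology file for --vt-cluster-config.
# All keys here (controller, hosts, hostname, username, password) are
# assumptions about the schema, not the authoritative format.
import json

cluster_config = {
    "controller": {"hostname": "controller.example.com"},
    "hosts": {
        "host1": {
            "hostname": "192.168.122.101",
            "username": "root",
            "password": "secret",
        },
        "host2": {
            "hostname": "192.168.122.102",
            "username": "root",
            "password": "secret",
        },
    },
}

with open("cluster.json", "w") as f:
    json.dump(cluster_config, f, indent=4)

# Then, hypothetically:
#   avocado vt-bootstrap --vt-type qemu --vt-cluster-config cluster.json
```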
This commit introduces the `VTCluster` plugin, designed to manage the lifecycle of a multi-node test environment within Avocado-VT.

The plugin hooks into the job's pre-test and post-test phases to automate cluster management:
- In `pre_tests`, it initializes the test environment by starting agent servers on all configured cluster nodes and loading necessary metadata. Setup failures are treated as fatal, terminating the job to prevent tests from running in a misconfigured environment.
- In `post_tests`, it handles the teardown process. This includes stopping the agent servers, collecting their logs into a dedicated `cluster/` directory within the job results, and unloading metadata. The cleanup process is designed to be robust, logging failures without halting execution so that as much cleanup and log collection as possible is performed.

Custom exceptions are included for clear error reporting, and the structure provides placeholders for future cluster manager logic.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
This commit introduces the core capability to run tests across multiple hosts. It extends the testing framework to manage a cluster of nodes, allowing tests to orchestrate and validate complex scenarios that involve distributed systems.

Key features include:
- A cluster management system for allocating and releasing nodes.
- A mechanism for tests to request and interact with multiple remote machines.
- Centralized log collection from all participating nodes to simplify debugging.
- Robust setup and teardown logic to ensure a clean test environment.

Signed-off-by: Yongxue Hong <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Zhenchao Liu <[email protected]>
Force-pushed from 234ffa5 to 9c7352a
Introduced a comprehensive unified resource management system, establishing centralized coordination of test resources across cluster worker nodes.

Master Node Resource Management:
- PoolSelector: Configurable, criteria-based pool matching
- ResourceManager: Central coordinator managing all pools and resources
- ResourcePool: Collections of resources accessible by specific nodes
- Resource: Individual assets (volumes, ports, etc.) within pools

Worker Node Resource Backing Management:
- Resource service API: Exposes operations via the cluster node proxy
- ResourceBackingManager: Handles node-local resource implementations
- ResourcePoolConnection: Handles pool connectivity on worker nodes
- ResourceBacking: Node-specific implementations of resources

Signed-off-by: Zhenchao Liu <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Yongxue Hong <[email protected]>
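A much-simplified sketch of criteria-based pool selection on the master node might look like the following; the attribute names and method signatures are assumptions, not the real PoolSelector/ResourceManager API.

```python
# Simplified sketch of criteria-based pool selection on the master node.
# Pool attributes and the select() signature are illustrative assumptions.
class ResourcePool:
    def __init__(self, name, pool_type, access_nodes):
        self.name = name
        self.pool_type = pool_type
        self.access_nodes = set(access_nodes)


class PoolSelector:
    def __init__(self, pool_type=None, node=None):
        self._pool_type = pool_type
        self._node = node

    def select(self, pools):
        """Yield the pools matching the configured criteria."""
        for pool in pools:
            if self._pool_type and pool.pool_type != self._pool_type:
                continue
            if self._node and self._node not in pool.access_nodes:
                continue
            yield pool


pools = [
    ResourcePool("dir_pool1", "filesystem", ["host1"]),
    ResourcePool("nfs_pool", "nfs", ["host1", "host2"]),
]
# Pick a pool of type "nfs" accessible from host2.
print([p.name for p in PoolSelector("nfs", "host2").select(pools)])
```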
bootstrap: Set up/clean up the resource manager. Register all resource pools configured by the user at bootstrap. As avocado-vt doesn't support an environment-level cleanup, we need to do the cleanup before the setup.

vt_cluster: Start up/tear down the resource manager. Attach/detach all configured resource pools to/from their accessing nodes.

Signed-off-by: Zhenchao Liu <[email protected]>
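As a sketch of that ordering (cleanup first, because there is no environment-level cleanup hook to rely on), the bootstrap step could be arranged roughly like this; the registry file and function names are hypothetical.

```python
# Sketch of the bootstrap ordering described above: stale pool registrations
# from a previous run are cleaned up before the new setup registers the
# configured pools. The registry path and function names are hypothetical.
import json
import os

_REGISTRY = "/tmp/vt_resource_pools.json"


def cleanup_resource_pools():
    if os.path.exists(_REGISTRY):
        os.remove(_REGISTRY)


def setup_resource_pools(cluster_config):
    with open(_REGISTRY, "w") as f:
        json.dump(cluster_config.get("pools", {}), f, indent=4)


def bootstrap_resource_manager(cluster_config):
    cleanup_resource_pools()  # cleanup must run before setup
    setup_resource_pools(cluster_config)


bootstrap_resource_manager({"pools": {"dir_pool1": {"type": "filesystem"}}})
```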
Signed-off-by: Zhenchao Liu <[email protected]>
A local filesystem storage pool can supply file-based volumes.
A filesystem pool can be attached (connected) from only one worker
node. Users can configure a filesystem pool in cluster.json:
"dir_pool1": {
"type": "filesystem",
"path": "/home/dirpool1",
"access": {
"nodes": ["host1"]
}
},
Required:
- type: filesystem
Optional:
- path: Use get_data_dir()/root_dir by default
- access: Use all worker nodes of the cluster by default; note that
  access.nodes must be set if there is more than one node
  in the cluster.
If no filesystem pool is configured in cluster.json, one is created
it for each worker node by default.
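A sketch of how those defaults could be resolved is shown below; `get_data_dir()` is stubbed in locally, and the exact default-path and validation logic are assumptions.

```python
# Sketch of resolving filesystem pool defaults from cluster.json as described
# above. The default path interpretation and validation details are
# assumptions; get_data_dir() is a local stand-in for the framework helper.
import os


def get_data_dir():
    # Placeholder for the framework's data-dir helper.
    return os.path.expanduser("~/avocado/data/avocado-vt")


def resolve_filesystem_pool(pool_params, worker_nodes):
    path = pool_params.get("path", os.path.join(get_data_dir(), "root_dir"))
    nodes = pool_params.get("access", {}).get("nodes")
    if not nodes:
        if len(worker_nodes) > 1:
            raise ValueError("access.nodes is required with multiple worker nodes")
        nodes = list(worker_nodes)
    return {"type": "filesystem", "path": path, "access": {"nodes": nodes}}


print(resolve_filesystem_pool({"type": "filesystem"}, ["host1"]))
```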
Signed-off-by: Zhenchao Liu <[email protected]>
An NFS storage pool can supply a file-based volume resource.
An NFS pool can be attached (connected) from more than one worker
node. Users can configure an NFS pool in cluster.json:
"nfs_pool": {
"type": "nfs",
"server": "nfs-server-host",
"export": "/nfs/exported/dir",
"mount_point": {
"host1": "/var/tmp/mnt",
"host2": "/tmp/mnt"
},
"mount_options": {"*": "rw"},
"access": {
"nodes": ["host1", "host2"]
}
}
Required:
- type: nfs
- server: The NFS server hostname or IP
- export: The exported directory
Optional:
- mount_point: Use get_data_dir()/nfs_mnt/{server} by default
- mount_options: Use the NFS default options by default
- access: Use all worker nodes of the cluster by default
Signed-off-by: Zhenchao Liu <[email protected]>
Introduced the foundational image management infrastructure that enables hierarchical image handling with distributed storage coordination.

Master Node Components:
- LogicalImageManager: Coordinates complex image topologies and delegates storage operations to the unified resource management system
- LogicalImage/Image abstractions: Define hierarchical image structures where each layer can be stored across different resource pools

Worker Node Components:
- ImageHandlerManager: Executes image operations (clone, update) on worker nodes
- Image service interface: Provides RPC endpoints for distributed image operations

Key Features:
- Hierarchical Image Support: Enables complex image topologies, e.g. backing chains for qemu image snapshots
- Resource Integration: Images backed by volume resources managed through the unified resource system
- Distributed Operations: Clone and update operations coordinated across cluster nodes
- Extensible Architecture: Plugin-based design for different image types and formats

Architecture Flow: LogicalImageManager → LogicalImage → Image → Volume Resource → Storage Backing

The infrastructure provides the foundation for advanced image operations like snapshot management, image cloning, and distributed image access while maintaining tight integration with the cluster resource management system.

Signed-off-by: Zhenchao Liu <[email protected]>
Co-authored-by: Xu Han <[email protected]>
Co-authored-by: Yongxue Hong <[email protected]>
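To make the hierarchy concrete, here is a much-simplified sketch of the LogicalImage → Image → volume relationship; the class shapes and method names are assumptions, not the real API.

```python
# Much-simplified sketch of the image hierarchy: a logical image is an
# ordered chain of layer images, each backed by a volume resource allocated
# from a pool. Class shapes and method names are illustrative assumptions.
class Volume:
    def __init__(self, pool, size):
        self.pool = pool
        self.size = size


class Image:
    """One layer of a logical image, e.g. a qcow2 file backed by a volume."""

    def __init__(self, name, volume, backing=None):
        self.name = name
        self.volume = volume
        self.backing = backing  # previous layer in the backing chain


class LogicalImage:
    """A hierarchical image: base layer plus any number of snapshot layers."""

    def __init__(self, name):
        self.name = name
        self.layers = []

    def add_layer(self, image):
        if self.layers:
            image.backing = self.layers[-1]
        self.layers.append(image)

    @property
    def top(self):
        return self.layers[-1]


logical = LogicalImage("image1")
logical.add_layer(Image("sn0", Volume("nfs_pool", "20G")))
logical.add_layer(Image("sn1", Volume("nfs_pool", "20G")))
print(logical.top.name, "->", logical.top.backing.name)  # sn1 -> sn0
```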
Signed-off-by: Zhenchao Liu <[email protected]>
Signed-off-by: Zhenchao Liu <[email protected]>
When the cluster feature is enabled (i.e. the `nodes` param is set), use the image management system to handle the images defined in the `images` param in the pre-/post-processes.

Signed-off-by: Zhenchao Liu <[email protected]>
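Conceptually, the pre-process branch is along these lines; the `nodes` and `images` param names come from the commit message, while the param container and the manager calls below are placeholders.

```python
# Conceptual sketch of the env_process branch: when the cluster feature is
# enabled (the "nodes" param is set), the image management system handles the
# images listed in the "images" param; otherwise the legacy path is taken.
# The plain-dict params and the print placeholders are not the real API.
def preprocess_images(params):
    image_names = params.get("images", "").split()
    if params.get("nodes"):
        for name in image_names:
            print("cluster: create %s via the image management system" % name)
    else:
        for name in image_names:
            print("legacy: create %s via qemu-img on the local host" % name)


preprocess_images({"nodes": "node1 node2", "images": "image1 stg0"})
```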
Depends on #4242