Summary
This epic tracks the effort to introduce multi-host testing capabilities into the Avocado-VT framework. The goal is to extend the test scope from a single host to a multi-host cluster, enabling more complex and realistic test scenarios, such as live VM migration. This will be accomplished by creating a distributed architecture with a central Controller Node for test orchestration and multiple Worker Nodes for test execution.
Motivation
Modern virtualization products are typically deployed in complex, multi-host cluster environments to meet customer demands for high availability and features like VM migration. Customers almost always perform these critical operations on remote hosts. Currently, the Avocado-VT framework lacks native support for testing across multiple hosts. The existing workaround—simulating migration on a single local host—is insufficient as it fails to accurately replicate the real-world conditions and potential issues of a distributed environment. This feature is critical to closing that gap and ensuring our testing reflects customer deployments.
Key Goals
- Enable Cluster Testing: Allow test cases to execute seamlessly across multiple physical hosts.
- Decouple Control and Execution: Separate test orchestration (control plane) from task execution (data plane) for a more robust and scalable architecture.
- Support Realistic Scenarios: Provide first-class support for testing critical multi-host features, starting with VM live migration.
- Improve Resource Efficiency: Optimize the use of distributed resources to increase test throughput and capacity.
Proposed Architecture
The architecture is composed of a Controller Node, where the test logic runs, and one or more Worker Nodes, where the virtual machines are executed.
- Controller Node: A single machine that runs the Avocado-VT test process. It hosts the logical modules responsible for managing the cluster (vt_cluster), providing high-level VM APIs (vt_vmm), and managing shared resources (vt_resmgr). The test code makes direct, in-process Python calls to these modules.
- Worker Node: A remote machine that runs a lightweight vt_agent. The agent listens for commands from the Controller Node and performs actions locally, such as starting/stopping QEMU processes and managing local resources.
Visual Architecture
Component Diagram:
This diagram shows that the test itself runs on the Controller Node, making direct API calls to the management libraries (vt_vmm, vt_resmgr),
which in turn use the core vt_cluster module to orchestrate the remote Worker Nodes.
+=================================================================+
| CONTROLLER NODE |
| |
| +---------------------------------------------------------+ |
| | Test Code (Avocado-VT Test Process) | |
| +---------------------------------------------------------+ |
| | |
| | (Direct, In-Process Python API Calls) |
| v |
| +-------------------------+ +--------------------------+ |
| | vt_vmm | | vt_resmgr | |
| | (VM Management Library) | | (Resource Mgmt Library) | |
| +-------------------------+ +--------------------------+ |
| | | |
| +------------+---------------+ |
| | (Internal Calls) |
| v |
| +-----------------------------------------------------------+ |
| | vt_cluster (Core Controller) | |
| |-----------------------------------------------------------| |
| | - Manages Agents, State, and Task Dispatch | |
| +-----------------------------------------------------------+ |
| |
+=================================================================+
| |
(Network Communication: RPC) |
| |
+------------------+----------------------+------------------+
| | |
v v v
+----------------------+ +----------------------+ +----------------------+
| Worker Node 1 | | Worker Node 2 | | Worker Node N |
|----------------------| |----------------------| |----------------------|
| +------------------+ | | +------------------+ | | +------------------+ |
| | Worker Agent | | | | Worker Agent | | | | Worker Agent | |
| | (`vt_agent`) | | | | (`vt_agent`) | | | | (`vt_agent`) | |
| +------------------+ | | +------------------+ | | +------------------+ |
| - Local Task Exec | | - Local Task Exec | | - Local Task Exec |
| - VM Lifecycle Mgmt | | - VM Lifecycle Mgmt | | - VM Lifecycle Mgmt |
+----------------------+ +----------------------+ +----------------------+
Interaction Flow:
A typical test operation, like creating a VM, would follow this flow:
- A test calls the vt_vmm API to create a new VM.
- vt_vmm requests a suitable host from the vt_cluster Controller.
- vt_cluster selects an available worker node and reserves the required resources.
- vt_vmm sends the "create VM" command to the vt_cluster Controller, targeting the selected node.
- vt_cluster relays the command to the vt_agent on that node.
- vt_agent executes the command locally to start the QEMU process.
- vt_agent reports the status (success/failure, VM details) back to the vt_cluster Controller.
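The flow above can be sketched as plain Python. All class and method names below (`Cluster`, `VMManager`, `select_node`, `dispatch`, etc.) are illustrative stand-ins, not the final module APIs, and the network RPC is replaced by a direct call:

```python
# Hypothetical sketch of the create-VM flow. In the real design, Cluster
# stands in for vt_cluster, VMManager for vt_vmm, and Agent for vt_agent
# running on a remote worker node.

class Agent:
    """Stand-in for a remote vt_agent; executes commands locally."""
    def __init__(self, node_name):
        self.node_name = node_name
        self.vms = {}

    def handle(self, command, **kwargs):
        if command == "create_vm":
            name = kwargs["name"]
            self.vms[name] = {"state": "running"}  # would spawn QEMU here
            return {"status": "success", "node": self.node_name}
        raise ValueError(f"unknown command: {command}")


class Cluster:
    """Stand-in for vt_cluster: tracks agents and dispatches tasks."""
    def __init__(self):
        self.agents = {}

    def register(self, agent):
        self.agents[agent.node_name] = agent

    def select_node(self):
        # Real code would check capacity and reservations; pick first node here.
        return next(iter(self.agents))

    def dispatch(self, node, command, **kwargs):
        # Real code would send an RPC over the network; here it is a direct call.
        return self.agents[node].handle(command, **kwargs)


class VMManager:
    """Stand-in for vt_vmm: the high-level VM API used by test code."""
    def __init__(self, cluster):
        self.cluster = cluster

    def create_vm(self, name):
        node = self.cluster.select_node()                            # steps 2-3
        return self.cluster.dispatch(node, "create_vm", name=name)   # steps 4-7


cluster = Cluster()
cluster.register(Agent("worker1"))
vmm = VMManager(cluster)
result = vmm.create_vm("vm1")
print(result)  # {'status': 'success', 'node': 'worker1'}
```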
Implementation Plan
This project is broken down into five core modules. Each module will be developed as a distinct component with a clear set of responsibilities:
1. vt_agent (Worker Agent Framework)
   - Objective: Develop a lightweight agent to run on each worker node, enabling remote command and control.
   - Key Deliverables:
     - A secure and reliable communication channel to the vt_cluster controller.
     - A set of local services for executing tasks (e.g., QMP, shell) as directed by the controller.
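One way to structure the agent's local services is a dispatch table that maps service names to callables. The sketch below is an assumption about the shape of that registry (the names `AgentServices` and `shell_service` are hypothetical), with the RPC transport omitted:

```python
# Minimal sketch of a vt_agent-style service dispatcher. Service names
# and the registry API are illustrative, not the final agent interface.
import subprocess


class AgentServices:
    """Registry of local services the agent runs on the controller's behalf."""
    def __init__(self):
        self._services = {}

    def register(self, name, func):
        self._services[name] = func

    def run(self, name, *args, **kwargs):
        if name not in self._services:
            raise KeyError(f"no such service: {name}")
        return self._services[name](*args, **kwargs)


def shell_service(cmd):
    """Run a shell command locally and return (exit_code, stdout)."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout


services = AgentServices()
services.register("shell", shell_service)
code, out = services.run("shell", "echo hello")
print(code, out.strip())  # 0 hello
```

A QMP service would register the same way, wrapping a connection to the local QEMU monitor socket instead of a subprocess call.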
2. vt_cluster (Cluster Management Controller)
   - Objective: Create the central controller for managing the entire cluster of worker nodes.
   - Key Deliverables:
     - Node discovery, registration, and lifecycle management.
     - A foundational API for upper-layer modules to query and allocate nodes for tasks.
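The node-allocation API could look roughly like the following. This is a minimal sketch under assumed names (`ClusterController`, `allocate`, `release`); the real module would also track node health and capabilities:

```python
# Sketch of the registration/allocation API vt_cluster could expose to
# upper layers such as vt_vmm and vt_resmgr. Names are hypothetical.

class Node:
    def __init__(self, name):
        self.name = name
        self.reserved = False


class ClusterController:
    def __init__(self):
        self._nodes = {}

    def register_node(self, name):
        """Called when a worker's vt_agent registers with the controller."""
        self._nodes[name] = Node(name)

    def free_nodes(self):
        return [n.name for n in self._nodes.values() if not n.reserved]

    def allocate(self):
        """Reserve and return the first free node; real code would score nodes."""
        for node in self._nodes.values():
            if not node.reserved:
                node.reserved = True
                return node.name
        raise RuntimeError("no free worker nodes")

    def release(self, name):
        self._nodes[name].reserved = False


ctl = ClusterController()
ctl.register_node("worker1")
ctl.register_node("worker2")
node = ctl.allocate()
print(node, ctl.free_nodes())  # worker1 ['worker2']
```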
3. vt_vmm (Distributed Virtual Machine Manager)
   - Objective: Abstract VM operations across the cluster, providing a unified interface for managing distributed VMs.
   - Key Deliverables:
     - High-level API to create, inspect, migrate, and destroy VMs on worker nodes by interacting with vt_cluster.
     - Orchestration logic for multi-host operations, starting with VM live migration.
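The migration orchestration reduces to a fixed sequence of per-node commands. The sketch below assumes each step maps to one command relayed through vt_cluster to a worker's vt_agent; the command names (`prepare_incoming`, `migrate`, etc.) are illustrative:

```python
# Sketch of vt_vmm's live-migration orchestration. `dispatch` stands in
# for the vt_cluster call that relays a command to the vt_agent on a node.

def migrate_vm(dispatch, vm_name, src, dst):
    """Orchestrate a QEMU live migration between two worker nodes."""
    # 1. Start a destination QEMU that waits for the incoming migration stream.
    uri = dispatch(dst, "prepare_incoming", vm=vm_name)
    # 2. Tell the source QEMU (via QMP) to migrate to that URI.
    dispatch(src, "migrate", vm=vm_name, uri=uri)
    # 3. Wait for completion on the source, then tear the source VM down.
    dispatch(src, "wait_migration", vm=vm_name)
    dispatch(src, "destroy_vm", vm=vm_name)
    return dst


# Fake dispatch that records calls, standing in for the real RPC layer.
calls = []
def fake_dispatch(node, command, **kwargs):
    calls.append((node, command))
    return "tcp:worker2:4444" if command == "prepare_incoming" else None


new_home = migrate_vm(fake_dispatch, "vm1", "worker1", "worker2")
print(new_home, [c[1] for c in calls])
```

The fake dispatcher makes the ordering testable without any real hosts; the same sequence would run against live agents in the finished module.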
4. vt_resmgr (Distributed Resource Manager)
   - Objective: Manage resources, such as storage and networking, across the cluster.
   - Key Deliverables:
     - A unified infrastructure for managing cluster-wide resources.
     - A service for managing NFS and local filesystem storage resources.
     - Integration with vt_vmm to ensure tests can allocate and use distributed resources seamlessly.
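A storage backend in vt_resmgr might be modeled as a pool that hands out volumes and tracks capacity. This is a hypothetical sketch (the `StoragePool` class and its methods are assumptions), showing only the bookkeeping, not the NFS mount management:

```python
# Sketch of a vt_resmgr-style storage pool. An NFS-backed pool would be
# visible to every worker node; a "local" pool would be node-specific.

class StoragePool:
    def __init__(self, backing, capacity_gb):
        self.backing = backing            # e.g. "nfs" or "local"
        self.capacity_gb = capacity_gb
        self.volumes = {}                 # name -> size_gb

    def allocate(self, name, size_gb):
        used = sum(self.volumes.values())
        if used + size_gb > self.capacity_gb:
            raise RuntimeError("pool exhausted")
        self.volumes[name] = size_gb
        return f"{self.backing}://{name}"

    def release(self, name):
        del self.volumes[name]


pool = StoragePool("nfs", capacity_gb=100)
path = pool.allocate("vm1-disk", 20)
print(path, sum(pool.volumes.values()))  # nfs://vm1-disk 20
```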
5. vt_imgr (Distributed Image Manager)
   - Objective: Manage images, such as QEMU disk images, across the cluster.
   - Key Deliverables:
     - A unified infrastructure for managing images.
     - A service for managing QEMU images, including high-level APIs to create, destroy, back up, restore, and clone them.
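Under the hood, most of these operations map onto `qemu-img` invocations on a worker node. The sketch below only builds the command lines (the helper names are hypothetical); the real module would dispatch them to a worker's vt_agent for execution:

```python
# Sketch of how vt_imgr might construct qemu-img command lines for its
# high-level image operations. Only command construction is shown.

def create_cmd(path, size, fmt="qcow2"):
    """Build the command to create a new image of the given size."""
    return ["qemu-img", "create", "-f", fmt, path, size]


def clone_cmd(src, dst, fmt="qcow2"):
    """Build the command for a full-copy clone with format conversion.

    A cheaper backing-file clone (qemu-img create -b) is also possible.
    """
    return ["qemu-img", "convert", "-O", fmt, src, dst]


print(create_cmd("/images/vm1.qcow2", "20G"))
print(clone_cmd("/images/vm1.qcow2", "/images/vm1-clone.qcow2"))
```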