
Epic: Multi-Host Testing Support in Avocado-VT #4183

@YongxueHong

Description

Summary

This epic tracks the effort to introduce multi-host testing capabilities into the Avocado-VT framework. The goal is to extend the test scope from a single host to a multi-host cluster, enabling more complex and realistic test scenarios, such as live VM migration. This will be accomplished by creating a distributed architecture with a central Controller Node for test orchestration and multiple Worker Nodes for test execution.

Motivation

Modern virtualization products are typically deployed in complex, multi-host cluster environments to meet customer demands for high availability and features like VM migration. Customers almost always perform these critical operations on remote hosts. Currently, the Avocado-VT framework lacks native support for testing across multiple hosts. The existing workaround—simulating migration on a single local host—is insufficient as it fails to accurately replicate the real-world conditions and potential issues of a distributed environment. This feature is critical to closing that gap and ensuring our testing reflects customer deployments.

Key Goals

  • Enable Cluster Testing: Allow test cases to execute seamlessly across multiple physical hosts.
  • Decouple Control and Execution: Separate test orchestration (control plane) from task execution (data plane) for a more robust and scalable architecture.
  • Support Realistic Scenarios: Provide first-class support for testing critical multi-host features, starting with VM live migration.
  • Improve Resource Efficiency: Optimize the use of distributed resources to increase test throughput and capacity.

Proposed Architecture

The architecture is composed of a Controller Node, where the test logic runs, and one or more Worker Nodes, where the virtual machines are executed.

  • Controller Node: A single machine that runs the Avocado-VT test process. It hosts the logical modules responsible for managing the cluster (vt_cluster), providing high-level VM APIs (vt_vmm), and managing shared resources (vt_resmgr). The test code makes direct, in-process Python calls to these modules.
  • Worker Node: A remote machine that runs a lightweight vt_agent. The agent listens for commands from the Controller Node and performs actions locally, such as starting/stopping QEMU processes and managing local resources.
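
To make the Worker Node role concrete, here is a minimal sketch of what a worker-side agent could look like, using Python's standard xmlrpc.server as a stand-in transport. The actual vt_agent transport, service names, and method signatures are not defined by this epic (and a real channel would need authentication and encryption), so everything below is illustrative only.

```python
# Hypothetical sketch of a worker-side agent; the real vt_agent transport,
# service names, and signatures are not defined by this epic.
from xmlrpc.server import SimpleXMLRPCServer
import subprocess


class AgentServices:
    """Local services the controller can invoke over RPC."""

    def run_command(self, cmdline):
        # Execute a shell command on the worker and return its output/status.
        proc = subprocess.run(cmdline, shell=True, capture_output=True, text=True)
        return {"rc": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}

    def start_vm(self, qemu_cmdline):
        # Start a QEMU process locally; return its PID so the controller
        # can track the VM instance on this node.
        proc = subprocess.Popen(qemu_cmdline, shell=True)
        return {"pid": proc.pid}


if __name__ == "__main__":
    # The listen address/port would come from the cluster configuration.
    server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
    server.register_instance(AgentServices())
    server.serve_forever()
```

On the Controller Node, vt_cluster would hold one such connection per registered worker and relay commands to it.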

Visual Architecture

Component Diagram:
This diagram shows that the test itself runs on the Controller Node, making direct API calls to the management libraries (vt_vmm, vt_resmgr), which in turn use the core vt_cluster module to orchestrate the remote Worker Nodes.

+=================================================================+
|                      CONTROLLER NODE                            |
|                                                                 |
|   +---------------------------------------------------------+   |
|   |           Test Code (Avocado-VT Test Process)           |   |
|   +---------------------------------------------------------+   |
|                         |                                       |
|                         | (Direct, In-Process Python API Calls) |
|                         v                                       |
|   +-------------------------+      +--------------------------+ |
|   | vt_vmm                  |      | vt_resmgr                | |
|   | (VM Management Library) |      | (Resource Mgmt Library)  | |
|   +-------------------------+      +--------------------------+ |
|               |                            |                    |
|               +------------+---------------+                    |
|                            | (Internal Calls)                   |
|                            v                                    |
|   +-----------------------------------------------------------+ |
|   |                 vt_cluster (Core Controller)              | |
|   |-----------------------------------------------------------| |
|   | - Manages Agents, State, and Task Dispatch                | |
|   +-----------------------------------------------------------+ |
|                                                                 |
+=================================================================+
                          |                      |
            (Network Communication: RPC)         |
                          |                      |
       +------------------+----------------------+------------------+
       |                                         |                  |
       v                                         v                  v
+----------------------+           +----------------------+   +----------------------+
|    Worker Node 1     |           |    Worker Node 2     |   |    Worker Node N     |
|----------------------|           |----------------------|   |----------------------|
| +------------------+ |           | +------------------+ |   | +------------------+ |
| |   Worker Agent   | |           | |   Worker Agent   | |   | |   Worker Agent   | |
| |   (`vt_agent`)   | |           | |   (`vt_agent`)   | |   | |   (`vt_agent`)   | |
| +------------------+ |           | +------------------+ |   | +------------------+ |
| - Local Task Exec    |           | - Local Task Exec    |   | - Local Task Exec    |
| - VM Lifecycle Mgmt  |           | - VM Lifecycle Mgmt  |   | - VM Lifecycle Mgmt  |
+----------------------+           +----------------------+   +----------------------+

Interaction Flow:
A typical test operation, like creating a VM, would follow this flow:

  1. A test calls the vt_vmm API to create a new VM.
  2. vt_vmm requests a suitable host from the vt_cluster Controller.
  3. vt_cluster selects an available worker node and reserves the required resources.
  4. vt_vmm sends the "create VM" command to the vt_cluster Controller, targeting the selected node.
  5. vt_cluster relays the command to the vt_agent on that node.
  6. vt_agent executes the command locally to start the QEMU process.
  7. vt_agent reports the status (success/failure, VM details) back to the vt_cluster Controller.
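
The following self-contained Python sketch models steps 1-7 above with stand-in classes. The real vt_vmm, vt_cluster, and vt_agent interfaces are still to be designed, so every class and method name here is an assumption used only to illustrate the division of responsibilities.

```python
# Illustrative stand-ins for vt_vmm / vt_cluster / vt_agent; the real module
# APIs are still to be designed, so every name here is an assumption.

class AgentProxy:
    """Controller-side handle to the vt_agent RPC endpoint on one worker."""

    def __init__(self, node_name):
        self.node_name = node_name

    def call(self, service, **kwargs):
        # Would perform the actual RPC to the worker; stubbed for the sketch.
        print(f"[{self.node_name}] {service}({kwargs})")
        return {"status": "success", "node": self.node_name}


class ClusterController:
    """Tracks worker nodes and relays commands to their agents (vt_cluster)."""

    def __init__(self, nodes):
        self.agents = {name: AgentProxy(name) for name in nodes}

    def select_node(self, requirements=None):
        # Step 3: pick an available worker and reserve resources on it.
        return next(iter(self.agents))

    def dispatch(self, node, service, **kwargs):
        # Step 5: relay the command to the vt_agent on the chosen node.
        return self.agents[node].call(service, **kwargs)


class VMManager:
    """High-level VM API used directly by test code (vt_vmm)."""

    def __init__(self, cluster):
        self.cluster = cluster

    def create_vm(self, name, qemu_cmdline):
        # Steps 2-4: ask the cluster for a node, then target it.
        node = self.cluster.select_node()
        # Steps 5-7: the agent starts QEMU locally and reports status back.
        return self.cluster.dispatch(node, "start_vm", name=name,
                                     cmdline=qemu_cmdline)


# Step 1: the test calls the vt_vmm API in-process on the Controller Node.
cluster = ClusterController(["worker1", "worker2"])
vmm = VMManager(cluster)
result = vmm.create_vm("vm1", "qemu-kvm -m 2048 ...")
print(result)
```

The important property is that the test only ever talks to vt_vmm in-process; node selection and the RPC to the worker agents stay hidden behind vt_cluster.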

Implementation Plan

This project is broken down into five core modules. Each module will be developed as a distinct component with a clear set of responsibilities:

  • 1. vt_agent (Worker Agent Framework)
    • Objective: Develop a lightweight agent to run on each worker node, enabling remote command and control.
    • Key Deliverables:
      • A secure and reliable communication channel to the vt_cluster controller.
      • A set of local services for executing tasks (e.g., QMP, shell) as directed by the controller.
  • 2. vt_cluster (Cluster Management Controller)
    • Objective: Create the central controller for managing the entire cluster of worker nodes.
    • Key Deliverables:
      • Node discovery, registration, and lifecycle management.
      • A foundational API for upper-layer modules to query and allocate nodes for tasks.
  • 3. vt_vmm (Distributed Virtual Machine Manager)
    • Objective: Abstract VM operations across the cluster, providing a unified interface for managing distributed VMs.
    • Key Deliverables:
      • High-level API to create, inspect, migrate, and destroy VMs on worker nodes by interacting with vt_cluster.
      • Orchestration logic for multi-host operations, starting with VM live migration.
  • 4. vt_resmgr (Distributed Resource Manager)
    • Objective: Manage resources, such as storage and networking, across the cluster (a minimal sketch of the unified-management idea follows this list).
    • Key Deliverables:
      • A unified infrastructure for registering, allocating, and releasing resources across the cluster.
      • A service for managing NFS and local-filesystem storage resources.
      • Integration with vt_vmm to ensure tests can allocate and use distributed resources seamlessly.
  • 5. vt_imgr (Distributed Image Manager)
    • Objective: Manage images, such as QEMU disk images, across the cluster.
    • Key Deliverables:
      • A unified infrastructure for managing images across the cluster.
      • A service for managing QEMU images, including high-level APIs to create, destroy, back up, restore, and clone them.
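
As referenced in the vt_resmgr item above, the sketch below illustrates the "unified management infrastructure" idea shared by vt_resmgr and vt_imgr: a single registry that maps a resource kind to a backend handler. All class and method names are assumptions for illustration, not the planned API.

```python
# A minimal sketch of the unified-management idea behind vt_resmgr/vt_imgr:
# one registry that maps resource kinds to backend handlers. Every name here
# is illustrative, not the planned API.

class NfsBackend:
    def allocate(self, spec):
        # Would export/mount an NFS path reachable by every worker node.
        return {"kind": "nfs", "path": f"/mnt/nfs/{spec['name']}"}

    def release(self, resource):
        print(f"releasing NFS resource {resource['path']}")


class LocalFsBackend:
    def allocate(self, spec):
        # Would create a file-backed volume on a single worker node.
        return {"kind": "localfs", "path": f"/var/lib/vt/{spec['name']}.qcow2"}

    def release(self, resource):
        print(f"releasing local volume {resource['path']}")


class ResourceManager:
    """Single entry point the rest of the framework (e.g. vt_vmm) would use."""

    def __init__(self):
        self._backends = {"nfs": NfsBackend(), "localfs": LocalFsBackend()}

    def request(self, kind, **spec):
        return self._backends[kind].allocate(spec)

    def release(self, resource):
        self._backends[resource["kind"]].release(resource)


resmgr = ResourceManager()
disk = resmgr.request("nfs", name="vm1-system-disk")
print(disk)
resmgr.release(disk)
```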
