
@jimmychiuuuu

Summary

This PR adds support for Intel TDX and AMD SEV-SNP confidential computing devices and refactors the plugin so that a single binary serves multiple extended resources. It also corrects the vTPM device limit so that the resource can be shared across pods.

Key Changes

  1. Multi-Device Architecture:
  • Refactored main.go to run multiple plugin instances concurrently.
  • Added allDeviceSpecs to hold the configuration for intel.com/tdx, amd.com/sev-snp, and google.com/cc.
  2. Resource Limits Correction:
  • Fixed vTPM limit: raised the google.com/cc device limit from 1 to 256 so that it is modeled as a shared resource.
  • Set the TDX and SEV-SNP limits to 1 (exclusive resources).
  3. Reliability & Hygiene:
  • Added socket cleanup using defer os.Remove(socketPath) to prevent EADDRINUSE errors during restarts.
  • Moved hardcoded paths into constants for maintainability.
  • Standardized allocation logging at level.Debug to reduce noise.
  4. Testing:
  • Added unit tests covering multi-path discovery, ID generation, and multi-instance allocation.
  • Initialized Prometheus metrics in test helpers to prevent runtime panics.

Verification

Unit Tests: All passed (go test -v ./deviceplugin/...).

E2E Validation (on GKE Confidential Nodes):

Discovery: Verified that kubectl get nodes reports intel.com/tdx: 1 and amd.com/sev-snp: 1 on the respective TDX and SEV-SNP nodes, and google.com/cc: 256.

Allocation: Deployed test pods (pod-tdx.yaml, pod-snp.yaml) and verified successful injection of /dev/tdx_guest and /dev/sev-guest.

Resilience: Manually deleted the device plugin pod; confirmed that running workloads remained healthy (0 restarts).
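The allocation check above depends on the plugin's Allocate response telling the kubelet which host device node to expose in the container. The following is a minimal sketch under assumptions: DeviceSpec and ContainerAllocateResponse are simplified local stand-ins for the kubelet device plugin v1beta1 types (only a few fields are mirrored, so it compiles without the Kubernetes module), and the allocate helper, the devicePathFor map, and the vTPM path are hypothetical. Each inner slice plays the role of one container's requested device IDs.

```go
package main

import "fmt"

// DeviceSpec is a local stand-in for the kubelet v1beta1 DeviceSpec
// (same field names, no protobuf machinery).
type DeviceSpec struct {
	ContainerPath string
	HostPath      string
	Permissions   string // cgroup device permissions, e.g. "rw"
}

// ContainerAllocateResponse is a trimmed stand-in for the v1beta1 type.
type ContainerAllocateResponse struct {
	Devices []*DeviceSpec
}

// devicePathFor maps each resource to the node injected into the
// container; the vTPM path is an assumption.
var devicePathFor = map[string]string{
	"intel.com/tdx":   "/dev/tdx_guest",
	"amd.com/sev-snp": "/dev/sev-guest",
	"google.com/cc":   "/dev/tpmrm0",
}

// allocate builds one response per container request. Every ID of a
// resource points at the same host node, so the node is injected once
// per container no matter how many IDs that container was given.
func allocate(resource string, requestedIDs [][]string) ([]ContainerAllocateResponse, error) {
	hostPath, ok := devicePathFor[resource]
	if !ok {
		return nil, fmt.Errorf("unknown resource %q", resource)
	}
	resps := make([]ContainerAllocateResponse, 0, len(requestedIDs))
	for _, ids := range requestedIDs {
		if len(ids) == 0 {
			return nil, fmt.Errorf("container requested no %s devices", resource)
		}
		resps = append(resps, ContainerAllocateResponse{
			Devices: []*DeviceSpec{{ContainerPath: hostPath, HostPath: hostPath, Permissions: "rw"}},
		})
	}
	return resps, nil
}

func main() {
	resps, err := allocate("amd.com/sev-snp", [][]string{{"sev-snp-0"}})
	if err != nil {
		panic(err)
	}
	d := resps[0].Devices[0]
	fmt.Println(d.HostPath, "->", d.ContainerPath, d.Permissions)
}
```

Because every google.com/cc ID resolves to the same host node in this sketch, the 256-unit limit only bounds how many containers on a node can hold the resource at once; the device node itself is the same for all of them.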


@kongoshuu left a comment


Thanks Jimmy! This is a great PR! I took a first pass and will need to take another, deeper look, but I'm sending these comments now to make sure I understand the code.

