[Bug] - NVIDIA CUDA proprietary driver stopped working in 2023.10.20260105 on 4 GB root volume #1054

@injust

Describe the bug

When an AMI is built with EC2 Image Builder using a component that installs the NVIDIA CUDA driver, the driver does not work on instances launched from that AMI. nvidia-smi outputs this error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I haven't been able to reproduce this outside of EC2 Image Builder.

To Reproduce

Steps to reproduce the behavior:

  1. Create an EC2 Image Builder pipeline using base image ssm:/aws/service/ami-amazon-linux-latest/al2023-ami-minimal-kernel-6.12-x86_64 and a 4 GB EBS volume
  2. Add the update-linux build component
  3. Add this build component, which installs the NVIDIA driver and the minimum CUDA 13.1 components needed for hashcat:
# https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/amazon-linux.html

schemaVersion: 1.0

phases:
  - name: build
    steps:
      - name: DnfAddRepo
        action: ExecuteBash
        inputs:
          commands:
            - dnf config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/amzn2023/x86_64/cuda-amzn2023.repo

      - name: DnfInstall
        action: ExecuteBash
        inputs:
          commands:
            - dnf -y module enable nvidia-driver:latest-dkms
            - dnf -y install cuda-cudart-13-1 cuda-nvrtc-devel-13-1 nvidia-driver-cuda
  4. Build an AMI
  5. Launch any GPU instance from the AMI and run nvidia-smi
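
To narrow down the failure on an instance launched from the built AMI, a few checks can be scripted. This is only a minimal sketch and not part of the original report: the function names are hypothetical, the report does not identify a root cause, and the idea that DKMS did not build the module for the running kernel (or that the 4 GB root volume filled up) is an assumption to test, not a finding. The `dkms status` line format also varies between DKMS versions, so the match is deliberately loose.

```shell
# Diagnostic sketch (hypothetical helper names; not from the original report).

# Succeeds if the given `dkms status` text contains an nvidia line that
# mentions the given kernel release and is marked "installed".
# $1 = output of `dkms status`, $2 = kernel release (e.g. from `uname -r`)
dkms_has_nvidia_for_kernel() {
  printf '%s\n' "$1" | grep -i '^nvidia' | grep -F -- "$2" | grep -q 'installed'
}

# Prints the free space on the root filesystem in MiB.
root_free_mib() {
  df -Pk / | awk 'NR == 2 { print int($4 / 1024) }'
}

# Prints a short summary of the driver state on the current instance.
run_diagnostics() {
  kernel="$(uname -r)"
  echo "Running kernel: $kernel"
  echo "Free space on /: $(root_free_mib) MiB"

  if ! command -v dkms >/dev/null 2>&1; then
    echo "dkms not found; the driver packages may not be installed"
    return 1
  fi

  if dkms_has_nvidia_for_kernel "$(dkms status 2>/dev/null)" "$kernel"; then
    echo "DKMS reports an nvidia module installed for $kernel"
  else
    echo "DKMS reports no installed nvidia module for $kernel"
  fi

  if lsmod | grep -q '^nvidia'; then
    echo "nvidia kernel module is loaded"
  else
    echo "nvidia kernel module is not loaded"
  fi
}
```

Sourcing the file and calling `run_diagnostics` on both a good and a bad AMI should show at which point the two diverge.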

Expected behavior

The NVIDIA driver should load, and nvidia-smi should be able to communicate with it.

Additional context

al2023-ami-minimal-2023.9.20251208.0-kernel-6.12-x86_64: Last known good version
al2023-ami-minimal-2023.10.20260105.0-kernel-6.12-x86_64: Known bad version
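
Because the pipeline uses a "latest" SSM parameter as its base image, it helps to confirm which release a given build actually picked up. The following is a rough sketch, assuming the AWS CLI is installed and configured. The function names are hypothetical, and the parameter path is the one from step 1 without the Image Builder `ssm:` prefix.

```shell
# Sketch: find out which AL2023 release the base image parameter resolves to.
# Function names are hypothetical; requires the AWS CLI and credentials.

# Extracts the release string from an al2023-ami-minimal AMI name.
# Prints nothing if the name does not match the expected pattern.
ami_release() {
  printf '%s\n' "$1" |
    sed -n 's/^al2023-ami-minimal-\([0-9][0-9.]*\)-kernel-.*$/\1/p'
}

# Prints the AMI ID that the given public SSM parameter currently points to.
resolve_ami_id() {
  aws ssm get-parameter \
    --name "$1" \
    --query 'Parameter.Value' \
    --output text
}

# Prints the name of the given AMI.
ami_name() {
  aws ec2 describe-images \
    --image-ids "$1" \
    --query 'Images[0].Name' \
    --output text
}

# Example usage (makes AWS API calls, so it is left commented out):
# id="$(resolve_ami_id /aws/service/ami-amazon-linux-latest/al2023-ami-minimal-kernel-6.12-x86_64)"
# ami_release "$(ami_name "$id")"
```

Pinning the pipeline to the AMI ID of the good release, instead of the "latest" parameter, is one way to check whether the base image alone accounts for the regression.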
