@Andy-Jost
Contributor

@Andy-Jost Andy-Jost commented Dec 18, 2025

Summary

Adds a version compatibility check that warns users when cuda-bindings was compiled against a newer CUDA major version than the installed driver supports.

Changes

cuda-bindings

  • Added check_cuda_version_compatibility() function in cuda/bindings/utils/_version_check.py
  • Compares compile-time CUDA_VERSION vs runtime cuDriverGetVersion()
  • Exported from cuda.bindings.utils
  • Added comprehensive unit tests in tests/test_version_check.py
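
The comparison in the list above can be sketched as follows. This is a minimal illustration, not the code from the PR: CUDA encodes versions as 1000 * major + 10 * minor (so 13000 is 13.0 and 12080 is 12.8), and the helper names `decode_cuda_version` and `is_major_mismatch` are hypothetical.

```python
def decode_cuda_version(version: int) -> tuple[int, int]:
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def is_major_mismatch(compile_version: int, driver_version: int) -> bool:
    """Return True when the build targets a newer CUDA major version than the driver supports."""
    compile_major, _ = decode_cuda_version(compile_version)
    driver_major, _ = decode_cuda_version(driver_version)
    return compile_major > driver_major
```

With these helpers, a build against 13000 and a driver reporting 12080 counts as a mismatch, while 12090 against 12020 does not, because only the major components are compared.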

cuda-core

  • Device.__new__ calls check_cuda_version_compatibility() after cuInit succeeds
  • Imports the function directly from cuda.bindings.utils

Rationale

When cuda-bindings is built against CUDA 13 headers but the user's driver only supports CUDA 12, many features will silently fail or behave unexpectedly. This check provides early, clear feedback:

UserWarning: cuda-bindings was built against CUDA 13.0, but the installed driver 
only supports CUDA 12.8. Some features may not work correctly. Consider updating 
your NVIDIA driver. Set CUDA_PYTHON_DISABLE_VERSION_CHECK=1 to suppress this warning.

Design

  • Provided by cuda-bindings: The version check implementation lives in cuda.bindings.utils since it checks cuda-bindings' compile-time version
  • Invoked by cuda-core: Called when Device first triggers CUDA initialization
  • Runs once: Uses a module-level flag to ensure the check runs only once per process
  • Non-blocking: Warning only, does not prevent operation
  • Suppressible: Set CUDA_PYTHON_DISABLE_VERSION_CHECK=1 to disable
  • Major version only: No warning for minor version differences (handled by graceful degradation per PR #1409, "Handle unsupported device attributes gracefully")
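
The run-once and suppression behavior listed above could look roughly like this. It is a sketch under assumptions: the flag name `_version_check_done` and the function `check_once` are hypothetical, the versions are passed in as plain major numbers instead of being queried from the driver, and whether the real implementation sets the flag before or after reading the environment variable is not stated in this PR.

```python
import os
import warnings

_version_check_done = False  # hypothetical module-level flag


def check_once(compile_major: int, driver_major: int) -> None:
    """Warn at most once per process when the build's major version exceeds the driver's."""
    global _version_check_done
    if _version_check_done:
        return
    _version_check_done = True  # later calls return immediately
    if os.environ.get("CUDA_PYTHON_DISABLE_VERSION_CHECK") == "1":
        return  # user opted out of the check
    if compile_major > driver_major:
        warnings.warn(
            f"cuda-bindings was built against CUDA {compile_major}.x, but the "
            f"installed driver only supports CUDA {driver_major}.x.",
            UserWarning,
            stacklevel=2,
        )
```

Because the function only warns and never raises, a mismatch does not prevent the caller from continuing.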

Future Work

We could not find a suitable place to invoke the version check automatically within cuda-bindings itself (e.g., hooking into cuInit), so the check is currently triggered by cuda-core. This may be revisited in the future.

Test Coverage

7 tests in cuda-bindings covering:

  • No warning when driver is newer
  • No warning when same major version
  • Warning when compile major > driver major
  • Warning only issued once per process
  • Suppression via environment variable
  • Silent handling of driver errors
  • Silent handling of missing CUDA_VERSION attribute
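
Several of the cases above can be exercised without a GPU by substituting a fake driver object. The sketch below is not the test file from the PR: `check` is a simplified stand-in for the function under test, `count_warnings` is a hypothetical helper, and the error status is modeled as a plain integer where 0 means success.

```python
import warnings
from types import SimpleNamespace


def check(driver, compile_major):
    """Simplified stand-in for the function under test (hypothetical)."""
    err, version = driver.cuDriverGetVersion()
    if err != 0:  # non-zero status: the query failed, so skip silently
        return
    if compile_major > version // 1000:
        warnings.warn("major version mismatch", UserWarning, stacklevel=2)


def count_warnings(err, driver_version, compile_major=13):
    """Run check() against a fake driver and return how many warnings were raised."""
    fake_driver = SimpleNamespace(cuDriverGetVersion=lambda: (err, driver_version))
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        check(fake_driver, compile_major)
    return len(caught)
```

A test can then assert, for example, that `count_warnings(0, 12080)` is 1 (older driver major) and that `count_warnings(0, 13000)` and `count_warnings(1, 0)` are both 0 (same major, and driver error, respectively).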

Related Work

@Andy-Jost Andy-Jost added this to the cuda.core beta 11 milestone Dec 18, 2025
@Andy-Jost Andy-Jost added the P0 (High priority - Must do!), feature (New feature or request), and cuda.core (Everything related to the cuda.core module) labels Dec 18, 2025
@Andy-Jost Andy-Jost self-assigned this Dec 18, 2025
@Andy-Jost
Contributor Author

/ok to test 7ce325c

@Andy-Jost Andy-Jost force-pushed the runtime-version-check branch from 7ce325c to 1962e35 Compare December 18, 2025 19:08
@Andy-Jost
Contributor Author

/ok to test 1962e35

Comment on lines 345 to 346
f"but the installed driver only supports CUDA {runtime_major}.{runtime_minor}. "
f"Some features may not work correctly. Consider updating your NVIDIA driver. "
Member

This statement is nerve-racking considering that we support CUDA Minor Version Compatibility. In a few cases, we check the minor version in cuda-core to ensure that what we need is available at run time, but only when we actually need it. I think we should rephrase this warning.

Contributor Author

This warning is only issued in the case of a MAJOR version mismatch.

@Andy-Jost Andy-Jost force-pushed the runtime-version-check branch 2 times, most recently from 8a0d248 to 73e5a43 Compare December 19, 2025 00:09
@Andy-Jost
Contributor Author

/ok to test 73e5a43

Warn when cuda-bindings was compiled against a newer CUDA major version
than the installed driver supports. This helps users understand why
certain features may not work correctly.

The check runs once after cuInit and can be suppressed via
CUDA_PYTHON_DISABLE_VERSION_CHECK=1.
@Andy-Jost Andy-Jost force-pushed the runtime-version-check branch from 73e5a43 to ddd92bd Compare December 19, 2025 00:47
@Andy-Jost
Contributor Author

/ok to test ddd92bd

@leofang leofang self-requested a review December 19, 2025 01:22
# Get runtime driver version
err, runtime_version = driver.cuDriverGetVersion()
if err != driver.CUresult.CUDA_SUCCESS:
    return  # Can't check, skip silently
Collaborator

Failing to query the driver version warrants a warning to the user rather than being silently swallowed.
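
One way to act on this suggestion is sketched below. It is an illustration only, not a change made in the PR: the function name `report_driver_query` and the warning text are hypothetical, and the status is treated as a plain integer where 0 stands in for driver.CUresult.CUDA_SUCCESS.

```python
import warnings


def report_driver_query(err: int, runtime_version: int):
    """Return the driver version, or warn and return None when the query failed.

    `err` is a plain integer status where 0 means success (an assumption
    standing in for driver.CUresult.CUDA_SUCCESS).
    """
    if err != 0:
        warnings.warn(
            f"Could not query the CUDA driver version (error code {err}); "
            "skipping the version compatibility check.",
            UserWarning,
            stacklevel=2,
        )
        return None
    return runtime_version
```

The caller would skip the comparison when `None` comes back, so the check stays non-blocking while the failure is still surfaced.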

)


def _reset_version_compatibility_check():
Collaborator

Is there a real use case where production code would call this function? If it’s only used by tests, then it seems reasonable to rely implicitly on the global and move the function into the test code instead.

Collaborator

@rparolin rparolin left a comment

We should issue a warning to the user if we can't fetch the driver version.


4 participants