Fixed Win11 install issues: compile flags and isfinite() build errors #1977
Build Fixes for NVIDIA Apex on Windows 11 (CUDA 12.8 / MSVC 2022)
Installation Command
Make sure you run the installation commands in the x64 Native Tools Command Prompt for VS 2022 (search for it from the Windows 11 Start menu). Before installing, make sure your environment has the necessary dependencies, such as PyTorch and ninja.

Troubleshooting
If you encounter errors in `compiled_autograd.h` (lines 1134 / 1108 / 1181), you may need to patch the header as described in PyTorch issue #148317: open `\anaconda\envs\basic\lib\site-packages\torch\include\torch\csrc\dynamo\compiled_autograd.h` (adjust the environment path to match your installation), go to line 1134, and change it as shown in that issue.
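Because the path above depends on the conda environment name, a short Python sketch can locate `compiled_autograd.h` in whichever PyTorch install is active and print the lines around 1134 before you edit them. `compiled_autograd_header` and `show_lines` are hypothetical helpers, and the sketch assumes the headers live under `torch/include`, as in the path above.

```python
import os
from importlib.util import find_spec


def compiled_autograd_header(torch_dir=None):
    """Return the path of compiled_autograd.h inside a PyTorch install.

    If torch_dir is None, the package directory of the importable `torch`
    is used (assumption: headers are shipped under <torch>/include).
    """
    if torch_dir is None:
        spec = find_spec("torch")
        if spec is None or spec.origin is None:
            raise RuntimeError("PyTorch is not installed in this environment")
        torch_dir = os.path.dirname(spec.origin)  # .../site-packages/torch
    return os.path.join(torch_dir, "include", "torch", "csrc", "dynamo",
                        "compiled_autograd.h")


def show_lines(path, center, radius=2):
    """Return (line_number, text) pairs around 1-based line `center`."""
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    start = max(1, center - radius)
    end = min(len(lines), center + radius)
    return [(n, lines[n - 1]) for n in range(start, end + 1)]
```

Usage: `for n, text in show_lines(compiled_autograd_header(), 1134): print(n, text)`. Inspecting the surrounding lines first makes it easier to confirm you are editing the line the issue refers to, since line numbers can shift between PyTorch releases.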
Build Environment
- CUDA toolkit: `E:\CUDA128`
- Build flags: `APEX_CPP_EXT=1`, `APEX_CUDA_EXT=1`

NVCC Version Info
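A small sketch for checking your own build environment before installing: it reports whether `cl` (MSVC) and `nvcc` are on `PATH`, captures the `nvcc --version` output, and lists which of the two `APEX_*` variables above are not set to `1`. The helper names are hypothetical, and treating `1` as the expected value simply mirrors the settings listed here. In the x64 Native Tools prompt the variables can be set with `set APEX_CPP_EXT=1` and `set APEX_CUDA_EXT=1`.

```python
import os
import shutil
import subprocess

# Variables from the build environment above; "1" as the expected value is assumed.
APEX_FLAGS = ("APEX_CPP_EXT", "APEX_CUDA_EXT")


def missing_apex_flags(env=None):
    """Return the APEX_* variables that are not set to "1"."""
    env = os.environ if env is None else env
    return [name for name in APEX_FLAGS if env.get(name) != "1"]


def toolchain_report():
    """Locate cl and nvcc on PATH and capture `nvcc --version` if available."""
    report = {"cl": shutil.which("cl"), "nvcc": shutil.which("nvcc"),
              "nvcc_version": None}
    if report["nvcc"]:
        result = subprocess.run([report["nvcc"], "--version"],
                                capture_output=True, text=True, check=False)
        report["nvcc_version"] = result.stdout.strip()
    return report
```

If `report["cl"]` is `None`, the shell is most likely not the x64 Native Tools prompt, which is the first thing to rule out when the build cannot find the MSVC compiler.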
Summary of Changes
This patch addresses three primary categories of build failures encountered on Windows:
1. `setup.py` Configuration Changes
Added `libraries=["cublas", "cublasLt"]` and `extra_compile_args` with `-D_DISABLE_EXTENDED_ALIGNED_STORAGE` to several CUDA extensions.

Affected Extensions
- `mlp_cuda`
- `fused_dense_cuda`
- `fused_weight_gradient_mlp_cuda`

Code Diff
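A minimal sketch of how these arguments could be assembled in `setup.py`. `windows_cublas_ext_kwargs` is a hypothetical helper, and gating the change on `sys.platform` is an assumption: the description does not say whether the flags are Windows-only. The returned dict is meant to be unpacked into `torch.utils.cpp_extension.CUDAExtension`, which accepts `libraries` and a per-compiler `extra_compile_args` dict with `cxx` and `nvcc` keys.

```python
import sys

ALIGNED_STORAGE_DEFINE = "-D_DISABLE_EXTENDED_ALIGNED_STORAGE"


def windows_cublas_ext_kwargs(base_cxx, base_nvcc, platform=sys.platform):
    """Build extra CUDAExtension kwargs for the cuBLAS-based extensions.

    On Windows (assumed gate), link cublas/cublasLt explicitly and pass the
    aligned-storage define to both the host compiler and nvcc. Elsewhere,
    return the base flags unchanged. Input lists are copied, not mutated.
    """
    kwargs = {"extra_compile_args": {"cxx": list(base_cxx),
                                     "nvcc": list(base_nvcc)}}
    if platform.startswith("win"):
        kwargs["libraries"] = ["cublas", "cublasLt"]
        kwargs["extra_compile_args"]["cxx"].append(ALIGNED_STORAGE_DEFINE)
        kwargs["extra_compile_args"]["nvcc"].append(ALIGNED_STORAGE_DEFINE)
    return kwargs

# Illustrative use inside setup.py (requires torch, so shown as a comment):
# from torch.utils.cpp_extension import CUDAExtension
# CUDAExtension(name="mlp_cuda", sources=[...],  # sources unchanged
#               **windows_cublas_ext_kwargs(["-O3"], ["-O3"]))
```

Passing the define to both compilers keeps host (`cl`) and device (`nvcc`) translation units consistent, since both may include headers that use `std::aligned_storage`.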
Reasoning
- `LNK2001` linker errors: on Windows, the extensions must link explicitly against `cublas.lib` and `cublasLt.lib` when the cuBLAS headers are used. Explicit linking resolves the unresolved external symbols for `cublasGemmEx`, `cublasLtMatmul`, etc.
- `std::aligned_storage` errors: newer MSVC toolsets changed how `std::aligned_storage` works, causing standard-compliance errors with older CUDA headers. The `_DISABLE_EXTENDED_ALIGNED_STORAGE` flag restores the old (non-conforming) alignment behavior, which lets compilation succeed.

2. Source Code Fixes (`csrc/`)

A. Type Definition Fix (`uint`)

File: `csrc/mlp_cuda.cu`

Change: Replaced `uint` with `unsigned int`.

Reasoning: The type alias `uint` is provided by Linux system headers but is not defined by default in the MSVC (Windows) environment. Using the standard C++ type `unsigned int` ensures cross-platform compatibility.

B. Device Function Compatibility (`isfinite`)

Files:
- `csrc/multi_tensor_scale_kernel.cu`
- `csrc/multi_tensor_axpby_kernel.cu`

Change: Replaced the `isfinite()` check with a floating-point check based on `fabsf`. Affected variables include `r_in[ii]`, `r_x[ii]`, and `r_y[ii]`.

Reasoning: On Windows, NVCC often resolves `isfinite` to the host-only C++ standard library function (`std::isfinite`) rather than the device function, causing a "calling a host function from a device function" error. Replacing it with `fabsf`, which maps correctly to a device function, bypasses this restriction while preserving the logic of the check.
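Assuming the new check has the form `fabsf(x) <= FLT_MAX`, the Python model below shows why it agrees with `isfinite` for values representable as float32: infinities fail because their magnitude exceeds `FLT_MAX`, and NaN fails because every ordered comparison with NaN is false. `fabsf_is_finite` is a hypothetical name, and inputs wider than float32 (for example `double` values beyond `FLT_MAX`) are outside what this sketch demonstrates.

```python
import math
import struct

# Largest finite float32 (FLT_MAX in <cfloat>), decoded from bit pattern 0x7F7FFFFF.
FLT_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]


def fabsf_is_finite(x: float) -> bool:
    """Model of the assumed device-side check `fabsf(x) <= FLT_MAX`.

    +/-inf fail because abs(inf) > FLT_MAX; NaN fails because any ordered
    comparison involving NaN evaluates to False.
    """
    return abs(x) <= FLT_MAX
```

For example, `fabsf_is_finite(float("nan"))` and `fabsf_is_finite(float("inf"))` both return `False`, matching `math.isfinite`, while `fabsf_is_finite(FLT_MAX)` returns `True`.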