Build Instructions
==================

**Note:** The most up-to-date build instructions are embedded in a set of
scripts bundled in the FBGEMM repo under
`setup_env.bash `_.

The currently available FBGEMM GenAI build variants are:

* CUDA

The general steps for building FBGEMM GenAI are as follows:

#. Set up an isolated build environment.
#. Set up the toolchain for a CUDA build.
#. Install PyTorch.
#. Run the build script.


.. _fbgemm-genai.build.setup.env:

Set Up an Isolated Build Environment
------------------------------------

Follow the instructions to set up the Conda environment:

#. :ref:`fbgemm-gpu.build.setup.env`
#. :ref:`fbgemm-gpu.build.setup.cuda`
#. :ref:`fbgemm-gpu.build.setup.tools.install`
#. :ref:`fbgemm-gpu.build.setup.pytorch.install`

Installing PyTorch for CUDA Builds
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For CUDA builds, install PyTorch with matching CUDA version support:

.. code:: sh

  # !! Run inside the Conda environment !!

  # For CUDA 12.9 with PyTorch nightly (recommended for latest features)
  pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu129

  # For CUDA 12.8 with PyTorch stable
  pip install torch --index-url https://download.pytorch.org/whl/cu128

  # Verify PyTorch installation
  python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"


Other Pre-Build Setup
---------------------

As FBGEMM GenAI leverages the same build process as FBGEMM_GPU, please refer to
:ref:`fbgemm-gpu.build.prepare` for additional pre-build setup information.

.. _fbgemm-genai.build.prepare:

Preparing the Build
~~~~~~~~~~~~~~~~~~~

Clone the repo along with its submodules, and install ``requirements_genai.txt``:

.. code:: sh

  # !! Run inside the Conda environment !!

  # Select a version tag
  FBGEMM_VERSION=v1.5.0

  # Clone the repo along with its submodules
  git clone --recursive -b ${FBGEMM_VERSION} https://github.com/pytorch/FBGEMM.git fbgemm_${FBGEMM_VERSION}

  # Install additional required packages for building and testing
  cd fbgemm_${FBGEMM_VERSION}/fbgemm_gpu
  pip install -r requirements_genai.txt

Initialize Git Submodules
~~~~~~~~~~~~~~~~~~~~~~~~~

FBGEMM GenAI relies on several submodules, including CUTLASS for optimized CUDA
kernels. If you did not use ``--recursive`` when cloning, initialize the
submodules:

.. code:: sh

  # Sync and initialize all submodules, including CUTLASS
  git submodule sync
  git submodule update --init --recursive

  # Verify that CUTLASS is available (path is relative to the repository root)
  ls external/cutlass/include

Install NCCL for Distributed Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For distributed communication support, install NCCL via Conda:

.. code:: sh

  # !! Run inside the Conda environment !!
  conda install -c conda-forge nccl -y

Set Wheel Build Variables
~~~~~~~~~~~~~~~~~~~~~~~~~

When building the Python wheel, the package name, Python version tag, and
Python platform name must first be properly set:

.. code:: sh

  # Set the package name depending on the build variant
  export package_name=fbgemm_genai_{cuda}

  # Set the Python version tag. It should follow the convention `py<major><minor>`,
  # e.g. Python 3.14 --> py314
  export python_tag=py314

  # Determine the processor architecture
  export ARCH=$(uname -m)

  # Set the Python platform name for the Linux case
  export python_plat_name="manylinux_2_28_${ARCH}"
  # For the macOS (x86_64) case
  export python_plat_name="macosx_10_9_${ARCH}"
  # For the macOS (arm64) case
  export python_plat_name="macosx_11_0_${ARCH}"
  # For the Windows case
  export python_plat_name="win_${ARCH}"


.. _fbgemm-genai.build.process.cuda:

CUDA Build
----------

Building FBGEMM GenAI for CUDA requires both NVML and cuDNN to be installed and
made available to the build through environment variables.
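
Because these libraries reach the build only through environment variables, a
missing or misspelled path may not surface until the build is already underway.
The following is a minimal sketch of a fail-fast check, assuming Bash and the
``NVML_LIB_PATH`` and ``NCCL_LIB_PATH`` variables used in this guide; the
``require_lib_var`` helper is hypothetical and not part of FBGEMM:

.. code:: sh

  #!/usr/bin/env bash
  # Hypothetical helper (not part of FBGEMM): succeed only if the environment
  # variable named by $1 is set and points at an existing path.
  require_lib_var() {
    local var_name="$1"
    # Indirect expansion: read the value of the variable whose name is in var_name
    local lib_path="${!var_name:-}"
    if [ -z "${lib_path}" ]; then
      echo "error: ${var_name} is not set" >&2
      return 1
    fi
    if [ ! -e "${lib_path}" ]; then
      echo "error: ${var_name}=${lib_path} does not exist" >&2
      return 1
    fi
    return 0
  }

  # Example usage, just before invoking setup.py:
  # require_lib_var NVML_LIB_PATH && require_lib_var NCCL_LIB_PATH

Using indirect expansion (``${!var_name}``) lets one helper check any variable
by name, and the error message names the variable that needs fixing.
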
The presence of a CUDA device, however, is not required for building the
package.

Similar to CPU-only builds, building with Clang + ``libstdc++`` can be enabled
by appending ``--cxxprefix=$CONDA_PREFIX`` to the build command, presuming the
toolchains have been properly installed.

Environment Setup for CUDA Builds
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Set up the necessary environment variables for a CUDA build:

.. code:: sh

  # !! Run in fbgemm_gpu/ directory inside the Conda environment !!

  # Specify CUDA paths (adjust to your CUDA installation)
  export CUDA_HOME="/usr/local/cuda"
  export CUDACXX="${CUDA_HOME}/bin/nvcc"
  export PATH="${CUDA_HOME}/bin:${PATH}"
  export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}"

  # Specify NVML filepath (usually in the CUDA stubs directory)
  export NVML_LIB_PATH="${CUDA_HOME}/lib64/stubs/libnvidia-ml.so"

  # Specify NCCL filepath (installed via conda)
  export NCCL_LIB_PATH="${CONDA_PREFIX}/lib/libnccl.so"

CUDA Architecture Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Configure the target CUDA architectures for your hardware:

.. code:: sh

  # Build for SM70/80 (V100/A100 GPU); update as needed
  # If not specified, only the CUDA architecture supported by the current system will be targeted
  # If not specified and no CUDA device is present either, all CUDA architectures will be targeted
  # Quote the value: an unquoted ';' would be parsed by the shell as a command separator
  cuda_arch_list="7.0;8.0"

  # For NVIDIA Blackwell architecture (GB100, GB200):
  # cuda_arch_list="10.0a"

  # Unset TORCH_CUDA_ARCH_LIST if it exists, because it takes precedence over
  # -DTORCH_CUDA_ARCH_LIST during the invocation of setup.py
  unset TORCH_CUDA_ARCH_LIST

Optional NVCC Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Additional NVCC configuration options:

.. code:: sh

  # [OPTIONAL] Allow NVCC to use host compilers that are newer than what NVCC officially supports
  nvcc_prepend_flags=(
    -allow-unsupported-compiler
  )

  # [OPTIONAL] If clang is the host compiler, set NVCC to use libstdc++ since libc++ is not supported
  nvcc_prepend_flags+=(
    -Xcompiler -stdlib=libstdc++
    -ccbin "/path/to/clang++"
  )

  # [OPTIONAL] Set NVCC_PREPEND_FLAGS as needed
  export NVCC_PREPEND_FLAGS="${nvcc_prepend_flags[@]}"

  # [OPTIONAL] Enable verbose NVCC logs
  export NVCC_VERBOSE=1

Building the Package
~~~~~~~~~~~~~~~~~~~~

.. code:: sh

  # !! Run in fbgemm_gpu/ directory inside the Conda environment !!

  # [OPTIONAL] Specify the CUDA installation paths
  # This may be required if CMake is unable to find nvcc
  export CUDACXX=/path/to/nvcc
  export CUDA_BIN_PATH=/path/to/cuda/installation

  # Build the wheel artifact only
  python setup.py bdist_wheel \
      --build-target=genai \
      --build-variant=cuda \
      --python-tag="${python_tag}" \
      --plat-name="${python_plat_name}" \
      --nvml_lib_path=${NVML_LIB_PATH} \
      --nccl_lib_path=${NCCL_LIB_PATH} \
      -DTORCH_CUDA_ARCH_LIST="${cuda_arch_list}"

  # Build and install the library into the Conda environment
  python setup.py install \
      --build-target=genai \
      --build-variant=cuda \
      --nvml_lib_path=${NVML_LIB_PATH} \
      --nccl_lib_path=${NCCL_LIB_PATH} \
      -DTORCH_CUDA_ARCH_LIST="${cuda_arch_list}"


.. _fbgemm-gpu.build.process.rocm:

ROCm Build
----------

For ROCm builds, ``ROCM_PATH`` and ``PYTORCH_ROCM_ARCH`` need to be specified.
The presence of a ROCm device, however, is not required for building the
package.

Similar to CUDA builds, building with Clang + ``libstdc++`` can be enabled by
appending ``--cxxprefix=$CONDA_PREFIX`` to the build command, presuming the
toolchains have been properly installed.

.. code:: sh

  # !! Run in fbgemm_gpu/ directory inside the Conda environment !!

  export ROCM_PATH=/path/to/rocm

  # [OPTIONAL] Enable verbose HIPCC logs
  export HIPCC_VERBOSE=1

  # Build for the target architecture of the ROCm device installed on the machine (e.g. 'gfx908,gfx90a,gfx942')
  # See https://rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html for the full list
  export PYTORCH_ROCM_ARCH=$(${ROCM_PATH}/bin/rocminfo | grep -o -m 1 'gfx.*')

  # Build the wheel artifact only
  python setup.py bdist_wheel \
      --build-target=genai \
      --build-variant=rocm \
      --python-tag="${python_tag}" \
      --plat-name="${python_plat_name}" \
      -DAMDGPU_TARGETS="${PYTORCH_ROCM_ARCH}" \
      -DHIP_ROOT_DIR="${ROCM_PATH}" \
      -DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
      -DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"

  # Build and install the library into the Conda environment
  python setup.py install \
      --build-target=genai \
      --build-variant=rocm \
      -DAMDGPU_TARGETS="${PYTORCH_ROCM_ARCH}" \
      -DHIP_ROOT_DIR="${ROCM_PATH}" \
      -DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
      -DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"


Post-Build Checks (For Developers)
----------------------------------

As FBGEMM GenAI leverages the same build process as FBGEMM_GPU, please refer to
:ref:`fbgemm-gpu.build.process.post-build` for information on additional
post-build checks.


Troubleshooting Build Issues
-----------------------------

Common Issues and Solutions
~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **CUTLASS not found**: Ensure that the Git submodules are initialized:

   .. code:: sh

      git submodule sync
      git submodule update --init --recursive

2. **CUDA version mismatch**: Ensure that the PyTorch CUDA version matches your
   system CUDA version:

   .. code:: sh

      # Check system CUDA version
      nvcc --version

      # Check PyTorch CUDA version
      python -c "import torch; print(torch.version.cuda)"

3. **NVML/NCCL library not found**: Verify that the library paths are correct:

   .. code:: sh

      # Check that NVML exists
      ls -la ${NVML_LIB_PATH}

      # Check that NCCL exists
      ls -la ${NCCL_LIB_PATH}
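
The CUDA version mismatch check above can also be scripted so that it fails
loudly instead of relying on a visual comparison. The following is a minimal
sketch, assuming Bash, ``sed``, and that ``nvcc --version`` prints a line of the
form ``Cuda compilation tools, release 12.8, V12.8.93``; both helper names are
hypothetical and not part of FBGEMM:

.. code:: sh

  #!/usr/bin/env bash
  # Hypothetical helper: read `nvcc --version` output on stdin and print the
  # CUDA release as <major>.<minor> (e.g. 12.8), or nothing if none is found.
  cuda_release_from_nvcc() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
  }

  # Hypothetical helper: compare the nvcc toolkit release against the CUDA
  # version that the installed PyTorch was built with.
  check_cuda_match() {
    local system_cuda torch_cuda
    system_cuda="$(nvcc --version | cuda_release_from_nvcc)"
    torch_cuda="$(python -c 'import torch; print(torch.version.cuda)')"
    echo "nvcc CUDA: ${system_cuda:-unknown}, PyTorch CUDA: ${torch_cuda}"
    # Exit status 0 only when both versions are known and identical
    [ -n "${system_cuda}" ] && [ "${system_cuda}" = "${torch_cuda}" ]
  }

  # Example usage:
  # check_cuda_match || echo "CUDA version mismatch detected"

This sketch compares versions exactly at the ``<major>.<minor>`` level, which
mirrors the manual check; a CPU-only PyTorch build reports ``None`` and is
therefore treated as a mismatch.
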