
Build with CUDA (Linux & Windows)

No pre-built OpenCV distribution — not pip install opencv-python, not the official Windows release packages, not apt install libopencv-dev — includes the CUDA modules. Since OpenCV 4.0, all CUDA-accelerated implementations live in opencv_contrib. Building from source with WITH_CUDA=ON and OPENCV_EXTRA_MODULES_PATH pointing to opencv_contrib/modules is the only way to get CUDA support.
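A quick way to confirm that the install you currently have lacks CUDA is to query the device count (a sketch; the helper name is ours):

```python
def has_cuda_opencv():
    """Return True/False for CUDA support, or None if cv2 is not installed."""
    try:
        import cv2
    except ImportError:
        return None
    try:
        # pip wheels expose cv2.cuda but report zero CUDA-enabled devices
        return cv2.cuda.getCudaEnabledDeviceCount() > 0
    except AttributeError:
        # Some builds omit the cv2.cuda namespace entirely
        return False

print(has_cuda_opencv())
```

On a stock `pip install opencv-python` this prints `False`; only after the source build below does it print `True`.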

| CMake flag | Purpose | Notes |
| --- | --- | --- |
| `WITH_CUDA=ON` | Enables CUDA module compilation | Master switch; everything else depends on this |
| `CUDA_ARCH_BIN` | Target GPU compute capability (e.g. `8.6`) | Set explicitly for the fastest builds; `CUDA_GENERATION=Auto` auto-detects but produces very large binaries |
| `CUDA_ARCH_PTX` | Generate PTX intermediate code (forward-compatible) | Set to `""` to skip, or e.g. `8.6` to include PTX for JIT compilation on future GPUs |
| `WITH_CUDNN=ON` | Enable cuDNN integration | Required for the DNN CUDA backend; cuDNN must be installed first |
| `OPENCV_DNN_CUDA=ON` | CUDA-accelerate the DNN module specifically | Only needs cuDNN, not the entire CUDA module suite |
| `ENABLE_FAST_MATH=ON` | CPU-level fast math (compiler flag) | Trades floating-point precision for speed |
| `CUDA_FAST_MATH=ON` | GPU-level fast math (nvcc flag) | Same trade-off, but for GPU kernels |
| `WITH_CUBLAS=ON` | Enable cuBLAS GPU BLAS routines | Used by several CUDA modules for matrix ops |
| `WITH_CUFFT=ON` | Enable cuFFT | GPU-accelerated FFT; used by the CUDA modules |
| `ENABLE_CUDA_FIRST_CLASS_LANGUAGE=ON` | Modern CMake CUDA language support | Recommended for CUDA 11+ and CMake 3.18+; replaces the legacy FindCUDA.cmake |
| `BUILD_opencv_world=ON` | Bundle all modules into one DLL | Optional; simplifies deployment on Windows |
| `BUILD_opencv_cudacodec=OFF/ON` | NVIDIA Video Codec SDK integration | Needs the Video Codec SDK headers installed separately; OFF is the safe default |

Setting CUDA_ARCH_BIN to your GPU’s exact compute capability is the most impactful build decision. Setting CUDA_GENERATION=Auto lets CMake auto-detect the installed GPU but generates code for all supported architectures by default, resulting in very long compile times and a larger binary.

| Architecture | Generation | Compute capability | Representative GPUs |
| --- | --- | --- | --- |
| Kepler | GTX 700 series | 3.5, 3.7 | GTX 780, GTX Titan |
| Maxwell | GTX 900 series | 5.0, 5.2 | GTX 980, GTX 970 |
| Pascal | GTX 10 series | 6.0, 6.1 | GTX 1080, GTX 1060 |
| Volta | Datacenter | 7.0 | Tesla V100 |
| Turing | RTX 20 series | 7.5 | RTX 2080, GTX 1650 |
| Ampere | RTX 30 series / datacenter | 8.0, 8.6 | A100 (8.0), RTX 3090/3080/3070/3060 (8.6) |
| Ada Lovelace | RTX 40 series | 8.9 | RTX 4090, RTX 4080, RTX 4070 |
| Hopper | Datacenter | 9.0 | H100 |
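The table above can be captured as a small lookup for composing a `CUDA_ARCH_BIN` value covering several GPUs (a sketch; the names are ours, and the comma separator is one of the list formats OpenCV's CMake scripts commonly accept):

```python
# Compute capabilities per architecture, taken from the table above.
COMPUTE_CAPS = {
    "Kepler": ["3.5", "3.7"],
    "Maxwell": ["5.0", "5.2"],
    "Pascal": ["6.0", "6.1"],
    "Volta": ["7.0"],
    "Turing": ["7.5"],
    "Ampere": ["8.0", "8.6"],
    "Ada Lovelace": ["8.9"],
    "Hopper": ["9.0"],
}

def cuda_arch_bin(*archs):
    """Build a CUDA_ARCH_BIN value covering the given architectures."""
    caps = sorted({c for a in archs for c in COMPUTE_CAPS[a]}, key=float)
    return ",".join(caps)

print(cuda_arch_bin("Turing", "Ampere"))  # 7.5,8.0,8.6
```

Targeting only the architectures you actually deploy on keeps both compile time and binary size down.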

Find your GPU’s compute capability:

```sh
# Linux: query with nvidia-smi
nvidia-smi --query-gpu=compute_cap --format=csv
# Or via Python once the NVIDIA driver is installed (nvidia-smi ships with the driver)
python3 -c "import subprocess; print(subprocess.check_output(['nvidia-smi', '--query-gpu=compute_cap', '--format=csv,noheader']).decode())"
```

1. Install system dependencies:

```sh
sudo apt-get update
sudo apt-get install -y build-essential cmake git pkg-config
sudo apt-get install -y \
  libavcodec-dev libavformat-dev libswscale-dev \
  libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev \
  libxvidcore-dev libx264-dev
sudo apt-get install -y \
  libjpeg-dev libpng-dev libtiff-dev libopenexr-dev
sudo apt-get install -y python3-dev python3-numpy
sudo apt-get install -y libtbb2 libtbb-dev libeigen3-dev
sudo apt-get install -y libv4l-dev v4l-utils
sudo apt-get install -y libgtk-3-dev
```

2. Install CUDA Toolkit (from NVIDIA — do NOT use apt’s default cuda package):

```sh
# Visit https://developer.nvidia.com/cuda-downloads and follow the installer for your Ubuntu version.
# After install, add CUDA to PATH:
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
# Verify
nvcc --version
nvidia-smi
```

3. Install cuDNN (optional but needed for OPENCV_DNN_CUDA=ON):

Download from developer.nvidia.com/cudnn-downloads (requires NVIDIA account). Select the tarball for your CUDA version, then:

```sh
# Example for cuDNN 9.x with CUDA 12.x
tar -xvf cudnn-linux-x86_64-9.x.x.x_cuda12-archive.tar.xz
sudo cp cudnn-*/include/cudnn*.h /usr/local/cuda/include/
sudo cp cudnn-*/lib/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
sudo ldconfig
```
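To sanity-check which cuDNN version the copied headers declare, you can parse the version macros from `cudnn_version.h` (where they live since cuDNN 8). A sketch, demonstrated on an inline sample rather than the real file; the function name is ours:

```python
import re

def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from cudnn_version.h contents."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        parts.append(int(m.group(1)) if m else None)
    return tuple(parts)

# In practice, read /usr/local/cuda/include/cudnn_version.h; a sample here:
sample = """
#define CUDNN_MAJOR 9
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 0
"""
print(parse_cudnn_version(sample))  # (9, 1, 0)
```

The version it reports should match the "cuDNN: YES (ver ...)" line in the CMake configure output later.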

4. Clone and configure:

```sh
git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git
git clone https://github.com/opencv/opencv_extra.git
mkdir opencv/build && cd opencv/build
cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=/usr/local \
  -DOPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
  -DOPENCV_TEST_DATA_PATH=../../opencv_extra/testdata \
  -DOPENCV_GENERATE_PKGCONFIG=ON \
  -DWITH_CUDA=ON \
  -DWITH_CUDNN=ON \
  -DOPENCV_DNN_CUDA=ON \
  -DCUDA_ARCH_BIN=8.6 \
  -DCUDA_ARCH_PTX="" \
  -DENABLE_FAST_MATH=ON \
  -DCUDA_FAST_MATH=ON \
  -DWITH_CUBLAS=ON \
  -DWITH_CUFFT=ON \
  -DWITH_TBB=ON \
  -DWITH_GSTREAMER=ON \
  -DWITH_V4L=ON \
  -DBUILD_opencv_python3=ON \
  -DPYTHON3_EXECUTABLE=$(which python3) \
  -DPYTHON3_NUMPY_INCLUDE_DIRS=$(python3 -c "import numpy; print(numpy.get_include())") \
  -DBUILD_TESTS=ON \
  -DBUILD_PERF_TESTS=ON \
  ..
make -j$(nproc)
sudo make install
sudo ldconfig
```

If you only need CUDA for neural network inference and not the full CUDA processing module suite, you can significantly reduce build time by disabling the other CUDA modules:

```sh
# Keep the other flags from the full configure above (extra modules path,
# Python bindings, etc.); only the CUDA-related flags are shown here.
cmake \
  -DWITH_CUDA=ON \
  -DWITH_CUDNN=ON \
  -DOPENCV_DNN_CUDA=ON \
  -DCUDA_ARCH_BIN=8.6 \
  -DBUILD_opencv_cudaarithm=OFF \
  -DBUILD_opencv_cudabgsegm=OFF \
  -DBUILD_opencv_cudafeatures2d=OFF \
  -DBUILD_opencv_cudafilters=OFF \
  -DBUILD_opencv_cudaimgproc=OFF \
  -DBUILD_opencv_cudalegacy=OFF \
  -DBUILD_opencv_cudaobjdetect=OFF \
  -DBUILD_opencv_cudaoptflow=OFF \
  -DBUILD_opencv_cudastereo=OFF \
  -DBUILD_opencv_cudawarping=OFF \
  -DBUILD_opencv_cudacodec=OFF \
  ..
```

Check the CMake configure output. After `cmake ..` completes, look for:

```
NVIDIA CUDA: YES (ver X.X, CUFFT CUBLAS FAST_MATH)
NVIDIA GPU arch: 86
cuDNN: YES (ver X.X.X)
```

If these lines say NO, CUDA was not found and the build will not include CUDA modules.
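If you capture the configure log (e.g. `cmake .. | tee configure.log`), a small check can flag a CPU-only configuration before you spend hours compiling. A sketch; the function name and log format assumptions are ours:

```python
def cuda_configured(log_text):
    """Report whether a CMake configure log shows CUDA and cuDNN as found."""
    def found(key):
        for line in log_text.splitlines():
            line = line.strip()
            if line.startswith(key):
                return "YES" in line
        return False  # line absent entirely also means not enabled
    return {"cuda": found("NVIDIA CUDA:"), "cudnn": found("cuDNN:")}

sample = """\
  NVIDIA CUDA: YES (ver 12.4, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch: 86
  cuDNN: YES (ver 9.1.0)
"""
print(cuda_configured(sample))  # {'cuda': True, 'cudnn': True}
```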

Verify after install:

```python
import cv2

# Should print the OpenCV version
print(cv2.__version__)
# Should print the full build info — look for "CUDA: YES" in the NVIDIA section
print(cv2.getBuildInformation())
# Count available CUDA devices — must return > 0
print(cv2.cuda.getCudaEnabledDeviceCount())
```

Benchmark GPU vs CPU (GEMM):

```python
import cv2
import numpy as np
import time

# Two-channel float32 matrices (gemm treats these as complex values)
npTmp = np.random.random((1024, 1024)).astype(np.float32)
npMat1 = np.stack([npTmp, npTmp], axis=2)
npMat2 = npMat1.copy()

cuMat1 = cv2.cuda_GpuMat()
cuMat2 = cv2.cuda_GpuMat()
cuMat1.upload(npMat1)
cuMat2.upload(npMat2)

# Warm up: the first CUDA call pays one-time context and kernel init costs
cv2.cuda.gemm(cuMat1, cuMat2, 1, None, 0, None, 1)

start = time.time()
cv2.cuda.gemm(cuMat1, cuMat2, 1, None, 0, None, 1)
print(f"CUDA GPU: {time.time() - start:.4f}s")

start = time.time()
cv2.gemm(npMat1, npMat2, 1, None, 0, None, 1)
print(f"CPU: {time.time() - start:.4f}s")
```

Test DNN CUDA backend:

```python
import cv2

net = cv2.dnn.readNetFromONNX("your_model.onnx")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
# If the build was successful, inference will run on the GPU.
# Note: on a CPU-only build these calls still succeed — OpenCV silently
# falls back to the CPU backend (with a console warning) at the first forward().
```
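Because that fallback is silent until the first `forward()`, one defensive pattern is to decide the backend up front from the device count. A sketch with a hypothetical helper (`pick_dnn_backend` is ours); on a real net you would apply the returned constant names via `getattr(cv2.dnn, name)`:

```python
def pick_dnn_backend(cuda_device_count):
    """Choose DNN backend/target constant names from the CUDA device count."""
    if cuda_device_count > 0:
        return ("DNN_BACKEND_CUDA", "DNN_TARGET_CUDA")
    return ("DNN_BACKEND_OPENCV", "DNN_TARGET_CPU")

print(pick_dnn_backend(0))  # ('DNN_BACKEND_OPENCV', 'DNN_TARGET_CPU')
print(pick_dnn_backend(1))  # ('DNN_BACKEND_CUDA', 'DNN_TARGET_CUDA')
```

In practice you would pass `cv2.cuda.getCudaEnabledDeviceCount()` as the argument, so the same script runs on both CUDA and CPU-only builds.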