Red Hat and Rusticl developer Karol Herbst has opened a new Mesa merge request introducing “CLUDA,” a compute-only Gallium3D driver that runs on top of NVIDIA’s CUDA driver API. Implemented over the proprietary libcuda.so library, it lets Mesa’s compute framework operate on NVIDIA hardware through the vendor’s own stack.
Herbst describes CLUDA as a driver that “implements the Gallium API on top of the CUDA driver API.” The project, still experimental, currently targets compute workloads such as OpenCL and uses the CUDA driver library (libcuda.so) shipped with NVIDIA’s proprietary stack. No runtime CUDA packages are required beyond the driver, though build-time CUDA development headers are needed.
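As a rough illustration of that dependency split, the sketch below (not code from the merge request) shows how a user-space driver can resolve CUDA driver-API entry points straight from libcuda.so at runtime, so only the NVIDIA driver package needs to be present on the running system; the library name and cuInit are standard, the rest is illustrative.

    /* Illustrative sketch: resolving the CUDA driver API from libcuda.so at
     * runtime. Only the NVIDIA driver package (which ships libcuda.so.1) is
     * needed at run time; the CUDA toolkit headers are a build-time-only
     * dependency. */
    #include <dlfcn.h>
    #include <stdio.h>

    typedef int (*cu_init_fn)(unsigned int flags); /* CUresult cuInit(unsigned int) */

    int main(void)
    {
        void *lib = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
        if (!lib) {
            fprintf(stderr, "no CUDA driver library: %s\n", dlerror());
            return 1;
        }

        cu_init_fn cu_init = (cu_init_fn)dlsym(lib, "cuInit");
        if (!cu_init || cu_init(0) != 0 /* CUDA_SUCCESS */) {
            fprintf(stderr, "cuInit failed\n");
            return 1;
        }

        printf("CUDA driver API initialized via libcuda.so\n");
        return 0;
    }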
Learning CUDA the Hard Way
Herbst said the idea came after a hallway conversation at XDC (X.Org Developers’ Conference): “Somebody mentioned to me at XDC … that implementing OpenCL on top of CUDA in Mesa could help out with something.”
He began coding shortly after returning home, where he had access to an NVIDIA GPU. The result is CLUDA, a blend of C and Rust: the main driver is written in C, while the PTX-generation logic uses Rust to simplify string handling. Mesa’s NIR intermediate representation is lowered to PTX, which is then compiled at runtime by CUDA’s built-in PTX JIT.
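For a sense of what that last step involves, here is a minimal, stand-alone sketch of the CUDA driver-API path the generated PTX would travel: a PTX string is handed to cuModuleLoadDataEx, JIT-compiled by the driver, and launched with cuLaunchKernel. The trivial kernel and surrounding code are illustrative, not taken from CLUDA.

    /* Minimal PTX JIT round trip through the CUDA driver API (illustrative,
     * not CLUDA code). CLUDA would feed PTX generated from NIR into the same
     * cuModuleLoadDataEx entry point. Error checking omitted for brevity. */
    #include <cuda.h>
    #include <stdio.h>

    static const char ptx_src[] =
        ".version 7.1\n"
        ".target sm_86\n"
        ".address_size 64\n"
        ".visible .entry noop()\n"
        "{\n"
        "    ret;\n"
        "}\n";

    int main(void)
    {
        CUdevice dev;
        CUcontext ctx;
        CUmodule mod;
        CUfunction fn;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        /* The driver JIT-compiles the PTX text to native code for this GPU. */
        cuModuleLoadDataEx(&mod, ptx_src, 0, NULL, NULL);
        cuModuleGetFunction(&fn, mod, "noop");

        /* One block of 64 threads, no parameters, default stream. */
        cuLaunchKernel(fn, 1, 1, 1, 64, 1, 1, 0, 0, NULL, NULL);
        cuCtxSynchronize();

        printf("PTX kernel JIT-compiled and launched\n");

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }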
What Works — and What Doesn’t Yet
CLUDA already supports general kernel launches, memory operations, and several OpenCL extensions missing from NVIDIA’s proprietary OpenCL driver. The supported list includes features like cl_khr_fp16, cl_khr_integer_dot_product, multiple subgroup extensions, and full SPIR-V support through cl_khr_il_program.
Herbst joked about the unusually long list: “Some of you might look at this list and go ‘wait a second … are those … really?’ and my answer is: ‘yes they are.’”
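Since these are all standard OpenCL extensions, applications can check for them from the host side with ordinary OpenCL calls; the snippet below (illustrative, not tied to CLUDA) queries the device’s extension string for cl_khr_il_program, the extension that enables SPIR-V program ingestion.

    /* Sketch: probing for cl_khr_il_program (SPIR-V ingestion) on the first
     * GPU device of the first platform. Plain OpenCL host API, nothing
     * CLUDA-specific. */
    #define CL_TARGET_OPENCL_VERSION 300
    #include <CL/cl.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        cl_platform_id platform;
        cl_device_id device;
        char extensions[16384] = {0};

        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
        clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS,
                        sizeof(extensions) - 1, extensions, NULL);

        if (strstr(extensions, "cl_khr_il_program"))
            printf("SPIR-V programs supported (clCreateProgramWithIL is usable)\n");
        else
            printf("cl_khr_il_program not exposed by this device\n");
        return 0;
    }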
The driver is hard-coded for SM86 (Ampere, e.g. RTX 30 series) hardware for now. Missing pieces include image support, GL sharing (cl_khr_gl_sharing), double precision (FP64), and 64-bit atomics — features Herbst says could follow later “if motivation and time allow.”
Early Testing and Performance
Herbst’s internal Conformance Test Suite (CTS) run in “wimpy mode” passed 3,871 tests, failed 10, and crashed 4.
He noted minor precision issues with fp16 and denorm handling, and that timestamp queries currently rely on CUDA’s cuEventElapsedTime, which is not fully accurate.
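For reference, the sketch below shows the usual CUDA event-timing pattern that cuEventElapsedTime implies (illustrative, not CLUDA’s code): two events bracket the work, and the call returns a float in milliseconds with, per NVIDIA’s documentation, a resolution of roughly half a microsecond, which is why the resulting timestamps are only approximate.

    /* Illustrative CUDA event timing (not CLUDA code): bracket work with two
     * events and read back the elapsed time. cuEventElapsedTime reports a
     * float in milliseconds with roughly 0.5 µs resolution. */
    #include <cuda.h>
    #include <stdio.h>

    int main(void)
    {
        CUdevice dev;
        CUcontext ctx;
        CUdeviceptr buf;
        CUevent start, end;
        float ms = 0.0f;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        cuEventCreate(&start, CU_EVENT_DEFAULT);
        cuEventCreate(&end, CU_EVENT_DEFAULT);
        cuMemAlloc(&buf, 1 << 20);

        cuEventRecord(start, 0);
        cuMemsetD8(buf, 0, 1 << 20);    /* the work being timed */
        cuEventRecord(end, 0);
        cuEventSynchronize(end);

        cuEventElapsedTime(&ms, start, end);
        printf("elapsed: %.3f ms\n", ms);

        cuMemFree(buf);
        cuEventDestroy(start);
        cuEventDestroy(end);
        cuCtxDestroy(ctx);
        return 0;
    }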
Performance already approaches NVIDIA’s proprietary stack. On an RTX A6000, CLUDA achieved a LuxMark score of 57,702, compared with 64,009 under NVIDIA’s own OpenCL driver — roughly 90% of the native result. Herbst attributes the performance gap to NIR → PTX translation overhead and less-optimized generated code.
A Hack That Works
Herbst calls CLUDA an impromptu project: “I kept writing code and it kinda worked.” He released the merge request partly to gauge community interest: “Would be nice to know if people are interested at all … I don’t really have any concrete plans myself for this.” Herbst also noted that “CLUDA” is just a working name, saying he is open to changing it if a better suggestion comes along.