Mesa’s New “CLUDA” Driver Bridges Gallium3D and NVIDIA CUDA for OpenCL Compute

Karol Herbst, a Red Hat engineer and Rusticl developer, has opened a Mesa merge request introducing “CLUDA,” a compute-only Gallium3D driver that runs on top of NVIDIA’s CUDA driver API (libcuda.so). The driver lets Mesa’s compute framework operate on NVIDIA hardware through the proprietary driver stack.

Herbst describes CLUDA as a driver that “implements the Gallium API on top of the CUDA driver API.” The project, still experimental, currently targets compute workloads such as OpenCL and uses the CUDA driver library (libcuda.so) shipped with NVIDIA’s proprietary stack. No runtime CUDA packages are required beyond the driver, though build-time CUDA development headers are needed.

Learning CUDA the Hard Way

Herbst said the idea came after a hallway conversation at XDC (X.Org Developers’ Conference): “Somebody mentioned to me at XDC … that implementing OpenCL on top of CUDA in Mesa could help out with something.”

He began coding shortly after returning home, once he had access to an NVIDIA GPU. The result is CLUDA, a blend of C and Rust: the main driver is written in C, while the PTX-generation logic is written in Rust to simplify string handling. Mesa’s NIR intermediate representation is lowered to PTX, which CUDA’s built-in PTX JIT compiler then compiles for the GPU.
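To give a feel for why string handling matters when generating PTX, here is a minimal Rust sketch that emits PTX text for a toy element-wise kernel. It is not CLUDA's emitter: the `BinOp` enum, the `emit_ptx` function, the parameter names, and the choice of PTX ISA version 7.1 are all assumptions made for illustration, and the generated text has not been validated against NVIDIA's tools here.

```rust
use std::fmt::Write;

/// Toy stand-in for an IR opcode. Mesa's NIR is far richer; this enum is hypothetical.
#[derive(Clone, Copy)]
pub enum BinOp {
    Add,
    Mul,
}

impl BinOp {
    /// PTX mnemonic for the 32-bit float form of the operation.
    fn mnemonic(self) -> &'static str {
        match self {
            BinOp::Add => "add.f32",
            BinOp::Mul => "mul.f32",
        }
    }
}

/// Emit PTX text for a kernel computing out[i] = a[i] <op> b[i].
/// There is no bounds check, so a launch must cover exactly the buffer length.
pub fn emit_ptx(kernel: &str, op: BinOp, sm: u32) -> String {
    let mut s = String::new();

    // Module header. PTX ISA 7.1 is an assumed version that accepts sm_86.
    writeln!(s, ".version 7.1").unwrap();
    writeln!(s, ".target sm_{}", sm).unwrap();
    writeln!(s, ".address_size 64").unwrap();
    writeln!(s).unwrap();

    // Kernel signature: three 64-bit pointers passed as parameters.
    writeln!(s, ".visible .entry {}(", kernel).unwrap();
    s.push_str("    .param .u64 p_a,\n    .param .u64 p_b,\n    .param .u64 p_out\n)\n{\n");

    let op_line = format!("{} %f3, %f1, %f2;", op.mnemonic());
    let body: Vec<&str> = vec![
        ".reg .b32 %r<5>;",
        ".reg .b64 %rd<9>;",
        ".reg .f32 %f<4>;",
        "ld.param.u64 %rd1, [p_a];",
        "ld.param.u64 %rd2, [p_b];",
        "ld.param.u64 %rd3, [p_out];",
        "cvta.to.global.u64 %rd4, %rd1;",
        "cvta.to.global.u64 %rd5, %rd2;",
        "cvta.to.global.u64 %rd6, %rd3;",
        // Global index = block index * block size + thread index.
        "mov.u32 %r1, %ctaid.x;",
        "mov.u32 %r2, %ntid.x;",
        "mov.u32 %r3, %tid.x;",
        "mad.lo.s32 %r4, %r1, %r2, %r3;",
        // Byte offset = index * sizeof(f32).
        "mul.wide.s32 %rd7, %r4, 4;",
        "add.s64 %rd8, %rd4, %rd7;",
        "ld.global.f32 %f1, [%rd8];",
        "add.s64 %rd8, %rd5, %rd7;",
        "ld.global.f32 %f2, [%rd8];",
        &op_line,
        "add.s64 %rd8, %rd6, %rd7;",
        "st.global.f32 [%rd8], %f3;",
        "ret;",
    ];
    for line in body {
        writeln!(s, "    {}", line).unwrap();
    }
    s.push_str("}\n");
    s
}
```

Calling `emit_ptx("vec_add", BinOp::Add, 86)` returns a text module targeting `sm_86`. A driver could hand such text to a CUDA driver API entry point like `cuModuleLoadData`, which accepts PTX and JIT-compiles it; the article does not say which entry point CLUDA actually uses.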

What Works — and What Doesn’t Yet

CLUDA already supports general kernel launches, memory operations, and several OpenCL extensions missing from NVIDIA’s proprietary OpenCL driver. The supported list includes features like cl_khr_fp16, cl_khr_integer_dot_product, multiple subgroup extensions, and full SPIR-V support through cl_khr_il_program.

Herbst joked about the unusually long list: “Some of you might look at this list and go ‘wait a second … are those … really?’ and my answer is: ‘yes they are.’”
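An application can check for the extensions named above by inspecting the device's extension string, which OpenCL reports as a space-separated list via `CL_DEVICE_EXTENSIONS`. The following is a small sketch of that check in Rust; the helper names `has_extension` and `missing_extensions` are hypothetical, and the extension string would come from a real `clGetDeviceInfo` query rather than a literal.

```rust
/// Return true if `name` appears as a whole token in a space-separated
/// OpenCL extension string.
pub fn has_extension(extensions: &str, name: &str) -> bool {
    extensions.split_whitespace().any(|e| e == name)
}

/// Return the entries of `wanted` that the extension string does not list.
pub fn missing_extensions<'a>(extensions: &str, wanted: &[&'a str]) -> Vec<&'a str> {
    wanted
        .iter()
        .copied()
        .filter(|w| !has_extension(extensions, w))
        .collect()
}
```

Matching whole tokens rather than substrings avoids false positives when one extension name is a prefix of another.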

The driver is currently hard-coded for SM86 (Ampere / RTX 30 Series) hardware. Missing pieces include image support, cl_gl_sharing, double precision (FP64), and 64-bit atomics, features Herbst says could follow later “if motivation and time allow.”

Early Testing and Performance

Herbst’s own run of the OpenCL Conformance Test Suite (CTS) in “wimpy mode” passed 3,871 tests, failed 10, and crashed on 4. He noted minor precision issues with fp16 and denorm handling, and that timestamp queries currently rely on CUDA’s cuEventElapsedTime, which is not fully accurate.
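The article does not say why the timestamps are inaccurate. As background only, `cuEventElapsedTime` reports elapsed time as a 32-bit float in milliseconds, while OpenCL profiling timestamps are integer nanoseconds, so a unit conversion is involved. The sketch below shows that conversion and how coarse a 32-bit float becomes for long intervals; the function names are hypothetical and this is not CLUDA's code.

```rust
/// Convert an elapsed time in milliseconds (32-bit float) to whole
/// nanoseconds, the unit OpenCL profiling queries report.
pub fn elapsed_ms_to_ns(ms: f32) -> u64 {
    (ms as f64 * 1_000_000.0).round() as u64
}

/// Distance from `ms` to the next larger representable f32, in nanoseconds.
/// Assumes `ms` is positive and finite.
pub fn f32_step_ns(ms: f32) -> f64 {
    let next = f32::from_bits(ms.to_bits() + 1);
    (next as f64 - ms as f64) * 1_000_000.0
}
```

For example, around an elapsed time of 10,000 ms adjacent float values are roughly a microsecond apart, so nanosecond-level differences cannot be represented at that magnitude.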

Performance already approaches NVIDIA’s proprietary stack. On an RTX A6000, CLUDA achieved a LuxMark score of 57,702, compared with 64,009 under NVIDIA’s own OpenCL driver — roughly 90% of the native result. Herbst attributes the performance gap to NIR → PTX translation overhead and less-optimized generated code.
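The "roughly 90%" figure follows directly from the two LuxMark scores quoted above. A short Rust sketch of the arithmetic, with `relative_score` as a hypothetical helper name:

```rust
/// Express `score` as a percentage of `baseline`.
pub fn relative_score(score: f64, baseline: f64) -> f64 {
    score / baseline * 100.0
}
```

With the article's numbers, `relative_score(57_702.0, 64_009.0)` comes to about 90.1%, leaving a gap of just under 10% to NVIDIA's own OpenCL driver.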

A Hack That Works

Herbst calls CLUDA an impromptu project: “I kept writing code and it kinda worked.” He released the merge request partly to gauge community interest: “Would be nice to know if people are interested at all … I don’t really have any concrete plans myself for this.” Herbst also noted that “CLUDA” is just a working name, saying he is open to changing it if a better suggestion comes along.

Mehedi Hasan