Re: [PATCH -next v8 2/3] rust: gpu: Add GPU buddy allocator bindings

From: Danilo Krummrich

Date: Tue Feb 10 2026 - 06:55:22 EST


On Mon Feb 9, 2026 at 10:42 PM CET, Joel Fernandes wrote:

[...]

> +//! params.size_bytes = SZ_8M as u64;

It looks like there are ~30 occurrences of `as u64` in this example code, which
seems quite inconvenient for drivers.

In nova-core I proposed to have FromSafeCast / IntoSafeCast for usize, u32 and
u64, which would help here as well, once factored out.
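To illustrate what I mean, here is a rough userspace sketch. The trait names follow the nova-core proposal, but the definitions below are my assumption for the sake of the example and not the actual kernel code; SZ_8M is redefined locally as a usize stand-in.

```rust
// Userspace sketch only: these definitions are assumptions made for
// illustration, not the nova-core implementation.
pub trait FromSafeCast<T> {
    fn from_safe_cast(value: T) -> Self;
}

// usize -> u64 cannot lose bits when usize is at most 64 bits wide, so the
// impl is only provided for such targets.
#[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))]
impl FromSafeCast<usize> for u64 {
    fn from_safe_cast(value: usize) -> Self {
        value as u64
    }
}

pub trait IntoSafeCast<T> {
    fn into_safe_cast(self) -> T;
}

// Blanket impl, analogous to how `Into` is derived from `From`.
impl<S, T: FromSafeCast<S>> IntoSafeCast<T> for S {
    fn into_safe_cast(self) -> T {
        T::from_safe_cast(self)
    }
}

// Local stand-in for the kernel's usize-typed size constant.
pub const SZ_8M: usize = 8 << 20;
```

With that, `params.size_bytes = SZ_8M.into_safe_cast();` replaces the bare cast, and a lossy conversion would simply fail to compile because no impl exists for it.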

But even this seems pretty annoying. I wonder if we should just have separate
64-bit size constants, as they'd be pretty useful in other places as well, e.g.
GPUVM.
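Roughly something like the following; the module name and the AllocParams stand-in are hypothetical and only there to show how the call sites would look without any casts.

```rust
// Hypothetical module of 64-bit size constants; the name is illustrative and
// does not refer to an existing kernel API.
pub mod sizes_u64 {
    pub const SZ_4K: u64 = 4 << 10;
    pub const SZ_1M: u64 = 1 << 20;
    pub const SZ_8M: u64 = 8 << 20;
    pub const SZ_1G: u64 = 1 << 30;
}

// Simplified stand-in for the allocation parameters used in the example code.
#[derive(Default)]
pub struct AllocParams {
    pub size_bytes: u64,
    pub min_block_size_bytes: u64,
}

// No `as u64` needed, since the constants already have the right type.
pub fn example_params() -> AllocParams {
    AllocParams {
        size_bytes: sizes_u64::SZ_8M,
        min_block_size_bytes: sizes_u64::SZ_4K,
    }
}
```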

> +/// Inner structure holding the actual buddy allocator.
> +///
> +/// # Synchronization
> +///
> +/// The C `gpu_buddy` API requires synchronization (see `include/linux/gpu_buddy.h`).
> +/// The internal [`GpuBuddyGuard`] ensures that the lock is held for all
> +/// allocator and free operations, preventing races between concurrent allocations
> +/// and the freeing that occurs when [`AllocatedBlocks`] is dropped.
> +///
> +/// # Invariants
> +///
> +/// The inner [`Opaque`] contains a valid, initialized buddy allocator.
> +#[pin_data(PinnedDrop)]
> +struct GpuBuddyInner {
> + #[pin]
> + inner: Opaque<bindings::gpu_buddy>,
> + #[pin]
> + lock: Mutex<()>,

Why don't we have the mutex around the Opaque<bindings::gpu_buddy>? It's the
only field the mutex protects.

Is it because Mutex does not take an impl PinInit? If so, we should add a
comment with a proper TODO.
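For illustration, the layout I have in mind looks like this in plain userspace Rust. This uses std::sync::Mutex rather than the kernel Mutex, and RawBuddy is a made-up stand-in for bindings::gpu_buddy, so take it as a sketch of the structure only.

```rust
use std::sync::Mutex;

// Made-up stand-in for the C allocator state; purely illustrative.
pub struct RawBuddy {
    pub free_bytes: u64,
}

pub struct BuddyInner {
    // The lock owns the data it protects, so the state cannot be reached
    // without going through the guard.
    buddy: Mutex<RawBuddy>,
    // Immutable after init, hence kept outside of the lock.
    chunk_size: u64,
}

impl BuddyInner {
    pub fn new(size: u64, chunk_size: u64) -> Self {
        Self {
            buddy: Mutex::new(RawBuddy { free_bytes: size }),
            chunk_size,
        }
    }

    // Toy allocation: only tracks the remaining number of bytes.
    pub fn alloc(&self, size: u64) -> Result<(), ()> {
        let mut guard = self.buddy.lock().unwrap();
        if self.chunk_size == 0 || size % self.chunk_size != 0 || size > guard.free_bytes {
            return Err(());
        }
        guard.free_bytes -= size;
        Ok(())
    }

    pub fn free_bytes(&self) -> u64 {
        self.buddy.lock().unwrap().free_bytes
    }
}
```

Compared to a separate Mutex<()> next to the data, the compiler enforces that every access to the allocator state holds the lock.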

> + /// Base offset for all allocations (does not change after init).
> + base_offset: u64,
> + /// Cached chunk size (does not change after init).
> + chunk_size: u64,
> + /// Cached total size (does not change after init).
> + size: u64,
> +}
> +
> +impl GpuBuddyInner {
> + /// Create a pin-initializer for the buddy allocator.
> + fn new(params: &GpuBuddyParams) -> impl PinInit<Self, Error> {

I think we can just pass them by value, they shouldn't be needed anymore after
the GpuBuddy instance has been constructed.
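A minimal userspace sketch of the by-value variant; Params and Buddy are simplified stand-ins for GpuBuddyParams and GpuBuddyInner, not the types from the patch.

```rust
// Simplified stand-in for the parameter struct.
pub struct Params {
    pub base_offset_bytes: u64,
    pub physical_memory_size_bytes: u64,
    pub chunk_size_bytes: u64,
}

// Simplified stand-in for the allocator's cached fields.
pub struct Buddy {
    pub base_offset: u64,
    pub size: u64,
    pub chunk_size: u64,
}

impl Buddy {
    // Consumes `params`; the caller has no use for them afterwards.
    pub fn new(params: Params) -> Self {
        // Destructuring makes it obvious if a field is ever left unused.
        let Params {
            base_offset_bytes,
            physical_memory_size_bytes,
            chunk_size_bytes,
        } = params;

        Self {
            base_offset: base_offset_bytes,
            size: physical_memory_size_bytes,
            chunk_size: chunk_size_bytes,
        }
    }
}
```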

> + let base_offset = params.base_offset_bytes;
> + let size = params.physical_memory_size_bytes;
> + let chunk_size = params.chunk_size_bytes;
> +
> + try_pin_init!(Self {
> + inner <- Opaque::try_ffi_init(|ptr| {
> + // SAFETY: ptr points to valid uninitialized memory from the pin-init
> + // infrastructure. gpu_buddy_init will initialize the structure.
> + to_result(unsafe { bindings::gpu_buddy_init(ptr, size, chunk_size) })
> + }),
> + lock <- new_mutex!(()),
> + base_offset: base_offset,
> + chunk_size: chunk_size,
> + size: size,
> + })
> + }

<snip>

> +/// GPU buddy allocator instance.
> +///
> +/// This structure wraps the C `gpu_buddy` allocator using reference counting.
> +/// The allocator is automatically cleaned up when all references are dropped.
> +///
> +/// # Invariants
> +///
> +/// The inner [`Arc`] points to a valid, initialized GPU buddy allocator.
> +pub struct GpuBuddy(Arc<GpuBuddyInner>);
> +
> +impl GpuBuddy {
> + /// Create a new buddy allocator.
> + ///
> + /// Creates a buddy allocator that manages a contiguous address space of the given
> + /// size, with the specified minimum allocation unit (chunk_size must be at least 4KB).
> + pub fn new(params: &GpuBuddyParams) -> Result<Self> {

Same here, we should be able to take this by value.

> + Ok(Self(Arc::pin_init(
> + GpuBuddyInner::new(params),
> + GFP_KERNEL,
> + )?))
> + }

<snip>

> + /// Allocate blocks from the buddy allocator.
> + ///
> + /// Returns an [`Arc<AllocatedBlocks>`] structure that owns the allocated blocks
> + /// and automatically frees them when all references are dropped.
> + ///
> + /// Takes `&self` instead of `&mut self` because the internal [`Mutex`] provides
> + /// synchronization - no external `&mut` exclusivity needed.
> + pub fn alloc_blocks(&self, params: &GpuBuddyAllocParams) -> Result<Arc<AllocatedBlocks>> {

Why do we force a reference count here? I think we should just return
impl PinInit<AllocatedBlocks, Error> and let the driver decide where to
initialize the object, no?

I.e. what if the driver wants to store additional data in a driver private
structure? With the current API we'd need two allocations and, in the worst
case, another reference count.
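As a userspace analogy of the single-allocation case: AllocatedBlocks contains a list head and is pinned in the kernel, which is why the real API would return impl PinInit rather than a value. In the sketch below a plain move stands in for the initializer, and all names other than AllocatedBlocks are hypothetical.

```rust
use std::sync::Arc;

// Simplified stand-in; the real type is pinned and holds a list of blocks.
#[derive(Debug)]
pub struct AllocatedBlocks {
    pub size_bytes: u64,
}

// Stand-in for an alloc_blocks() that does not pick the storage itself.
pub fn alloc_blocks(size_bytes: u64) -> Result<AllocatedBlocks, &'static str> {
    if size_bytes == 0 {
        return Err("zero-sized allocation");
    }
    Ok(AllocatedBlocks { size_bytes })
}

// Hypothetical driver private structure embedding the blocks directly.
pub struct DriverBuffer {
    pub blocks: AllocatedBlocks,
    pub label: &'static str,
}

// One allocation and one reference count cover both the blocks and the
// driver's additional data.
pub fn new_driver_buffer(
    size_bytes: u64,
    label: &'static str,
) -> Result<Arc<DriverBuffer>, &'static str> {
    let blocks = alloc_blocks(size_bytes)?;
    Ok(Arc::new(DriverBuffer { blocks, label }))
}
```

A driver that really wants a standalone reference-counted object can still wrap the result in an Arc on its own.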

> + let buddy_arc = Arc::clone(&self.0);
> +
> + // Create pin-initializer that initializes list and allocates blocks.
> + let init = try_pin_init!(AllocatedBlocks {
> + buddy: Arc::clone(&buddy_arc),
> + list <- CListHead::new(),
> + flags: params.buddy_flags,
> + _: {
> + // Lock while allocating to serialize with concurrent frees.
> + let guard = buddy.lock();
> +
> + // SAFETY: `guard` provides exclusive access to the buddy allocator.
> + to_result(unsafe {
> + bindings::gpu_buddy_alloc_blocks(
> + guard.as_raw(),
> + params.start_range_address,
> + params.end_range_address,
> + params.size_bytes,
> + params.min_block_size_bytes,
> + list.as_raw(),
> + params.buddy_flags.as_raw(),
> + )
> + })?
> + }
> + });
> +
> + Arc::pin_init(init, GFP_KERNEL)
> + }
> +}