Re: [PATCH v2] gpu: nova-core: gsp: fix undefined behavior in command queue code

From: Gary Guo

Date: Tue Mar 24 2026 - 11:30:00 EST


On Tue Mar 24, 2026 at 2:44 PM GMT, Alexandre Courbot wrote:
> On Tue Mar 24, 2026 at 1:44 AM JST, Gary Guo wrote:
>> On Mon Mar 23, 2026 at 5:40 AM GMT, Alexandre Courbot wrote:
>>> `driver_read_area` and `driver_write_area` are internal methods that
>>> return slices containing the area of the command queue buffer that the
>>> driver has exclusive read or write access, respectively.
>>>
>>> While their returned value is correct and safe to use, internally they
>>> temporarily create a reference to the whole command-buffer slice,
>>> including GSP-owned regions. These regions can change without notice,
>>> and thus creating a slice to them is undefined behavior.
>>>
>>> Fix this by replacing the slice logic with pointer arithmetic and
>>> creating slices to valid regions only. It adds unsafe code, but should
>>> be mostly replaced by `IoView` and `IoSlice` once they land.
>>>
>>> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
>>> Reported-by: Danilo Krummrich <dakr@xxxxxxxxxx>
>>> Closes: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@xxxxxxxxxx/
>>> Signed-off-by: Alexandre Courbot <acourbot@xxxxxxxxxx>
>>> ---
>>> I didn't apply Eliot's Reviewed-by because the code has changed
>>> drastically. The logic should remain identical though.
>>> ---
>>> Changes in v2:
>>> - Use `u32_as_usize` consistently.
>>> - Reduce the number of `unsafe` blocks by computing the end offset of
>>> the returned slices and creating them at the end, in one step.
>>> - Take advantage of the fact that both slices have the same start index
>>> regardless of the branch chosen.
>>> - Improve safety comments.
>>> - Link to v1: https://patch.msgid.link/20260319-cmdq-ub-fix-v1-1-0f9f6e8f3ce3@xxxxxxxxxx
>>
>> Here's the diff that fixes the issue using I/O projection
>> https://lore.kernel.org/rust-for-linux/20260323153807.1360705-1-gary@xxxxxxxxxx/
>
> Should we apply or drop this patch meanwhile? I/O projections are still
> undergoing review, but I'm fine with dropping it if Danilo thinks we can
> live a bit longer with that UB. It's not like the driver is actively
> doing anything useful yet anyway.

I want to avoid big changes back and forth. We could use raw pointer projection
today, which could be fairly easy to convert to I/O projection:

diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 191b648e2ede..4cdbeed04294 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -23,6 +23,7 @@
},
new_mutex,
prelude::*,
+ ptr::project as ptr_project,
sync::{
aref::ARef,
Mutex, //
@@ -306,24 +307,25 @@ fn driver_write_area_size(&self) -> usize {
let tx = self.gsp_write_ptr() as usize;
let rx = self.cpu_read_ptr() as usize;

- // SAFETY:
- // - We will only access the driver-owned part of the shared memory.
- // - Per the safety statement of the function, no concurrent access will be performed.
- let gsp_mem = unsafe { &*self.0.as_ptr() };
- let data = &gsp_mem.gspq.msgq.data;
+ let data = ptr_project!(self.0.as_ptr(), .gspq.msgq.data);

// The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
// belongs to the driver for reading.
// PANIC:
// - per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`
// - per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`
- if rx <= tx {
+ let (first, second) = if rx <= tx {
// The area is contiguous.
- (&data[rx..tx], &[])
+ (ptr_project!(data, [rx..tx]), ptr_project!(data, [..0]))
} else {
// The area is discontiguous.
- (&data[rx..], &data[..tx])
- }
+ (ptr_project!(data, [rx..]), ptr_project!(data, [..tx]))
+ };
+
+ // SAFETY:
+ // - We will only access the driver-owned part of the shared memory.
+ // - Per the safety statement of the function, no concurrent access will be performed.
+ (unsafe { &*first }, unsafe { &*second })
}

/// Allocates a region on the command queue that is large enough to send a command of `size`