Re: [PATCH v2] gpu: nova-core: gsp: fix undefined behavior in command queue code

From: Alexandre Courbot

Date: Sat Mar 28 2026 - 10:56:25 EST

On Sat Mar 28, 2026 at 10:09 PM JST, Gary Guo wrote:
> On Fri Mar 27, 2026 at 12:47 AM GMT, Alexandre Courbot wrote:
>> On Thu Mar 26, 2026 at 9:03 PM JST, Gary Guo wrote:
>>> On Thu Mar 26, 2026 at 4:51 AM GMT, Alexandre Courbot wrote:
>>>> On Thu Mar 26, 2026 at 1:30 PM JST, Alexandre Courbot wrote:
>>>>> On Wed Mar 25, 2026 at 12:15 AM JST, Gary Guo wrote:
>>>>>> On Tue Mar 24, 2026 at 2:44 PM GMT, Alexandre Courbot wrote:
>>>>>>> On Tue Mar 24, 2026 at 1:44 AM JST, Gary Guo wrote:
>>>>>>>> On Mon Mar 23, 2026 at 5:40 AM GMT, Alexandre Courbot wrote:
>>>>>>>>> `driver_read_area` and `driver_write_area` are internal methods that
>>>>>>>>> return slices containing the area of the command queue buffer that the
>>>>>>>>> driver has exclusive read or write access to, respectively.
>>>>>>>>>
>>>>>>>>> While their returned value is correct and safe to use, internally they
>>>>>>>>> temporarily create a reference to the whole command-buffer slice,
>>>>>>>>> including GSP-owned regions. These regions can change without notice,
>>>>>>>>> and thus creating a slice to them is undefined behavior.
>>>>>>>>>
>>>>>>>>> Fix this by replacing the slice logic with pointer arithmetic and
>>>>>>>>> creating slices to valid regions only. It adds unsafe code, but should
>>>>>>>>> be mostly replaced by `IoView` and `IoSlice` once they land.
>>>>>>>>>
>>>>>>>>> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
>>>>>>>>> Reported-by: Danilo Krummrich <dakr@xxxxxxxxxx>
>>>>>>>>> Closes: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@xxxxxxxxxx/
>>>>>>>>> Signed-off-by: Alexandre Courbot <acourbot@xxxxxxxxxx>
>>>>>>>>> ---
>>>>>>>>> I didn't apply Eliot's Reviewed-by because the code has changed
>>>>>>>>> drastically. The logic should remain identical though.
>>>>>>>>> ---
>>>>>>>>> Changes in v2:
>>>>>>>>> - Use `u32_as_usize` consistently.
>>>>>>>>> - Reduce the number of `unsafe` blocks by computing the end offset of
>>>>>>>>> the returned slices and creating them at the end, in one step.
>>>>>>>>> - Take advantage of the fact that both slices have the same start index
>>>>>>>>> regardless of the branch chosen.
>>>>>>>>> - Improve safety comments.
>>>>>>>>> - Link to v1: https://patch.msgid.link/20260319-cmdq-ub-fix-v1-1-0f9f6e8f3ce3@xxxxxxxxxx
>>>>>>>>
>>>>>>>> Here's the diff that fixes the issue using I/O projection
>>>>>>>> https://lore.kernel.org/rust-for-linux/20260323153807.1360705-1-gary@xxxxxxxxxx/
>>>>>>>
>>>>>>> Should we apply or drop this patch meanwhile? I/O projections are still
>>>>>>> undergoing review, but I'm fine with dropping it if Danilo thinks we can
>>>>>>> live a bit longer with that UB. It's not like the driver is actively
>>>>>>> doing anything useful yet anyway.
>>>>>>
>>>>>> I want to avoid big changes back and forth. We could use raw pointer projection
>>>>>> today, which could be fairly easy to convert to I/O projection:
>>>>>
>>>>> Thanks for the diff. I have adapted it to work on top of Danilo's
>>>>> suggestion to compute the end indices first as it works just as well and
>>>>> is cleaner. I have been running into a link error with this conversion
>>>>> applied though - let's discuss that on v3.
>>>>
>>>> Mmm, I guess this was because the optimizer could not prove that the
>>>> slices were within the bounds of the command queue as the expressions
>>>> passed to `ptr::project` were too complex with that version and this
>>>> makes the `ProjectIndex` check fail. I have better luck when doing
>>>> something closer to the diff you pasted.
>>>
>>> I'm considering switching the projection `[]` syntax to become panicking
>>> instead, given that the slicing use case quite often is indeed hard to prove
>>> (and also, we already have panicking comments).
>>>
>>> One option is to just change `[]` to do that, another option is adding a new
>>> `[]!` syntax to denote panicking projections. I'm more inclined toward the
>>> first one to keep consistency with Rust slicing syntax, but the second one is
>>> okay to me too.
>>>
>>> Thoughts?
>>
>> If the slice's validity is hard to prove, then the caller should
>> probably rework their code towards something simpler (like we did with
>> this patch). Allowing a potentially invalid slice to build is just
>> inserting a kernel panic mine, and as you might have noticed from LPC I
>> am not a huge fan of those. :)
>>
>> I think hammering the point about slice validity in the documentation
>> should be enough. We *want* build to fail if the slice can be invalid.
>
> Given the kernel test robot result showing build errors, I am going to add a
> panicking variant. For the use case here you don't really want to use fallible
> returns (panicking indexing + PANIC comments should be sufficient).
>
> I haven't decided on the syntax yet, I'll put this in the next RfL weekly
> meeting agenda to discuss.

Meanwhile, it would be nice to patch that UB. I'll try to reproduce the
bot's errors locally to see if we can make it work. (It will have to
land after -rc6, unfortunately.)
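
For readers following along, the approach from the v2 changelog (compute the
end offsets first, then create both slices in one step, only over the
driver-owned regions) can be sketched roughly as below. This is a standalone
userspace illustration, not the nova-core code: `write_area_ends`,
`write_area`, the byte-sized slots, and the exact wrap-around rules are all
assumptions made for the example.

```rust
use std::slice;

/// Hypothetical helper: exclusive end offsets of the two driver-writable
/// regions of a ring of `len` slots. The first region always starts at `tx`
/// and the second at 0, whichever branch is taken. One slot is always left
/// free so that a full ring can be told apart from an empty one.
fn write_area_ends(len: usize, tx: usize, rx: usize) -> (usize, usize) {
    assert!(len > 0 && tx < len && rx < len);
    if rx == 0 {
        // The reader sits at slot 0: the last slot must stay free.
        (len - 1, 0)
    } else if rx <= tx {
        // Free space wraps around: [tx, len) and [0, rx - 1).
        (len, rx - 1)
    } else {
        // Free space is contiguous: [tx, rx - 1).
        (rx - 1, 0)
    }
}

/// Hypothetical helper: builds the two writable slices from a raw pointer,
/// without ever creating a reference that covers the whole buffer.
///
/// # Safety
///
/// `base` must be valid for reads and writes of `len` bytes, and nothing else
/// may access the returned regions for the lifetime `'a`.
unsafe fn write_area<'a>(
    base: *mut u8,
    len: usize,
    tx: usize,
    rx: usize,
) -> (&'a mut [u8], &'a mut [u8]) {
    let (first_end, second_end) = write_area_ends(len, tx, rx);
    // SAFETY: `tx <= first_end <= len` and `second_end <= len`, so both ranges
    // are in bounds. They are disjoint because the second range is non-empty
    // only when `rx <= tx`, in which case it ends before `tx`.
    unsafe {
        (
            slice::from_raw_parts_mut(base.add(tx), first_end - tx),
            slice::from_raw_parts_mut(base, second_end),
        )
    }
}

fn main() {
    let mut ring = vec![0u8; 8];
    let len = ring.len();
    let base = ring.as_mut_ptr();
    // Empty ring with both pointers at slot 3: everything but one slot is free.
    // SAFETY: `base` points to `len` valid bytes and nothing else touches
    // `ring` while the two slices are alive.
    let (first, second) = unsafe { write_area(base, len, 3, 3) };
    first.fill(1);
    second.fill(2);
    // Slot 2 is the one kept free and is never covered by a slice.
    assert_eq!(ring, [2, 2, 0, 1, 1, 1, 1, 1]);
}
```

The bounds in this sketch are checked at run time with `assert!`, which is
the panicking flavor discussed above; a build-time check would instead need
the optimizer to prove the same inequalities.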