Re: [PATCH 3/3] gpu: nova-core: fix wrong use of barriers in GSP code

From: Joel Fernandes

Date: Mon Apr 06 2026 - 14:19:20 EST

On 4/4/2026 9:02 AM, Gary Guo wrote:
> On Thu Apr 2, 2026 at 10:56 PM BST, Joel Fernandes wrote:
>> Hi Gary,
>>
>> On 4/2/2026 11:24 AM, Gary Guo wrote:
>>> From: Gary Guo <gary@xxxxxxxxxxx>
>>>
>>> Currently, in the GSP->CPU messaging path, the current code misses a read
>>> barrier before data read. The barrier after read is updated to a DMA
>>> barrier (with release ordering desired), instead of the existing (Rust)
>>> SeqCst SMP barrier; the location of barrier is also moved to the beginning
>>> of function, because the barrier is needed to synchronizing between data
>>> and ring-buffer pointer, the RMW operation does not internally need a
>>> barrier (nor it has to be atomic, as CPU pointers are updated by CPU only).
>>>
>>> In the CPU->GSP messaging path, the current code misses a write barrier
>>> after data write and before updating the CPU write pointer. Barrier is not
>>> needed before data write due to control dependency, this fact is documented
>>> explicitly. This could be replaced with an acquire barrier if needed.
>>>
>>> Signed-off-by: Gary Guo <gary@xxxxxxxxxxx>
>>> ---
>>> drivers/gpu/nova-core/gsp/cmdq.rs | 19 +++++++++++++++++++
>>> drivers/gpu/nova-core/gsp/fw.rs | 12 ------------
>>> 2 files changed, 19 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>>> index 2224896ccc89..7e4315b13984 100644
>>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>>> @@ -19,6 +19,12 @@
>>> prelude::*,
>>> sync::{
>>> aref::ARef,
>>> + barrier::{
>>> + dma_mb,
>>> + Read,
>>> + Release,
>>> + Write, //
>>> + },
>>> Mutex, //
>>> },
>>> time::Delta,
>>> @@ -258,6 +264,9 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>> let tx = self.cpu_write_ptr() as usize;
>>> let rx = self.gsp_read_ptr() as usize;
>>>
>>> + // ORDERING: control dependency provides necessary LOAD->STORE ordering.
>>> + // `dma_mb(Acquire)` may be used here if we don't want to rely on control dependency.
>>
>> Just checking, does control dependency on CPU side really apply to ordering for
>> IO (what the device perceives?). IOW, the loads are stores might be ordered on
>> the CPU side, but the device might be seeing these operations out of order. If
>> that is the case, perhaps the control dependency comment is misleading.
>
> Given that CPU cannot speculate store, I don't see how dependency ordering can
> be broken even if the other side is a bus-mastering device.
>
> For this to be broken, the device would be able to see the dependency-ordered
> STORE to be propagated to it before it does it own STORE that propagates to the
> CPU to trigger the CPU STORE in the first place? That'll just break casuality.
>
> I'll say that control dependency is sufficient here. I'm not worried that the
> compiler will break this particular control dependency given that if
> `gsp_read_ptr == cpu_write_ptr` will imply command allocation failure and this
> condition cannot possibly be optimized out by the compiler.
Yep, ok, fair enough.

thanks,

--
Joel Fernandes