Re: [PATCH 03/13] gpu: nova-core: fsp: try to enforce exclusive access to FSP channel
From: John Hubbard
Date: Tue Jun 16 2026 - 23:13:20 EST
On 6/16/26 7:55 PM, Eliot Courtney wrote:
> On Tue Jun 16, 2026 at 2:16 AM JST, Gary Guo wrote:
>> On Mon Jun 15, 2026 at 3:40 PM BST, Eliot Courtney wrote:
...
>>> + || {
>>> + let qhead = bar.read(regs::NV_PFSP_QUEUE_HEAD::at(0)).address();
>>> + let qtail = bar.read(regs::NV_PFSP_QUEUE_TAIL::at(0)).address();
>>> + let mhead = bar.read(regs::NV_PFSP_MSGQ_HEAD::at(0)).val();
>>> + let mtail = bar.read(regs::NV_PFSP_MSGQ_TAIL::at(0)).val();
>>
>> How does this prevent race between kernel and GSP when initiating FSP
>> communication?
>
> You are right that it does not prevent it in the general case.
>
> This is the logic that openrm uses and locally, much earlier, I observed
> sometimes I did actually need this for reprobe to work. I think the
> reason (but it's been a while) is that before we did not wait for GSP to
> halt on unload and probe failure, which could mean there's a leftover
> message from FSP that is not consumed if you reprobe quickly after a
> failure. Now that we wait for GSP to halt it's much harder for this kind

Can we bottom out on the root cause, before coding in a speculative fix?
Check with the RM folks internally, they will have debugging history
and lore too.

> of issue to occur. Actually, it might be impossible currently for this
> kind of race to happen without a failure on unload (e.g. timeout of
> waiting for GSP to reset).
>
> The reason I thought to add this is more to match what appears to be
> the protocol that this transport uses (even if it might not be sound
> generally). I am curious what others think, if it's worth keeping this -

It depends on what we learn about the real root cause.

thanks,
--
John Hubbard