Re: [PATCH v12 15/22] gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure

From: Alexandre Courbot

Date: Tue Jun 02 2026 - 21:20:11 EST


On Tue Jun 2, 2026 at 12:33 PM JST, sashiko-bot wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> - [High] Race condition due to incorrect order of writing queue pointers in `send_msg`.
> - [High] Silent failure when sending a 4-byte message.
> - [High] Unchecked integer addition in `poll_msgq` can cause a kernel panic on untrusted MMIO reads.
> --
>
> commit 8a59b49984011320aa631b20a05550242e2baef2
> Author: John Hubbard <jhubbard@xxxxxxxxxx>
>
> gpu: nova-core: Hopper/Blackwell: add FSP message infrastructure
>
> This commit adds the foundation security processor (FSP) message
> infrastructure for the nova-core GPU driver. It implements queue
> registers and low-level helpers to communicate with the FSP falcon's EMEM.
>
>> diff --git a/drivers/gpu/nova-core/falcon/fsp.rs b/drivers/gpu/nova-core/falcon/fsp.rs
>> index 6b057d958115a..0ec1c55213bc8 100644
>> --- a/drivers/gpu/nova-core/falcon/fsp.rs
>> +++ b/drivers/gpu/nova-core/falcon/fsp.rs
> [ ... ]
>> @@ -145,4 +143,63 @@ fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
>>
>> Ok(())
>> }
>> +
>> + /// Poll FSP for incoming data.
>> + ///
>> + /// Returns the size of available data in bytes, or 0 if no data is available.
>> + ///
>> + /// The FSP message queue is not circular. Pointers are reset to 0 after each
>> + /// message exchange, so `tail >= head` is always true when data is present.
>> + #[expect(dead_code)]
>> + pub(crate) fn poll_msgq(&self, bar: &Bar0) -> u32 {
>> + let head = bar.read(regs::NV_PFSP_MSGQ_HEAD).address();
>> + let tail = bar.read(regs::NV_PFSP_MSGQ_TAIL).address();
>> +
>> + if head == tail {
>> + return 0;
>> + }
>> +
>> + // TAIL points at last DWORD written, so add 4 to get total size
>> + tail.saturating_sub(head) + 4
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> [Severity: High]
> Can this unchecked addition panic if we encounter a PCIe link failure?
>
> If `head` reads normally as 0 but a surprise removal occurs before reading
> `tail`, `tail` could read as 0xFFFFFFFF. The saturating subtraction would
> leave 0xFFFFFFFF, and the subsequent `+ 4` will overflow, causing a panic in
> debug builds where CONFIG_RUST_OVERFLOW_CHECKS=y is set.
>
> Should this use `saturating_add(4)` or another form of checked math to handle
> untrusted hardware inputs safely?

This is cheap to fix, and indeed probably a good idea.

>
>> + }
>> +
>> + /// Writes `packet` to FSP EMEM and updates the queue pointers to notify FSP.
>> + ///
>> + /// Returns `EINVAL` if `packet` is empty or its length is not 4-byte aligned.
>> + #[expect(dead_code)]
>> + pub(crate) fn send_msg(&mut self, bar: &Bar0, packet: &[u8]) -> Result {
>> + if packet.is_empty() {
>> + return Err(EINVAL);
>> + }
>> +
>> + // Write message to EMEM at offset 0 (validates 4-byte alignment)
>> + self.write_emem(bar, 0, packet)?;
>> +
>> + // Update queue pointers. TAIL points at the last DWORD written.
>> + let tail_offset = u32::try_from(packet.len() - 4).map_err(|_| EINVAL)?;
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> [Severity: High]
> Does this create a silent failure if a 4-byte message is sent?
>
> If `packet.len()` is 4, `tail_offset` is calculated as 0. Both NV_PFSP_QUEUE_HEAD
> and NV_PFSP_QUEUE_TAIL will then be written as 0. Because the FSP protocol treats
> `HEAD == TAIL` as an empty queue, the FSP will ignore the message while the
> function still returns `Ok(())`.
>
> Should this enforce a minimum packet length of 8 bytes or return an error
> for 4-byte messages?

This one looks bogus (looks like the claim that `HEAD == TAIL` means
empty queue is hallucinated?).

>
>> + bar.write_reg(regs::NV_PFSP_QUEUE_TAIL::zeroed().with_address(tail_offset));
>> + bar.write_reg(regs::NV_PFSP_QUEUE_HEAD::zeroed().with_address(0));
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> [Severity: High]
> Is there a race condition with the FSP hardware due to the order of these
> writes?
>
> By writing NV_PFSP_QUEUE_TAIL before NV_PFSP_QUEUE_HEAD, if QUEUE_HEAD was
> non-zero from a prior transaction, we momentarily create a state where
> `TAIL != HEAD`. The FSP may wake up immediately and read from the old, incorrect
> QUEUE_HEAD offset before the driver can write 0 to QUEUE_HEAD.
>
> Should NV_PFSP_QUEUE_HEAD be initialized to 0 before writing the
> NV_PFSP_QUEUE_TAIL doorbell?

Here IIUC QUEUE_HEAD is always `0`, so this is a non-issue.