Re: [PATCH v12 22/22] gpu: nova-core: gsp: enable FSP boot path
From: sashiko-bot
Date: Mon Jun 01 2026 - 23:39:07 EST
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [Critical] DMA Use-After-Free during GSP boot failure path due to reverse drop order.
- [High] Unconditional boot failure due to invalid error check in `wait_for_gsp_lockdown_release`.
--
commit c4510072c2995001dbf13cc4a2db6eb978b85b10
Author: Alexandre Courbot <acourbot@xxxxxxxxxx>
gpu: nova-core: gsp: enable FSP boot path
This patch enables the FSP boot path so Hopper and Blackwell devices can
boot. It accomplishes this by removing the hardcoded ENOTSUPP return code
and returning the boot unload guard instead.
> diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> index d372ae85c5bc7..93d8a72fe9303 100644
> --- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
> +++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> @@ -166,7 +166,7 @@ fn boot<'a>(
> );
>
> // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
> - let _unload_guard =
> + let unload_guard =
> BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, Some(unload_bundle));
[Severity: Critical]
This is a pre-existing issue, but does this declaration order cause a DMA
use-after-free if the boot path fails?
In Rust, local variables are dropped in the reverse order of their declaration.
Here in Gh100::boot(), unload_guard is declared before args (which contains
FmcBootArgs and its DMA-coherent allocations).
If wait_for_gsp_lockdown_release() returns an error, args is dropped first,
freeing and unmapping the DMA memory while the GSP RISC-V is still running.
unload_guard is dropped afterward to execute teardown.
Since FspUnloadBundle::run() only polls for the GSP to halt without issuing an
explicit hardware reset if it is stuck:
drivers/gpu/nova-core/gsp/hal/gh100.rs:FspUnloadBundle::run() {
read_poll_timeout(
|| Ok(gsp_falcon.is_riscv_active(bar)),
|&active| !active,
Delta::from_millis(10),
Delta::from_secs(5),
)
...
}
Could this leave the GSP running indefinitely and capable of reading or writing
to the freed DMA memory?
>
> let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
> @@ -184,7 +184,7 @@ fn boot<'a>(
> let fmc_boot_params_addr = args.boot_params_dma_handle();
> wait_for_gsp_lockdown_release(dev, bar, gsp_falcon, fmc_boot_params_addr)?;
[Severity: High]
This isn't a bug introduced by this patch, but will this function unconditionally
return an error for a successful boot sequence?
In wait_for_gsp_lockdown_release(), read_poll_timeout() waits for the lockdown
to be released. The lockdown_released() function correctly handles the valid state
where mbox.mbox0 contains the lower 32 bits of fmc_boot_params_addr:
drivers/gpu/nova-core/gsp/hal/gh100.rs:lockdown_released() {
if self.mbox0 != 0 && self.combined_addr() != fmc_boot_params_addr {
return true;
}
...
}
However, wait_for_gsp_lockdown_release() blindly evaluates mbox0 != 0 after
polling:
drivers/gpu/nova-core/gsp/hal/gh100.rs:wait_for_gsp_lockdown_release() {
...
if mbox.mbox0 != 0 {
dev_err!(dev, "GSP-FMC boot failed (mbox: {:#x})\n", mbox.mbox0);
return Err(EIO);
}
...
}
Since fmc_boot_params_addr is a DMA address, won't its lower 32 bits almost
certainly be non-zero, causing this newly enabled boot path to immediately fail?
>
> - Err(ENOTSUPP)
> + Ok(unload_guard)
> }
> }
>
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260602032111.224790-1-jhubbard@xxxxxxxxxx?part=22