Re: [PATCH v12 19/22] gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot
From: sashiko-bot
Date: Mon Jun 01 2026 - 23:40:52 EST
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] DMA Use-After-Free of FMC boot parameters during initialization
- [High] Missing little-endian conversions in hardware payloads breaks big-endian hosts
- [Medium] Kernel stack overflow risk due to large struct passed by value
--
commit 42c5f0e39c2bb0bc87a3f8df25554d04f00a3e69
Author: John Hubbard <jhubbard@xxxxxxxxxx>
gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot
Build and send the Chain of Trust message to FSP, bundling the
DMA-coherent boot parameters that FSP reads at boot time.
> diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
> index 67cf83aba83c6..352ef7683cf2a 100644
> --- a/drivers/gpu/nova-core/fsp.rs
> +++ b/drivers/gpu/nova-core/fsp.rs
[ ... ]
> @@ -113,8 +190,65 @@ pub(crate) fn wait_secure_boot(
> Ok(Fsp { falcon, fsp_fw })
> }
>
> + /// Boots GSP FMC via FSP Chain of Trust.
> + ///
> + /// Builds the CoT message from the pre-configured [`FmcBootArgs`], sends it
> + /// to FSP, and waits for the response.
> + pub(crate) fn boot_fmc(
> + &mut self,
> + dev: &device::Device<device::Bound>,
> + bar: &Bar0,
> + fb_layout: &FbLayout,
> + args: &FmcBootArgs,
> + ) -> Result {
> + dev_dbg!(dev, "Starting FSP boot sequence for {}\n", args.chipset);
> +
> + let fmc_addr = self.fsp_fw.fmc_image.dma_handle();
> + let fmc_boot_params_addr = args.fmc_boot_params.dma_handle();
> +
> + // frts_offset is relative to FB end: FRTS_location = FB_END - frts_offset
> + let frts_offset = if !args.resume {
> + let frts_reserved_size = fb_layout.heap.len() + u64::from(fb_layout.pmu_reserved_size);
> +
> + frts_reserved_size
> + .align_up(Alignment::new::<SZ_2M>())
> + .ok_or(EINVAL)?
> + } else {
> + 0
> + };
> + let frts_size: u32 = if !args.resume {
> + fb_layout.frts.len().try_into()?
> + } else {
> + 0
> + };
> +
> + let msg = KBox::new(
> + FspMessage {
> + mctp_header: MctpHeader::single_packet(),
> + nvdm_header: NvdmHeader::new(NvdmType::Cot),
> + cot: NvdmPayloadCot {
[Severity: Medium]
Is it intentional to construct this ~868-byte FspMessage structure on the
stack before boxing it?
This also dereferences and copies the 816-byte FmcSignatures structure
(sigs: *self.fsp_fw.fmc_sigs). Doesn't this bypass the optimization in
FspFirmware::extract_fmc_signatures() which avoided building FmcSignatures on
the stack?
Could this contribute to kernel stack overflows? Might it be safer to
initialize it directly on the heap using KBox::init() or KBox::new_uninit()?
> + version: hal::fsp_hal(args.chipset).ok_or(ENOTSUPP)?.cot_version(),
> + size: u16::try_from(core::mem::size_of::<NvdmPayloadCot>())
> + .map_err(|_| EINVAL)?,
> + gsp_fmc_sysmem_offset: fmc_addr,
> + frts_sysmem_offset: 0,
> + frts_sysmem_size: 0,
> + frts_vidmem_offset: frts_offset,
> + frts_vidmem_size: frts_size,
[Severity: High]
Will these native integer types be correctly interpreted by the little-endian
RISC-V FSP/GSP processors if the host architecture is big-endian?
Because the struct is laid out in big-endian byte order on big-endian hosts,
won't the little-endian hardware swap the bytes when reading, resulting in
incorrect sizes, versions, and DMA addresses?
Should explicit little-endian types or byte-swapping (like .to_le()) be used
here to avoid breaking driver initialization on big-endian hosts?
> + sigs: *self.fsp_fw.fmc_sigs,
> + gsp_boot_args_sysmem_offset: fmc_boot_params_addr,
> + },
> + },
> + GFP_KERNEL,
> + )?;
> +
> + self.send_sync_fsp(dev, bar, &*msg)?;
> +
> + dev_dbg!(dev, "FSP Chain of Trust completed successfully\n");
> + Ok(())
> + }
[ ... ]
> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index 0c54e8bf4bb3a..558b37863f00a 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -934,3 +934,67 @@ fn new(cmdq: &Cmdq) -> impl Init<Self> + '_ {
> })
> }
> }
> +
> +#[repr(u32)]
> +pub(crate) enum GspDmaTarget {
> + #[expect(dead_code)]
> + LocalFb = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_LOCAL_FB,
> + CoherentSystem = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_COHERENT_SYSTEM,
> + NoncoherentSystem = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_NONCOHERENT_SYSTEM,
> +}
> +
> +type GspAcrBootGspRmParams = bindings::GSP_ACR_BOOT_GSP_RM_PARAMS;
> +
> +impl GspAcrBootGspRmParams {
> + fn new(target: GspDmaTarget, wpr_meta_addr: u64) -> impl Init<Self> {
> + #[allow(non_snake_case)]
> + let params = init!(Self {
> + target: target as u32,
> + gspRmDescSize: num::usize_into_u32::<{ size_of::<GspFwWprMeta>() }>(),
> + gspRmDescOffset: wpr_meta_addr,
> + bIsGspRmBoot: 1,
[Severity: High]
Similarly to NvdmPayloadCot above, will these native integer types be
correctly interpreted by little-endian hardware on big-endian hosts?
Does this struct require explicit little-endian types or byte-swapping to
avoid accessing invalid physical memory on big-endian architectures?
> + wprCarveoutOffset: 0,
> + wprCarveoutSize: 0,
> + __bindgen_padding_0: Default::default(),
> + });
> +
> + params
> + }
> +}
[ ... ]
> diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
> index b25970dd4561d..f41f3fea15ff2 100644
> --- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
> +++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
[ ... ]
> @@ -39,17 +42,27 @@ impl GspHal for Gh100 {
> /// the GSP boot internally - no manual GSP reset/boot is needed.
> fn boot<'a>(
> &self,
> - _gsp: &'a Gsp,
> + gsp: &'a Gsp,
> dev: &'a device::Device<device::Bound>,
> bar: &'a Bar0,
> chipset: Chipset,
> - _fb_layout: &FbLayout,
> - _wpr_meta: &Coherent<GspFwWprMeta>,
> + fb_layout: &FbLayout,
> + wpr_meta: &Coherent<GspFwWprMeta>,
> _gsp_falcon: &'a Falcon<GspEngine>,
> _sec2_falcon: &'a Falcon<Sec2>,
> ) -> Result<BootUnloadGuard<'a>> {
> let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
> - let _fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
> + let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
> +
> + let args = FmcBootArgs::new(
> + dev,
> + chipset,
> + wpr_meta.dma_handle(),
> + gsp.libos.dma_handle(),
> + false,
> + )?;
> +
> + fsp.boot_fmc(dev, bar, fb_layout, &args)?;
>
> Err(ENOTSUPP)
[Severity: High]
Does returning an error here immediately drop the args variable, thereby
freeing its DMA-coherent memory before the FSP finishes reading it?
Since the FSP is actively booting asynchronously, wouldn't this cause a
hardware use-after-free of the boot parameters, leading to IOMMU faults or
undefined behavior?
Note: I see this is fixed in a later commit 'gpu: nova-core: gsp: enable FSP
boot path' which changes the return to Ok(unload_guard) and properly handles
the required hardware delays.
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260602032111.224790-1-jhubbard@xxxxxxxxxx?part=19