Re: [PATCH v12 19/22] gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot

From: Alexandre Courbot

Date: Wed Jun 03 2026 - 01:24:44 EST


On Tue Jun 2, 2026 at 12:40 PM JST, sashiko-bot wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> - [High] DMA Use-After-Free of FMC boot parameters during initialization
> - [High] Missing little-endian conversions in hardware payloads breaks big-endian hosts
> - [Medium] Kernel stack overflow risk due to large struct passed by value
> --
>
> commit 42c5f0e39c2bb0bc87a3f8df25554d04f00a3e69
> Author: John Hubbard <jhubbard@xxxxxxxxxx>
>
> gpu: nova-core: Hopper/Blackwell: add FSP Chain of Trust boot
>
> Build and send the Chain of Trust message to FSP, bundling the
> DMA-coherent boot parameters that FSP reads at boot time.
>
>> diff --git a/drivers/gpu/nova-core/fsp.rs b/drivers/gpu/nova-core/fsp.rs
>> index 67cf83aba83c6..352ef7683cf2a 100644
>> --- a/drivers/gpu/nova-core/fsp.rs
>> +++ b/drivers/gpu/nova-core/fsp.rs
>
> [ ... ]
>
>> @@ -113,8 +190,65 @@ pub(crate) fn wait_secure_boot(
>> Ok(Fsp { falcon, fsp_fw })
>> }
>>
>> + /// Boots GSP FMC via FSP Chain of Trust.
>> + ///
>> + /// Builds the CoT message from the pre-configured [`FmcBootArgs`], sends it
>> + /// to FSP, and waits for the response.
>> + pub(crate) fn boot_fmc(
>> + &mut self,
>> + dev: &device::Device<device::Bound>,
>> + bar: &Bar0,
>> + fb_layout: &FbLayout,
>> + args: &FmcBootArgs,
>> + ) -> Result {
>> + dev_dbg!(dev, "Starting FSP boot sequence for {}\n", args.chipset);
>> +
>> + let fmc_addr = self.fsp_fw.fmc_image.dma_handle();
>> + let fmc_boot_params_addr = args.fmc_boot_params.dma_handle();
>> +
>> + // frts_offset is relative to FB end: FRTS_location = FB_END - frts_offset
>> + let frts_offset = if !args.resume {
>> + let frts_reserved_size = fb_layout.heap.len() + u64::from(fb_layout.pmu_reserved_size);
>> +
>> + frts_reserved_size
>> + .align_up(Alignment::new::<SZ_2M>())
>> + .ok_or(EINVAL)?
>> + } else {
>> + 0
>> + };
>> + let frts_size: u32 = if !args.resume {
>> + fb_layout.frts.len().try_into()?
>> + } else {
>> + 0
>> + };
>> +
>> + let msg = KBox::new(
>> + FspMessage {
>> + mctp_header: MctpHeader::single_packet(),
>> + nvdm_header: NvdmHeader::new(NvdmType::Cot),
>> + cot: NvdmPayloadCot {
> [Severity: Medium]
> Is it intentional to construct this ~868-byte FspMessage structure on the
> stack before boxing it?
>
> This also dereferences and copies the 816-byte FmcSignatures structure
> (sigs: *self.fsp_fw.fmc_sigs). Doesn't this bypass the optimization in
> FspFirmware::extract_fmc_signatures() which avoided building FmcSignatures on
> the stack?
>
> Could this contribute to kernel stack overflows? Might it be safer to
> initialize it directly on the heap using KBox::init() or KBox::new_uninit()?

Ah, I've just flagged the same thing in my comment. Very nice that Sashiko is
able to catch these.

>
>> + version: hal::fsp_hal(args.chipset).ok_or(ENOTSUPP)?.cot_version(),
>> + size: u16::try_from(core::mem::size_of::<NvdmPayloadCot>())
>> + .map_err(|_| EINVAL)?,
>> + gsp_fmc_sysmem_offset: fmc_addr,
>> + frts_sysmem_offset: 0,
>> + frts_sysmem_size: 0,
>> + frts_vidmem_offset: frts_offset,
>> + frts_vidmem_size: frts_size,
> [Severity: High]
> Will these native integer types be correctly interpreted by the little-endian
> RISC-V FSP/GSP processors if the host architecture is big-endian?
>
> Because the struct is laid out in big-endian byte order on big-endian hosts,
> won't the little-endian hardware swap the bytes when reading, resulting in
> incorrect sizes, versions, and DMA addresses?
>
> Should explicit little-endian types or byte-swapping (like .to_le()) be used
> here to avoid breaking driver initialization on big-endian hosts?

Non-issue as Nova is little-endian only.

>
>> + sigs: *self.fsp_fw.fmc_sigs,
>> + gsp_boot_args_sysmem_offset: fmc_boot_params_addr,
>> + },
>> + },
>> + GFP_KERNEL,
>> + )?;
>> +
>> + self.send_sync_fsp(dev, bar, &*msg)?;
>> +
>> + dev_dbg!(dev, "FSP Chain of Trust completed successfully\n");
>> + Ok(())
>> + }
>
> [ ... ]
>
>> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
>> index 0c54e8bf4bb3a..558b37863f00a 100644
>> --- a/drivers/gpu/nova-core/gsp/fw.rs
>> +++ b/drivers/gpu/nova-core/gsp/fw.rs
>> @@ -934,3 +934,67 @@ fn new(cmdq: &Cmdq) -> impl Init<Self> + '_ {
>> })
>> }
>> }
>> +
>> +#[repr(u32)]
>> +pub(crate) enum GspDmaTarget {
>> + #[expect(dead_code)]
>> + LocalFb = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_LOCAL_FB,
>> + CoherentSystem = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_COHERENT_SYSTEM,
>> + NoncoherentSystem = bindings::GSP_DMA_TARGET_GSP_DMA_TARGET_NONCOHERENT_SYSTEM,
>> +}
>> +
>> +type GspAcrBootGspRmParams = bindings::GSP_ACR_BOOT_GSP_RM_PARAMS;
>> +
>> +impl GspAcrBootGspRmParams {
>> + fn new(target: GspDmaTarget, wpr_meta_addr: u64) -> impl Init<Self> {
>> + #[allow(non_snake_case)]
>> + let params = init!(Self {
>> + target: target as u32,
>> + gspRmDescSize: num::usize_into_u32::<{ size_of::<GspFwWprMeta>() }>(),
>> + gspRmDescOffset: wpr_meta_addr,
>> + bIsGspRmBoot: 1,
> [Severity: High]
> Similarly to NvdmPayloadCot above, will these native integer types be
> correctly interpreted by little-endian hardware on big-endian hosts?
>
> Does this struct require explicit little-endian types or byte-swapping to
> avoid accessing invalid physical memory on big-endian architectures?

Ditto.

>
>> + wprCarveoutOffset: 0,
>> + wprCarveoutSize: 0,
>> + __bindgen_padding_0: Default::default(),
>> + });
>> +
>> + params
>> + }
>> +}
>
> [ ... ]
>
>> diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
>> index b25970dd4561d..f41f3fea15ff2 100644
>> --- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
>> +++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
>
> [ ... ]
>
>> @@ -39,17 +42,27 @@ impl GspHal for Gh100 {
>> /// the GSP boot internally - no manual GSP reset/boot is needed.
>> fn boot<'a>(
>> &self,
>> - _gsp: &'a Gsp,
>> + gsp: &'a Gsp,
>> dev: &'a device::Device<device::Bound>,
>> bar: &'a Bar0,
>> chipset: Chipset,
>> - _fb_layout: &FbLayout,
>> - _wpr_meta: &Coherent<GspFwWprMeta>,
>> + fb_layout: &FbLayout,
>> + wpr_meta: &Coherent<GspFwWprMeta>,
>> _gsp_falcon: &'a Falcon<GspEngine>,
>> _sec2_falcon: &'a Falcon<Sec2>,
>> ) -> Result<BootUnloadGuard<'a>> {
>> let fsp_fw = FspFirmware::new(dev, chipset, FIRMWARE_VERSION)?;
>> - let _fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
>> + let mut fsp = Fsp::wait_secure_boot(dev, bar, chipset, fsp_fw)?;
>> +
>> + let args = FmcBootArgs::new(
>> + dev,
>> + chipset,
>> + wpr_meta.dma_handle(),
>> + gsp.libos.dma_handle(),
>> + false,
>> + )?;
>> +
>> + fsp.boot_fmc(dev, bar, fb_layout, &args)?;
>>
>> Err(ENOTSUPP)
> [Severity: High]
> Does returning an error here immediately drop the args variable, thereby
> freeing its DMA-coherent memory before the FSP finishes reading it?
>
> Since the FSP is actively booting asynchronously, wouldn't this cause a
> hardware use-after-free of the boot parameters, leading to IOMMU faults or
> undefined behavior?
>
> Note: I see this is fixed in a later commit 'gpu: nova-core: gsp: enable FSP
> boot path' which changes the return to Ok(unload_guard) and properly handles
> the required hardware delays.

`boot_fmc` is synchronous, so there should not be a use-after-free here.