Re: [PATCH v12 14/22] gpu: nova-core: Hopper/Blackwell: add FSP falcon EMEM operations

From: Eliot Courtney

Date: Tue Jun 02 2026 - 07:42:26 EST


On Tue Jun 2, 2026 at 12:21 PM JST, John Hubbard wrote:
> Add external memory (EMEM) read/write operations to the GPU's FSP falcon
> engine. These operations use Falcon PIO (Programmed I/O) to communicate
> with the FSP through indirect memory access.
>
> Signed-off-by: John Hubbard <jhubbard@xxxxxxxxxx>
> ---


> +impl Falcon<Fsp> {
> +    /// Writes `data` to FSP external memory at byte `offset`.
> +    ///
> +    /// `data` is interpreted as little-endian 32-bit words. Returns `EINVAL`
> +    /// if `offset` or the `data` length is not 4-byte aligned.
> +    #[expect(dead_code)]
> +    fn write_emem(&mut self, bar: &Bar0, offset: u32, data: &[u8]) -> Result {
> +        if offset % 4 != 0 || data.len() % 4 != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let mut emem = Emem::new(bar);
> +        emem.begin_write(offset as usize)?;
> +        for chunk in data.chunks_exact(4) {
> +            emem.write_next(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Reads FSP external memory at byte `offset` into `data`.
> +    ///
> +    /// `data` is stored as little-endian 32-bit words. Returns `EINVAL` if
> +    /// `offset` or the `data` length is not 4-byte aligned.
> +    #[expect(dead_code)]
> +    fn read_emem(&mut self, bar: &Bar0, offset: u32, data: &mut [u8]) -> Result {
> +        if offset % 4 != 0 || data.len() % 4 != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let mut emem = Emem::new(bar);
> +        emem.begin_read(offset as usize)?;
> +        for chunk in data.chunks_exact_mut(4) {
> +            chunk.copy_from_slice(&emem.read_next().to_le_bytes());
> +        }
> +
> +        Ok(())
> +    }
> +}

Both `write_emem` and `read_emem` are only ever called with `offset` set
to zero. I checked openrm, and it looks like no reads or writes there
start at a non-zero offset either. So if we will never use a non-zero
offset, we could simplify the code by removing `offset` and always
starting from zero (auto-increment takes care of the rest). This also
lets us remove `EMEM_MAX_SIZE` and some `Result`s.
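
To illustrate, here is a rough standalone sketch of the offset-less
shape. It is not meant as the actual patch: `MockEmem`, its `begin()`
method and the `Einval` error type are hypothetical stand-ins for
`Emem`, `begin_write`/`begin_read` and the kernel `EINVAL`, and the
out-of-range behaviour of the mock is just a modelling choice. The
length check is the only remaining failure path.

```rust
/// Hypothetical stand-in for the EMEM PIO window: a word array plus the
/// auto-incrementing word cursor that the hardware would maintain.
struct MockEmem {
    words: Vec<u32>,
    cursor: usize,
}

impl MockEmem {
    fn new(num_words: usize) -> Self {
        Self { words: vec![0; num_words], cursor: 0 }
    }

    /// Starts a transfer at offset zero. There is no offset to validate,
    /// so this does not need to return a `Result`.
    fn begin(&mut self) {
        self.cursor = 0;
    }

    /// Writes one word and advances the cursor (models auto-increment).
    /// In this mock, writes past the end are silently dropped.
    fn write_next(&mut self, value: u32) {
        if let Some(slot) = self.words.get_mut(self.cursor) {
            *slot = value;
        }
        self.cursor += 1;
    }

    /// Reads one word and advances the cursor (models auto-increment).
    /// In this mock, reads past the end return zero.
    fn read_next(&mut self) -> u32 {
        let value = self.words.get(self.cursor).copied().unwrap_or(0);
        self.cursor += 1;
        value
    }
}

/// Hypothetical stand-in for the kernel's `EINVAL`.
#[derive(Debug, PartialEq)]
struct Einval;

/// Writes `data` as little-endian 32-bit words, always starting at zero.
fn write_emem(emem: &mut MockEmem, data: &[u8]) -> Result<(), Einval> {
    if data.len() % 4 != 0 {
        return Err(Einval);
    }

    emem.begin();
    for chunk in data.chunks_exact(4) {
        emem.write_next(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
    }

    Ok(())
}

/// Reads little-endian 32-bit words into `data`, always starting at zero.
fn read_emem(emem: &mut MockEmem, data: &mut [u8]) -> Result<(), Einval> {
    if data.len() % 4 != 0 {
        return Err(Einval);
    }

    emem.begin();
    for chunk in data.chunks_exact_mut(4) {
        chunk.copy_from_slice(&emem.read_next().to_le_bytes());
    }

    Ok(())
}
```

With this shape the callers no longer pass an offset at all, and the
only error left is a length that is not a multiple of 4.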

> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index 2cb1f02f35a4..da7a10c0346a 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -475,6 +475,21 @@ pub(crate) fn vga_workspace_addr(self) -> Option<u64> {
> pub(crate) NV_PFALCON_FBIF_CTL(u32) @ PFalconBase + 0x00000624 {
> 7:7 allow_phys_no_ctx => bool;
> }
> +
> + // Falcon EMEM PIO registers (used by FSP on Hopper/Blackwell).
> + // These provide the falcon external memory communication interface.
> + pub(crate) NV_PFALCON_FALCON_EMEM_CTL(u32) @ PFalconBase + 0x00000ac0 {
> + /// EMEM byte offset (must be 4-byte aligned).
> + 23:0 offset;
> + /// Auto-increment the offset after each write.
> + 24:24 auto_increment_write => bool;
> + /// Auto-increment the offset after each read.
> + 25:25 auto_increment_read => bool;
> + }
> +
> + pub(crate) NV_PFALCON_FALCON_EMEM_DATA(u32) @ PFalconBase + 0x00000ac4 {
> + 31:0 data => u32;
> + }
> }

In openrm, it looks like the offset field of this register only spans
bits 15:2 rather than 23:0. Is the full 24-bit offset correct?

Either way, we could make the non-divisible-by-4 case unrepresentable by
making this offset 15:2 (or 23:2) rather than 23:0.
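
As a rough standalone illustration of what I mean (not using the
`register!` macro, and assuming the 15:2 layout purely for the sake of
the example), the field can hold a word index so that bits 1:0 of the
byte offset are always zero. `EmemCtl` and its methods are hypothetical
names:

```rust
/// Hypothetical plain-Rust model of the EMEM control register, with the
/// offset field assumed to occupy bits 15:2 (a 14-bit word index).
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct EmemCtl(u32);

impl EmemCtl {
    const OFFSET_SHIFT: u32 = 2;
    /// Largest word index that fits in a 14-bit field.
    const OFFSET_WORDS_MAX: u32 = (1 << 14) - 1;
    const OFFSET_MASK: u32 = Self::OFFSET_WORDS_MAX << Self::OFFSET_SHIFT;
    /// Bit 24, as in the quoted register definition.
    const AUTO_INCREMENT_WRITE: u32 = 1 << 24;

    /// Sets the offset as a word index. Returns `None` if the index does
    /// not fit in the field. A misaligned byte offset cannot be expressed.
    fn with_word_offset(self, word: u32) -> Option<Self> {
        if word > Self::OFFSET_WORDS_MAX {
            return None;
        }
        Some(Self((self.0 & !Self::OFFSET_MASK) | (word << Self::OFFSET_SHIFT)))
    }

    /// Returns the byte offset, which is always a multiple of 4.
    fn byte_offset(self) -> u32 {
        self.0 & Self::OFFSET_MASK
    }

    /// Sets or clears the auto-increment-on-write bit.
    fn with_auto_increment_write(self, on: bool) -> Self {
        if on {
            Self(self.0 | Self::AUTO_INCREMENT_WRITE)
        } else {
            Self(self.0 & !Self::AUTO_INCREMENT_WRITE)
        }
    }
}
```

Since the setter takes a word index, the `offset % 4 != 0` check has
nothing left to reject for the register value itself.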