Re: [PATCH v10 26/28] gpu: nova-core: Blackwell: use correct sysmem flush registers

From: Alexandre Courbot

Date: Mon Apr 20 2026 - 11:50:21 EST


On Sat Apr 11, 2026 at 11:49 AM JST, John Hubbard wrote:
> Blackwell GPUs moved the sysmem flush page registers away from the
> legacy NV_PFB_NISO_FLUSH_SYSMEM_ADDR used by Ampere/Ada.
>
> GB10x uses HSHUB0 registers, with both a primary and EG (egress) pair
> that must be programmed to the same address. GB20x uses FBHUB0
> registers.
>
> Add separate GB100 and GB202 fb HALs, and split the Blackwell HAL
> dispatch so that each uses its respective registers.
>
> Signed-off-by: John Hubbard <jhubbard@xxxxxxxxxx>

This is another one that can be moved to the beginning of the series.
That way we have a nice progression of addressing the hardware
differences -> fork the boot path -> build and use FMC messaging.

> ---
> drivers/gpu/nova-core/fb/hal.rs | 8 ++-
> drivers/gpu/nova-core/fb/hal/gb100.rs | 54 ++++++++++++++++---
> drivers/gpu/nova-core/fb/hal/gb202.rs | 77 +++++++++++++++++++++++++++
> drivers/gpu/nova-core/regs.rs | 36 +++++++++++++
> 4 files changed, 168 insertions(+), 7 deletions(-)
> create mode 100644 drivers/gpu/nova-core/fb/hal/gb202.rs
>
> diff --git a/drivers/gpu/nova-core/fb/hal.rs b/drivers/gpu/nova-core/fb/hal.rs
> index 478f80d640c1..65edf07c3222 100644
> --- a/drivers/gpu/nova-core/fb/hal.rs
> +++ b/drivers/gpu/nova-core/fb/hal.rs
> @@ -13,9 +13,14 @@
> mod ga100;
> mod ga102;
> mod gb100;
> +mod gb202;
> mod gh100;
> mod tu102;
>
> +/// Non-WPR heap size for Blackwell (2 MiB + 128 KiB).
> +/// See Open RM: kgspCalculateFbLayout_GB100.
> +const BLACKWELL_NON_WPR_HEAP_SIZE: u32 = 0x220000;

Why is this const here? If we want to share a value between different
HALs, the way to do it is to put it into a `pub(super)` method of the
lowest HAL module and have the higher ones call into it.
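
Roughly like this (a standalone, untested sketch with the modules inlined;
`non_wpr_heap_size()` and `heap_size()` are made-up names, not the actual
HAL code):

```rust
// Hypothetical layout mirroring fb/hal.rs and its submodules.
mod hal {
    pub(crate) mod gb100 {
        /// Non-WPR heap size for Blackwell (2 MiB + 128 KiB).
        ///
        /// `pub(super)` makes this visible to `hal` and to its other
        /// submodules, but not to the rest of the crate.
        pub(super) const fn non_wpr_heap_size() -> u32 {
            0x220000
        }

        /// Illustrative stand-in for the GB10x HAL method using the value.
        pub(crate) fn heap_size() -> u32 {
            non_wpr_heap_size()
        }
    }

    pub(crate) mod gb202 {
        /// Illustrative stand-in for the GB20x HAL method: it calls into
        /// the lower HAL module instead of sharing a module-level const.
        pub(crate) fn heap_size() -> u32 {
            super::gb100::non_wpr_heap_size()
        }
    }
}
```

That keeps the value owned by the HAL that introduced it.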

> +
> pub(crate) trait FbHal {
> /// Returns the address of the currently-registered sysmem flush page.
> fn read_sysmem_flush_page(&self, bar: &Bar0) -> u64;
> @@ -46,6 +51,7 @@ pub(crate) fn fb_hal(chipset: Chipset) -> &'static dyn FbHal {
> Architecture::Ampere if chipset == Chipset::GA100 => ga100::GA100_HAL,
> Architecture::Ampere | Architecture::Ada => ga102::GA102_HAL,
> Architecture::Hopper => gh100::GH100_HAL,
> - Architecture::BlackwellGB10x | Architecture::BlackwellGB20x => gb100::GB100_HAL,
> + Architecture::BlackwellGB10x => gb100::GB100_HAL,
> + Architecture::BlackwellGB20x => gb202::GB202_HAL,
> }
> }
> diff --git a/drivers/gpu/nova-core/fb/hal/gb100.rs b/drivers/gpu/nova-core/fb/hal/gb100.rs
> index bead99a6ca76..c6e2a505e6ae 100644
> --- a/drivers/gpu/nova-core/fb/hal/gb100.rs
> +++ b/drivers/gpu/nova-core/fb/hal/gb100.rs
> @@ -1,21 +1,64 @@
> // SPDX-License-Identifier: GPL-2.0
>
> -use kernel::prelude::*;
> +//! Blackwell GB10x framebuffer HAL.

Let's add the module-level doccomments in the commit that introduced the
module in the first place.

> +//!
> +//! GB10x GPUs use HSHUB0 registers for the sysmem flush page. Both the primary and EG (egress)
> +//! register pairs must be programmed to the same address, as required by hardware.

This should be a method doccomment rather than a module one imho.

<snip>
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index c3ccae0c235f..77b11c7de3f8 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -145,6 +145,42 @@ fn fmt(&self, f: &mut kernel::fmt::Formatter<'_>) -> kernel::fmt::Result {
> /// Bits 12..40 of the higher (exclusive) bound of the WPR2 region.
> 31:4 hi_val;
> }
> +
> + // Blackwell GB10x sysmem flush registers (HSHUB0).
> + //
> + // GB10x GPUs use two pairs of HSHUB registers for sysmembar: a primary pair and an EG
> + // (egress) pair. Both must be programmed to the same address. Hardware ignores bits 7:0
> + // of each LO register. HSHUB0 base is 0x00891000.
> +
> + pub(crate) NV_PFB_HSHUB0_PCIE_FLUSH_SYSMEM_ADDR_LO(u32) @ 0x00891e50 {
> + 31:0 adr => u32;
> + }
> +
> + pub(crate) NV_PFB_HSHUB0_PCIE_FLUSH_SYSMEM_ADDR_HI(u32) @ 0x00891e54 {
> + 19:0 adr;
> + }
> +
> + pub(crate) NV_PFB_HSHUB0_EG_PCIE_FLUSH_SYSMEM_ADDR_LO(u32) @ 0x008916c0 {
> + 31:0 adr => u32;
> + }
> +
> + pub(crate) NV_PFB_HSHUB0_EG_PCIE_FLUSH_SYSMEM_ADDR_HI(u32) @ 0x008916c4 {
> + 19:0 adr;
> + }

I had trouble finding these registers in OpenRM because they are not
defined under the same name, given that they are relative registers and
there are multiple HSHUBs (a detail that will likely become significant
in the future).

So we should get this right from the get-go and use relative registers,
and a (for now) unique `Hshub0` base. That will also make it easier to
match the registers against OpenRM when needed.

... but then, when I looked for the `0x00891000` base in OpenRM, I also
couldn't find it. All I found was this in
`_kmemsysInitHshub0Aperture_GB100`:

#define NV_PFB_HSHUB0 0x00870fff:0x00870000

... which is different from the base in the comment above, for a reason
I don't know.

And it looks like even that value isn't absolutely correct, because
looking further in OpenRM it seems like it is getting the bases for the
HSHUBs using `NV_PTOP_DEVICE_INFO2` - runtime, per-SKU values! We might
need to update registers to support this, and I think it is fine to go
with a workaround and a hardcoded value for now, but we should:

- Make sure we understand all the implications of this (I only looked
superficially),
- Add a big TODO item detailing what needs to be done if this cannot be
addressed in this series.
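
For the interim workaround, I am thinking of something along these lines.
This is only a rough, untested sketch in plain Rust rather than our
register macros: `HshubBase`, `HSHUB0_HARDCODED` and `hshub_reg()` are
made-up names, the relative offsets are simply the patch's absolute
addresses minus the `0x00891000` base from its comment, and that base is
precisely the value that still needs to be confirmed:

```rust
/// Hypothetical wrapper for the MMIO base of one HSHUB instance.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct HshubBase(usize);

// TODO: OpenRM appears to discover the HSHUB bases at runtime (per SKU)
// through `NV_PTOP_DEVICE_INFO2`. This hardcoded value comes from the
// comment in the patch and is an unverified assumption.
const HSHUB0_HARDCODED: HshubBase = HshubBase(0x0089_1000);

// Offsets relative to the HSHUB base, derived from the patch's absolute
// addresses (e.g. 0x00891e50 - 0x00891000 = 0xe50).
const PCIE_FLUSH_SYSMEM_ADDR_LO: usize = 0xe50;
const PCIE_FLUSH_SYSMEM_ADDR_HI: usize = 0xe54;
const EG_PCIE_FLUSH_SYSMEM_ADDR_LO: usize = 0x6c0;
const EG_PCIE_FLUSH_SYSMEM_ADDR_HI: usize = 0x6c4;

/// Returns the absolute BAR0 offset of a register located at `rel`
/// within the HSHUB aperture starting at `base`.
const fn hshub_reg(base: HshubBase, rel: usize) -> usize {
    base.0 + rel
}
```

With this shape, switching to a runtime-discovered base should only mean
constructing `HshubBase` from the device info instead of the constant.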

> +
> + // Blackwell GB20x sysmem flush registers (FBHUB0).
> + //
> + // Unlike the older NV_PFB_NISO_FLUSH_SYSMEM_ADDR registers which encode the address with an
> + // 8-bit right-shift, these registers take the raw address split into lower/upper 32-bit halves.
> + // The hardware ignores bits 7:0 of the LO register.
> +
> + pub(crate) NV_PFB_FBHUB0_PCIE_FLUSH_SYSMEM_ADDR_LO(u32) @ 0x008a1d58 {
> + 31:0 adr => u32;
> + }
> +
> + pub(crate) NV_PFB_FBHUB0_PCIE_FLUSH_SYSMEM_ADDR_HI(u32) @ 0x008a1d5c {
> + 19:0 adr;
> + }

Interestingly these registers are defined globally in OpenRM (and under
the same name), so these seem to be much less concerning than `HSHUB0`.
Still, I suspect we will want to make them relative to future-proof the
code, if we can figure out what the base is.