Re: [PATCH RFC v2 05/15] vfio/nvgrace-egm: Introduce module to manage EGM
From: Alex Williamson
Date: Wed Mar 04 2026 - 13:11:47 EST
On Mon, 23 Feb 2026 15:55:04 +0000
<ankita@xxxxxxxxxx> wrote:
> From: Ankit Agrawal <ankita@xxxxxxxxxx>
>
> The Extended GPU Memory (EGM) feature enables the GPU to access
> system memory allocations within and across nodes through a
> high-bandwidth path on Grace-based systems. The GPU can use system
> memory located on the same socket, on a different socket, or even
> on a different node in a multi-node system [1].
>
> When EGM mode is enabled through the SBIOS, host system memory is
> partitioned into two parts: one partition for host OS usage, called
> the Hypervisor region, and a second Hypervisor-Invisible (HI) region
> for the VM. Only the hypervisor region is part of the host EFI map
> and is thus visible to the host OS on boot. Since the entire VM
> sysmem is eligible for EGM allocations within the VM, the HI partition
> is also called the EGM region in this series. The base SPA and size
> of the HI/EGM region are exposed through ACPI DSDT properties.
>
> Although the EGM region is accessible on the host, it is not added to
> the host kernel as system memory. The HI region is assigned to a VM by
> mapping the QEMU VMA to the SPA using remap_pfn_range().
>
> The following figure shows the memory map in the virtualization
> environment.
>
> |---- Sysmem ----|                    |--- GPU mem ---|  VM Memory
> |                |                    |               |
> |IPA <-> SPA map |                    |IPA <-> SPA map|
> |                |                    |               |
> |--- HI / EGM ---|-- Host Mem --|     |--- GPU mem ---|  Host Memory
>
> Introduce a new nvgrace-egm auxiliary driver module to manage and
> map the HI/EGM region on Grace Blackwell systems. It binds to the
> auxiliary device created by the parent nvgrace-gpu (in-tree module
> for device assignment) or nvidia-vgpu-vfio (out-of-tree open source
> module for SR-IOV vGPU) to manage the EGM region for the VM.
> Note that there is a unique EGM region per socket, and an auxiliary
> device is created for every region. The parent module fetches the
> EGM region information from the ACPI tables and populates the data
> structures shared with the auxiliary nvgrace-egm module.
>
> The nvgrace-egm module handles the following:
Or it will, eventually; not in this commit.
> 1. Fetch the EGM memory properties (base HPA, length, proximity domain)
> from the EGM region structure shared by the parent device.
> 2. Create a char device that QEMU can use as a memory-backend-file
> for the VM, and implement its file operations. The char device is
> /dev/egmX, where X is the PXM node ID (fetched in 1) of the EGM
> region being mapped.
> 3. Zero the EGM memory on the first device open().
> 4. Map the QEMU VMA to the EGM region using remap_pfn_range().
> 5. Clean up state and destroy the chardev on device unbind.
> 6. Handle the presence of retired ECC pages in the EGM region.
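A minimal sketch of items 1, 2 and 6, modelled in userspace C. The struct layout and helper names are hypothetical, and the commit message does not say how retired ECC pages are tracked, so the sorted array of page-aligned retired addresses searched here is purely an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the shared EGM region info listed in item 1. */
struct egm_region_info {
	uint64_t base_hpa;	/* base host physical address */
	uint64_t length;	/* region size in bytes */
	uint32_t pxm;		/* ACPI proximity domain */
};

/* Item 2: build the char device path /dev/egm<pxm>; returns snprintf()'s result. */
static int egm_dev_path(const struct egm_region_info *info, char *buf, size_t len)
{
	assert(info != NULL && buf != NULL);
	return snprintf(buf, len, "/dev/egm%u", (unsigned int)info->pxm);
}

/*
 * Item 6, one possible approach (an assumption): given a sorted array of
 * page-aligned retired addresses, report whether [start, start + size)
 * contains any of them, using a lower-bound binary search.
 */
static bool egm_range_has_retired(const uint64_t *retired, size_t n,
				  uint64_t start, uint64_t size)
{
	size_t lo = 0, hi = n;

	assert(retired != NULL || n == 0);
	/* Find the first retired address >= start. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (retired[mid] < start)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* retired[lo] >= start here, so the subtraction cannot underflow. */
	return lo < n && retired[lo] - start < size;
}
```

For example, with retired pages at 0x1000, 0x5000 and 0x9000, the range [0x4000, 0x6000) is flagged while [0x2000, 0x5000) is not.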
>
> Suggested-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Signed-off-by: Ankit Agrawal <ankita@xxxxxxxxxx>
> ---
> MAINTAINERS | 6 ++++++
> drivers/vfio/pci/nvgrace-gpu/Kconfig | 12 ++++++++++++
> drivers/vfio/pci/nvgrace-gpu/Makefile | 3 +++
> drivers/vfio/pci/nvgrace-gpu/egm.c | 22 ++++++++++++++++++++++
> drivers/vfio/pci/nvgrace-gpu/main.c | 1 +
> 5 files changed, 44 insertions(+)
> create mode 100644 drivers/vfio/pci/nvgrace-gpu/egm.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5b3d86de9ec0..1fc551d7d667 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -27384,6 +27384,12 @@ F: drivers/vfio/pci/nvgrace-gpu/egm_dev.h
> F: drivers/vfio/pci/nvgrace-gpu/main.c
> F: include/linux/nvgrace-egm.h
>
> +VFIO NVIDIA GRACE EGM DRIVER
> +M: Ankit Agrawal <ankita@xxxxxxxxxx>
> +L: kvm@xxxxxxxxxxxxxxx
> +S: Supported
> +F: drivers/vfio/pci/nvgrace-gpu/egm.c
I'm not sure a separate MAINTAINERS entry is warranted here; these are
intertwined, even if constructed to allow this EGM driver to be used by
an out-of-tree driver. It's also an unclean split, with Makefile and
Kconfig dependencies under the nvgrace-gpu heading. It should probably
be self-contained in a separate sub-dir to justify a new MAINTAINERS
entry.
> +
> VFIO PCI DEVICE SPECIFIC DRIVERS
> R: Jason Gunthorpe <jgg@xxxxxxxxxx>
> R: Yishai Hadas <yishaih@xxxxxxxxxx>
> diff --git a/drivers/vfio/pci/nvgrace-gpu/Kconfig b/drivers/vfio/pci/nvgrace-gpu/Kconfig
> index a7f624b37e41..7989d8d1c377 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/Kconfig
> +++ b/drivers/vfio/pci/nvgrace-gpu/Kconfig
> @@ -1,8 +1,20 @@
> # SPDX-License-Identifier: GPL-2.0-only
> +config NVGRACE_EGM
> + tristate "EGM driver for NVIDIA Grace Hopper and Blackwell Superchip"
> + depends on ARM64 || (COMPILE_TEST && 64BIT)
> + depends on NVGRACE_GPU_VFIO_PCI
> + help
> + Extended GPU Memory (EGM) support for the GPU in NVIDIA Grace-based
> + chips, required to make CPU memory available to the GPU as additional
> + cross-node/cross-socket memory when using KVM/qemu.
> +
> + If you don't know what to do here, say N.
> +
> config NVGRACE_GPU_VFIO_PCI
> tristate "VFIO support for the GPU in the NVIDIA Grace Hopper Superchip"
> depends on ARM64 || (COMPILE_TEST && 64BIT)
> select VFIO_PCI_CORE
> + select NVGRACE_EGM
This should be dropped; it creates a circular dependency where we
cannot actually unselect NVGRACE_EGM while NVGRACE_GPU_VFIO_PCI is
selected.
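To illustrate why the select pins the value, here is a simplified C model of how Kconfig resolves a tristate symbol (n < m < y): the user's choice is capped by its `depends on` expression, then raised to at least the value of any symbol that selects it. This is a model, not Kconfig's code; it ignores defaults and `imply`, and assumes the ARM64/COMPILE_TEST dependency is satisfied.

```c
#include <assert.h>

/* Simplified model of Kconfig tristate values: n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

static enum tristate tri_min(enum tristate a, enum tristate b) { return a < b ? a : b; }
static enum tristate tri_max(enum tristate a, enum tristate b) { return a > b ? a : b; }

/*
 * Effective value of NVGRACE_EGM given the user's choice, the value of
 * NVGRACE_GPU_VFIO_PCI, and whether NVGRACE_GPU_VFIO_PCI selects it.
 */
static enum tristate egm_effective(enum tristate user_choice,
				   enum tristate gpu_vfio_pci, int gpu_selects_egm)
{
	enum tristate val;

	assert(user_choice <= TRI_Y && gpu_vfio_pci <= TRI_Y);

	/* "depends on NVGRACE_GPU_VFIO_PCI" caps the user's choice. */
	val = tri_min(user_choice, gpu_vfio_pci);

	/* "select NVGRACE_EGM" raises it to at least the selector's value. */
	if (gpu_selects_egm)
		val = tri_max(val, gpu_vfio_pci);
	return val;
}
```

With the select present, egm_effective(TRI_N, TRI_M, 1) yields TRI_M, so NVGRACE_EGM cannot be turned off while NVGRACE_GPU_VFIO_PCI=m; without the select, the user's n is honoured.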
> help
> VFIO support for the GPU in the NVIDIA Grace Hopper Superchip is
> required to assign the GPU device to userspace using KVM/qemu/etc.
> diff --git a/drivers/vfio/pci/nvgrace-gpu/Makefile b/drivers/vfio/pci/nvgrace-gpu/Makefile
> index e72cc6739ef8..d0d191be56b9 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/Makefile
> +++ b/drivers/vfio/pci/nvgrace-gpu/Makefile
> @@ -1,3 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0-only
> obj-$(CONFIG_NVGRACE_GPU_VFIO_PCI) += nvgrace-gpu-vfio-pci.o
> nvgrace-gpu-vfio-pci-y := main.o egm_dev.o
> +
> +obj-$(CONFIG_NVGRACE_EGM) += nvgrace-egm.o
> +nvgrace-egm-y := egm.o
> diff --git a/drivers/vfio/pci/nvgrace-gpu/egm.c b/drivers/vfio/pci/nvgrace-gpu/egm.c
> new file mode 100644
> index 000000000000..999808807019
> --- /dev/null
> +++ b/drivers/vfio/pci/nvgrace-gpu/egm.c
> @@ -0,0 +1,22 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved
s/2025/2026/
> + */
> +
> +#include <linux/vfio_pci_core.h>
Premature? Nothing here uses vfio_pci_core yet.
> +
> +static int __init nvgrace_egm_init(void)
> +{
> + return 0;
> +}
> +
> +static void __exit nvgrace_egm_cleanup(void)
> +{
> +}
> +
> +module_init(nvgrace_egm_init);
> +module_exit(nvgrace_egm_cleanup);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Ankit Agrawal <ankita@xxxxxxxxxx>");
> +MODULE_DESCRIPTION("NVGRACE EGM - Module to support Extended GPU Memory on NVIDIA Grace Based systems");
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index b356e941340a..0bb427cca31f 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -1410,3 +1410,4 @@ MODULE_LICENSE("GPL");
> MODULE_AUTHOR("Ankit Agrawal <ankita@xxxxxxxxxx>");
> MODULE_AUTHOR("Aniket Agashe <aniketa@xxxxxxxxxx>");
> MODULE_DESCRIPTION("VFIO NVGRACE GPU PF - User Level driver for NVIDIA devices with CPU coherently accessible device memory");
> +MODULE_SOFTDEP("pre: nvgrace-egm");
Premature, and the wrong approach even if it were needed. AIUI the aux
device created should generate uevents, and the module should then be
loaded automatically via device tables.
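For context on the autoload path: in-kernel, the EGM driver would declare a `struct auxiliary_device_id` table and export it with `MODULE_DEVICE_TABLE(auxiliary, ...)`, giving the module an "auxiliary:<modname>.<devname>" alias that udev/kmod match against the device's MODALIAS uevent. The userspace model below mimics how the auxiliary bus derives that MODALIAS from a "<modname>.<devname>.<id>" device name and how an id_table name is matched. The device name "nvgrace_gpu_vfio_pci.egm.0" is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Modalias prefix used by the auxiliary bus. */
#define AUX_MODALIAS_PREFIX "auxiliary:"

/*
 * Model of MODALIAS generation: keep everything before the last '.',
 * i.e. drop the instance id. Returns snprintf()'s result, or -1 if the
 * name contains no '.'.
 */
static int aux_modalias(const char *dev_name, char *buf, size_t len)
{
	const char *dot;

	assert(dev_name != NULL && buf != NULL);
	dot = strrchr(dev_name, '.');
	if (!dot)
		return -1;
	return snprintf(buf, len, "%s%.*s", AUX_MODALIAS_PREFIX,
			(int)(dot - dev_name), dev_name);
}

/* Model of driver-side matching of a device name against an id_table name. */
static int aux_id_matches(const char *dev_name, const char *id_name)
{
	const char *dot = strrchr(dev_name, '.');
	size_t n;

	if (!dot)
		return 0;
	n = (size_t)(dot - dev_name);
	return strlen(id_name) == n && strncmp(dev_name, id_name, n) == 0;
}
```

Every per-socket instance (".0", ".1", ...) yields the same MODALIAS, so a single id_table entry covers all EGM regions without any MODULE_SOFTDEP in the parent.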
Thanks,
Alex