RE: [PATCH RFC v2 02/15] vfio/nvgrace-gpu: Create auxiliary device for EGM

From: Shameer Kolothum Thodi

Date: Thu Feb 26 2026 - 09:37:04 EST

> -----Original Message-----
> From: Ankit Agrawal <ankita@xxxxxxxxxx>
> Sent: 23 February 2026 15:55
> To: Ankit Agrawal <ankita@xxxxxxxxxx>; Vikram Sethi <vsethi@xxxxxxxxxx>;
> Jason Gunthorpe <jgg@xxxxxxxxxx>; Matt Ochs <mochs@xxxxxxxxxx>;
> jgg@xxxxxxxx; Shameer Kolothum Thodi <skolothumtho@xxxxxxxxxx>;
> alex@xxxxxxxxxxx
> Cc: Neo Jia <cjia@xxxxxxxxxx>; Zhi Wang <zhiw@xxxxxxxxxx>; Krishnakant
> Jaju <kjaju@xxxxxxxxxx>; Yishai Hadas <yishaih@xxxxxxxxxx>;
> kevin.tian@xxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: [PATCH RFC v2 02/15] vfio/nvgrace-gpu: Create auxiliary device for
> EGM
>
> From: Ankit Agrawal <ankita@xxxxxxxxxx>
>
> The Extended GPU Memory (EGM) feature enables GPU access to
> system memory across sockets and physical systems on the
> Grace Hopper and Grace Blackwell systems. When the feature is
> enabled through SBIOS, part of the system memory is made available
> to the GPU for access through the EGM path.
>
> The EGM functionality is separate and largely independent from the
> core GPU device functionality. However, the EGM region information
> of base SPA and size is associated with the GPU on the ACPI tables.
> An architecture with EGM represented as an auxiliary device suits this
> context well.
>
> The parent GPU device creates an EGM auxiliary device to be managed
> independently by an auxiliary EGM driver. The EGM region information
> is kept as part of the shared struct nvgrace_egm_dev along with the
> auxiliary device handle.
>
> Each socket has a separate EGM region and hence a multi-socket system
> has multiple EGM regions. Each EGM region has a separate nvgrace_egm_dev
> and the nvgrace-gpu keeps the EGM regions as part of a list.
>
> Note that EGM is an optional feature enabled through SBIOS. The EGM
> properties are only populated in ACPI tables if the feature is enabled;
> they are absent otherwise. The absence of the properties is thus not
> considered fatal. The presence of an improper set of values, however, is
> considered fatal.
>
> Note also that multiple GPUs may be present per socket, each carrying
> duplicate EGM region information. Make sure
> the duplicate data does not get added.
>
> Suggested-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Signed-off-by: Ankit Agrawal <ankita@xxxxxxxxxx>
> ---
> MAINTAINERS | 5 +-
> drivers/vfio/pci/nvgrace-gpu/Makefile | 2 +-
> drivers/vfio/pci/nvgrace-gpu/egm_dev.c | 61 +++++++++++++++++++++
> drivers/vfio/pci/nvgrace-gpu/egm_dev.h | 17 ++++++
> drivers/vfio/pci/nvgrace-gpu/main.c | 76 +++++++++++++++++++++++++-
> include/linux/nvgrace-egm.h | 23 ++++++++
> 6 files changed, 181 insertions(+), 3 deletions(-)
> create mode 100644 drivers/vfio/pci/nvgrace-gpu/egm_dev.c
> create mode 100644 drivers/vfio/pci/nvgrace-gpu/egm_dev.h
> create mode 100644 include/linux/nvgrace-egm.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 765ad2daa218..5b3d86de9ec0 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -27379,7 +27379,10 @@ VFIO NVIDIA GRACE GPU DRIVER
> M: Ankit Agrawal <ankita@xxxxxxxxxx>
> L: kvm@xxxxxxxxxxxxxxx
> S: Supported
> -F: drivers/vfio/pci/nvgrace-gpu/
> +F: drivers/vfio/pci/nvgrace-gpu/egm_dev.c
> +F: drivers/vfio/pci/nvgrace-gpu/egm_dev.h
> +F: drivers/vfio/pci/nvgrace-gpu/main.c
> +F: include/linux/nvgrace-egm.h
>
> VFIO PCI DEVICE SPECIFIC DRIVERS
> R: Jason Gunthorpe <jgg@xxxxxxxxxx>
> diff --git a/drivers/vfio/pci/nvgrace-gpu/Makefile b/drivers/vfio/pci/nvgrace-gpu/Makefile
> index 3ca8c187897a..e72cc6739ef8 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/Makefile
> +++ b/drivers/vfio/pci/nvgrace-gpu/Makefile
> @@ -1,3 +1,3 @@
> # SPDX-License-Identifier: GPL-2.0-only
> obj-$(CONFIG_NVGRACE_GPU_VFIO_PCI) += nvgrace-gpu-vfio-pci.o
> -nvgrace-gpu-vfio-pci-y := main.o
> +nvgrace-gpu-vfio-pci-y := main.o egm_dev.o
> diff --git a/drivers/vfio/pci/nvgrace-gpu/egm_dev.c b/drivers/vfio/pci/nvgrace-gpu/egm_dev.c
> new file mode 100644
> index 000000000000..faf658723f7a
> --- /dev/null
> +++ b/drivers/vfio/pci/nvgrace-gpu/egm_dev.c
> @@ -0,0 +1,61 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/vfio_pci_core.h>
> +#include "egm_dev.h"
> +
> +/*
> + * Determine if the EGM feature is enabled. If disabled, there
> + * will be no EGM properties populated in the ACPI tables and this
> + * fetch would fail.
> + */
> +int nvgrace_gpu_has_egm_property(struct pci_dev *pdev, u64 *pegmpxm)
> +{
> + return device_property_read_u64(&pdev->dev, "nvidia,egm-pxm",
> + pegmpxm);
> +}
> +
> +static void nvgrace_gpu_release_aux_device(struct device *device)
> +{
> + struct auxiliary_device *aux_dev = container_of(device, struct auxiliary_device, dev);
> + struct nvgrace_egm_dev *egm_dev = container_of(aux_dev, struct nvgrace_egm_dev, aux_dev);
> +
> + kvfree(egm_dev);
> +}
> +
> +struct nvgrace_egm_dev *
> +nvgrace_gpu_create_aux_device(struct pci_dev *pdev, const char *name,
> + u64 egmpxm)
> +{
> + struct nvgrace_egm_dev *egm_dev;
> + int ret;
> +
> + egm_dev = kzalloc(sizeof(*egm_dev), GFP_KERNEL);
> + if (!egm_dev)
> + goto create_err;
> +
> + egm_dev->egmpxm = egmpxm;
> + egm_dev->aux_dev.id = egmpxm;
> + egm_dev->aux_dev.name = name;
> + egm_dev->aux_dev.dev.release = nvgrace_gpu_release_aux_device;
> + egm_dev->aux_dev.dev.parent = &pdev->dev;
> +
> + ret = auxiliary_device_init(&egm_dev->aux_dev);
> + if (ret)
> + goto free_dev;
> +
> + ret = auxiliary_device_add(&egm_dev->aux_dev);
> + if (ret) {
> + auxiliary_device_uninit(&egm_dev->aux_dev);
> + goto free_dev;
> + }
> +
> + return egm_dev;
> +
> +free_dev:
> + kvfree(egm_dev);
> +create_err:
> + return NULL;
> +}
> diff --git a/drivers/vfio/pci/nvgrace-gpu/egm_dev.h b/drivers/vfio/pci/nvgrace-gpu/egm_dev.h
> new file mode 100644
> index 000000000000..c00f5288f4e7
> --- /dev/null
> +++ b/drivers/vfio/pci/nvgrace-gpu/egm_dev.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#ifndef EGM_DEV_H
> +#define EGM_DEV_H
> +
> +#include <linux/nvgrace-egm.h>
> +
> +int nvgrace_gpu_has_egm_property(struct pci_dev *pdev, u64 *pegmpxm);
> +
> +struct nvgrace_egm_dev *
> +nvgrace_gpu_create_aux_device(struct pci_dev *pdev, const char *name,
> + u64 egmphys);
> +
> +#endif /* EGM_DEV_H */
> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index 7c4d51f5c701..23028e6e7192 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> @@ -10,6 +10,8 @@
> #include <linux/pci-p2pdma.h>
> #include <linux/pm_runtime.h>
> #include <linux/memory-failure.h>
> +#include <linux/nvgrace-egm.h>
> +#include "egm_dev.h"
>
> /*
> * The device memory usable to the workloads running in the VM is cached
> @@ -66,6 +68,68 @@ struct nvgrace_gpu_pci_core_device {
> bool reset_done;
> };
>
> +/*
> + * Track egm device lists. Note that there is one device per socket.
> + * All the GPUs belonging to the same sockets are associated with
> + * the EGM device for that socket.
> + */
> +static struct list_head egm_dev_list;

I may have asked this before, but does this list need any locking?
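
To illustrate the concern, here is a rough userspace sketch (not the driver
code) of guarding a shared list with a lock. The names egm_entry,
egm_list_lock, egm_list_add() and egm_list_count() are made up for
illustration. In the driver I would expect the equivalent to be a mutex held
around list_add_tail() and any walk of the list, assuming the list can be
touched from more than one context at the same time.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct nvgrace_egm_dev_entry. */
struct egm_entry {
	uint64_t egmpxm;
	struct egm_entry *next;
};

/* Stand-ins for egm_dev_list and a lock that would guard it. */
static struct egm_entry *egm_list_head;
static pthread_mutex_t egm_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add an entry while holding the lock; returns 0, or -1 if allocation fails. */
static int egm_list_add(uint64_t egmpxm)
{
	struct egm_entry *entry = calloc(1, sizeof(*entry));

	if (!entry)
		return -1;
	entry->egmpxm = egmpxm;

	pthread_mutex_lock(&egm_list_lock);
	entry->next = egm_list_head;
	egm_list_head = entry;
	pthread_mutex_unlock(&egm_list_lock);
	return 0;
}

/* Walk the list under the same lock so a reader never races with an add. */
static size_t egm_list_count(void)
{
	size_t n = 0;

	pthread_mutex_lock(&egm_list_lock);
	for (struct egm_entry *e = egm_list_head; e; e = e->next)
		n++;
	pthread_mutex_unlock(&egm_list_lock);
	return n;
}
```

If all additions and removals are already serialized by something else, the
lock would be unnecessary, but then a comment saying so would help.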

> +
> +static int nvgrace_gpu_create_egm_aux_device(struct pci_dev *pdev)
> +{
> + struct nvgrace_egm_dev_entry *egm_entry;
> + u64 egmpxm;
> + int ret = 0;
> +
> + /*
> + * EGM is an optional feature enabled in SBIOS. If disabled, there
> + * will be no EGM properties populated in the ACPI tables and this
> + * fetch would fail. Treat this failure as non-fatal and return
> + * early.
> + */
> + if (nvgrace_gpu_has_egm_property(pdev, &egmpxm))
> + goto exit;
> +
> + egm_entry = kzalloc(sizeof(*egm_entry), GFP_KERNEL);
> + if (!egm_entry)
> + return -ENOMEM;
> +
> + egm_entry->egm_dev =
> + nvgrace_gpu_create_aux_device(pdev, NVGRACE_EGM_DEV_NAME,
> + egmpxm);
> + if (!egm_entry->egm_dev) {
> + kvfree(egm_entry);
> + ret = -EINVAL;
> + goto exit;
> + }
> +
> + list_add_tail(&egm_entry->list, &egm_dev_list);

The commit log says "Make sure the duplicate data does not get added",
but there is no check here for the case where multiple GPUs point to the
same egm_dev, right? Or did the commit log mean something else?
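
For reference, the kind of check I was expecting looks roughly like the
userspace sketch below: look up the PXM first and only create a new entry if
none exists. This is only an illustration under my own assumptions. The names
egm_node, egm_find() and egm_add_unique() are hypothetical, and I am assuming
egmpxm is what identifies an EGM region. In the driver this would presumably
be a list_for_each_entry() walk over egm_dev_list before calling
nvgrace_gpu_create_aux_device().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a list entry keyed by the EGM PXM. */
struct egm_node {
	uint64_t egmpxm;
	struct egm_node *next;
};

static struct egm_node *egm_nodes;

/* Return the existing entry for this PXM, or NULL if there is none. */
static struct egm_node *egm_find(uint64_t egmpxm)
{
	for (struct egm_node *n = egm_nodes; n; n = n->next)
		if (n->egmpxm == egmpxm)
			return n;
	return NULL;
}

/*
 * Add an entry only if the PXM is not already tracked.
 * Returns 1 if a new entry was created, 0 if one already existed,
 * and -1 if allocation failed.
 */
static int egm_add_unique(uint64_t egmpxm)
{
	struct egm_node *node;

	if (egm_find(egmpxm))
		return 0;

	node = calloc(1, sizeof(*node));
	if (!node)
		return -1;

	node->egmpxm = egmpxm;
	node->next = egm_nodes;
	egm_nodes = node;
	return 1;
}
```

With something like this, a second GPU on the same socket would reuse the
existing entry instead of adding another one.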

Thanks,
Shameer