RE: [PATCH RFC v2 10/15] vfio/nvgrace-egm: Clear Memory before handing out to VM
From: Shameer Kolothum Thodi
Date: Thu Feb 26 2026 - 13:22:56 EST
> -----Original Message-----
> From: Ankit Agrawal <ankita@xxxxxxxxxx>
> Sent: 23 February 2026 15:55
> To: Ankit Agrawal <ankita@xxxxxxxxxx>; Vikram Sethi <vsethi@xxxxxxxxxx>;
> Jason Gunthorpe <jgg@xxxxxxxxxx>; Matt Ochs <mochs@xxxxxxxxxx>;
> jgg@xxxxxxxx; Shameer Kolothum Thodi <skolothumtho@xxxxxxxxxx>;
> alex@xxxxxxxxxxx
> Cc: Neo Jia <cjia@xxxxxxxxxx>; Zhi Wang <zhiw@xxxxxxxxxx>; Krishnakant
> Jaju <kjaju@xxxxxxxxxx>; Yishai Hadas <yishaih@xxxxxxxxxx>;
> kevin.tian@xxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: [PATCH RFC v2 10/15] vfio/nvgrace-egm: Clear Memory before
> handing out to VM
>
> From: Ankit Agrawal <ankita@xxxxxxxxxx>
>
> The EGM region is invisible to the host Linux kernel, which does not
> manage it. The EGM module owns the EGM memory and is therefore
> responsible for clearing the region before handing it out to the VM.
>
> Clear the EGM region on EGM chardev open. To avoid CPU lockup logs,
> zap the region in 1G chunks.
>
> Suggested-by: Vikram Sethi <vsethi@xxxxxxxxxx>
> Signed-off-by: Ankit Agrawal <ankita@xxxxxxxxxx>
> ---
> drivers/vfio/pci/nvgrace-gpu/egm.c | 43 ++++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
>
> diff --git a/drivers/vfio/pci/nvgrace-gpu/egm.c b/drivers/vfio/pci/nvgrace-gpu/egm.c
> index 5786ebe374a5..de7771a4145d 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/egm.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/egm.c
> @@ -15,6 +15,7 @@ static DEFINE_XARRAY(egm_chardevs);
> struct chardev {
> struct device device;
> struct cdev cdev;
> + atomic_t open_count;
> };
>
> static struct nvgrace_egm_dev *
> @@ -30,6 +31,42 @@ static int nvgrace_egm_open(struct inode *inode, struct file *file)
> {
> struct chardev *egm_chardev =
> container_of(inode->i_cdev, struct chardev, cdev);
> + struct nvgrace_egm_dev *egm_dev =
> + egm_chardev_to_nvgrace_egm_dev(egm_chardev);
> + void *memaddr;
> +
> + if (atomic_cmpxchg(&egm_chardev->open_count, 0, 1) != 0)
> + return -EBUSY;
> +
> + /*
> + * The nvgrace-egm module is responsible for managing the EGM memory
> + * as the host kernel has no knowledge of it. Clear the region before
> + * handing it over to userspace.
> + */
> + memaddr = memremap(egm_dev->egmphys, egm_dev->egmlength, MEMREMAP_WB);
> + if (!memaddr) {
> + atomic_dec(&egm_chardev->open_count);
> + return -ENOMEM;
> + }
> +
> + /*
> + * Clear in chunks of 1G to avoid CPU lockup logs.
> + */
> + {
> + size_t remaining = egm_dev->egmlength;
> + u8 *chunk_addr = (u8 *)memaddr;
> + size_t chunk_size;
> +
> + while (remaining > 0) {
> + chunk_size = min_t(size_t, remaining, SZ_1G);
> + memset(chunk_addr, 0, chunk_size);
> + cond_resched();
> + chunk_addr += chunk_size;
> + remaining -= chunk_size;
> + }
> + }
> +
> + memunmap(memaddr);
I am not sure this is safe. If userspace does:

  open(fd)
  mmap()
  close(fd)

the mapping stays alive and accessible in userspace even after the
close(). Since the release function decrements open_count on close(),
a second process could then call open() and zero the memory while the
first process's mapping is still live.

I may be wrong, but please double-check the mapping lifecycle here.
Thanks,
Shameer