Re: [PATCH RFC v2 00/15] Add virtualization support for EGM

From: Ankit Agrawal

Date: Thu Mar 12 2026 - 09:54:46 EST


>> > nvgrace-gpu is manipulating sysfs
>> > on devices owned by nvgrace-egm, we don't have mechanisms to manage the
>> > aux device relative to the state of the GPU, we're trying to add a
>> > driver that can bind to device created by an out-of-tree driver, and
>> > we're inventing new uAPIs on the chardev for things that already exist
>> > for vfio regions.
>>
>> Sorry for the confusion. The nvgrace-egm would not bind to the device
>> created by the out-of-tree driver. We would have a separate out-of-tree
>> equivalent of nvgrace-egm to bind to the device by the out-of-tree vfio
>> driver. Maybe we can consider exposing a register/unregister APIs from
>> nvgrace-egm where a module (in-tree nvgrace / out-of-tree) can register
>> a pdev and nvgrace-egm can use to fetch the region info.
>
> Ok, this wasn't clear to me, but does that also mean that if some GPUs
> are managed by nvgrace-gpu and others by out-of-tree drivers that the
> in-kernel and out-of-tree equivalent drivers are both installing
> chardevs as /dev/egmXX?  Playing in the same space is ugly, but what
> happens when the 2 GPUs per socket are split between drivers and they
> both try to added the same chardev?

But that would be an unsupported configuration. It is expected that all the
GPUs on the system and the EGM char devices are attached to the same VM
for full functionality. So either all the devices (GPU and EGM chardev)
would be bound to nvgrace, or all to the out-of-tree module. Please refer to
section 8.1 of
https://docs.nvidia.com/multi-node-nvlink-systems/partition-guide-v1-2.pdf
Perhaps I should add this information to the commit message.

> However, I'd then ask the question why we're associating EGM to the GPU
> PCI driver at all.  For instance, why should nvgrace-gpu spawn aux
> devices to feed into an nvgrace-egm driver, and duplicate that whole
> thing in an out-of-tree driver, when we could just have one in-kernel
> platform(?) driver walk ACPI, find these ranges, and expose them as
> chardev entirely independent of the PCI driver bound to the GPU?

So a new platform driver that walks the ACPI tables looking for the EGM
properties and creates the EGM char devs?

Maybe it is okay, but given that all four EGM properties live under the GPU's
ACPI node and there is no independent ACPI _HID device identity, it sounds
a bit off to me. Do we have a precedent for that?

But as I mentioned above, the expectation is that the EGM devices and the GPU
devices are assigned to the same VM. So would it not make sense to keep the
association between the EGM devices and the GPU devices?