Re: [PATCH 0/2] Improve vfio-pci primary GPU assignment behavior

From: Javier Martinez Canillas
Date: Tue Jun 07 2022 - 14:18:32 EST


Hello Alex,

On 6/6/22 19:53, Alex Williamson wrote:
> Users attempting to enable vfio PCI device assignment with a GPU will
> often block the default PCI driver from the device to avoid conflicts
> with the device initialization or release path. This means that
> vfio-pci is sometimes the first PCI driver to bind to the device. In
> the case of assigning the primary graphics device, low-level console
> drivers may still generate resource conflicts. Users often employ
> kernel command line arguments to disable conflicting drivers or
> perform unbinding in userspace to avoid this, but the actual solution
> is often distribution/kernel config specific based on the included
> drivers.
>
> We can instead allow vfio-pci to copy the behavior of
> drm_aperture_remove_conflicting_pci_framebuffers() in order to remove
> these low-level drivers with conflicting resources. vfio-pci is not
> however a DRM driver, nor does vfio-pci depend on DRM config options,
> thus we split out and export the necessary DRM aperture support and
> mirror the framebuffer and VGA support.
>
> I'd be happy to pull this series in through the vfio branch if
> approved by the DRM maintainers. Thanks,
>

I understand your issue, but I really don't think that using this helper
is the correct thing to do. We already have some races with the current
aperture infrastructure. As an example, you can look at [0].

The agreement on the mentioned thread is that we want to unify the fbdev
and DRM driver apertures into a single list, and ideally move everything
to the Linux device model to handle the removal of conflicting devices.

That's why I don't feel that leaking the DRM aperture helper to another
subsystem is desirable, since it would make it even harder to clean this
up later.

But also, this issue isn't something that only affects graphics devices,
right? AFAIU from [1] and [2], the same issue happens if a PCI device
has to be bound to vfio-pci but was already bound to a host driver.

The fact that DRM happens to have some infrastructure to remove devices
that conflict with an aperture is just a coincidence, since this is used
to remove devices bound to drivers that make use of the firmware-provided
system framebuffer.
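
To make "conflict with an aperture" concrete, here is a minimal
user-space sketch of the kind of range check involved. It assumes an
aperture is simply a half-open address range [base, base + size). The
struct and function names are hypothetical and exist only for
illustration; this is not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of an aperture: the half-open address range
 * [base, base + size). Not a kernel data structure.
 */
struct aperture {
	uint64_t base;
	uint64_t size;
};

/*
 * Two half-open ranges overlap when each one starts before the
 * other one ends. Ranges that merely touch do not overlap.
 */
static bool apertures_overlap(const struct aperture *a,
			      const struct aperture *b)
{
	return a->base < b->base + b->size &&
	       b->base < a->base + a->size;
}
```

With a check like this, a firmware framebuffer placed inside a GPU BAR
would be reported as conflicting, while a range that starts exactly
where the framebuffer ends would not.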

The series [0] mentioned above adds a sysfb_disable() helper that
disables the Generic System Framebuffer logic, which is what registers
the framebuffer devices that are bound to these generic video drivers.
On disable, the devices registered by sysfb are also unregistered.
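
As a rough illustration of the semantics described above, here is a toy
user-space model. It assumes a single firmware framebuffer device and a
one-way "disabled" flag. All toy_* names are hypothetical and are not
the API from the series in [0].

```c
#include <stdbool.h>

/* Toy model only; none of these names are kernel APIs. */
struct toy_sysfb {
	bool disabled;       /* once set, later registration is refused */
	bool dev_registered; /* models the one firmware framebuffer device */
};

/* Register the framebuffer device unless sysfb was disabled. */
static bool toy_sysfb_register(struct toy_sysfb *s)
{
	if (s->disabled || s->dev_registered)
		return false;
	s->dev_registered = true;
	return true;
}

/* Unregister any registered device and block future registration. */
static void toy_sysfb_disable(struct toy_sysfb *s)
{
	s->dev_registered = false;
	s->disabled = true;
}
```

The point of the model is the ordering guarantee: after the disable
call, no sysfb-registered device remains and none can appear later, so
the caller does not need to inspect apertures at all.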

Would it be enough for your use case to use that helper function if it
lands, or do you really need to look at the apertures? That is, do you
want to remove only the {vesa,efi,simple}fb and simpledrm drivers, or is
there a need to also remove real fbdev and DRM drivers?

[0]: https://lore.kernel.org/lkml/YnvrxICnisXU6I1y@xxxxxxxxxxxx/T/
[1]: https://www.ibm.com/docs/en/linux-on-systems?topic=through-pci
[2]: https://www.kernel.org/doc/Documentation/vfio.txt

--
Best regards,

Javier Martinez Canillas
Linux Engineering
Red Hat