Re: [PATCH v2 1/1] vfio/nvgpu: Add vfio pci variant module for grace hopper

From: Alex Williamson
Date: Thu May 25 2023 - 11:22:27 EST

On Tue, 16 May 2023 21:28:35 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Tue, May 16, 2023 at 03:09:14PM -0600, Alex Williamson wrote:
> > > +# SPDX-License-Identifier: GPL-2.0-only
> > > +config NVGPU_VFIO_PCI
> > > + tristate "VFIO support for the GPU in the NVIDIA Grace Hopper Superchip"
> > > + depends on ARM64 || (COMPILE_TEST && 64BIT)
> > > + select VFIO_PCI_CORE
> >
> > I think this should be a 'depends on' as well, that's what we have for
> > the other vfio-pci variant drivers.
> It should be removed completely, AFAICT:
> config VFIO_PCI
> tristate "Generic VFIO support for any PCI device"
> select VFIO_PCI_CORE
> Ensures it is turned on
> source "drivers/vfio/pci/mlx5/Kconfig"
> endif

The source command actually comes after the VFIO_PCI endif, the mlx5
Kconfig is sourced if PCI && MMU.

> Autoamtically injects a 'depends on VFIO_PCI' to all the enclosed
> kconfig statements (and puts them nicely in the menu)
> So we have everything needed already
> SELECT is the correct action since it doesn't have a config text.

In fact I think it's the current variant drivers that are incorrect to
make use of 'depends on', this makes those variant drivers implicitly
depend on VFIO_PCI, but it should instead be possible to build a kernel
that doesn't include vfio-pci but does include mlx5-vfio-pci, or other
vfio-pci variant drivers. Currently if I disable VFIO_PCI I no longer
have the option to select either the mlx5 or hisi_acc drivers, they
actually depend only on VFIO_PCI_CORE, but currently only VFIO_PCI can

I withdraw my objection to using select, the other variant drivers
should adopt select as well, imo.

> > Is our test for vm_end < vm_start in vfio-pci-core just paranoia? I
> > don't see an equivalent here.
> Yes, mm core will not invoke the op with something incorrect.
> > Can we also get a comment in the code outlining the various reasons
> > that this "BAR" doesn't need the disabled access protections that
> > vfio-pci-core implements? For example outlining the behavior relative
> > to BAR access while the memory enable bit is disabled, the bus being in
> > reset, or the device being in a low-power state.
> The HW has some "isolation" feature that kicks in and safely
> disconnects the GPU from the CPU.
> A lot of work has been done to make things like VFIO and KVM safe
> against machine checks/etc under basically all circumstances.

So a comment in the code to reflect that the hardware takes this into
account such that we don't need to worry about mmap access during bus
reset or otherwise disabled MMIO access of the PCI device would not be
unreasonable. Thanks,