Re: BUG REPORT: vfio_pci driver

From: Alex Williamson
Date: Fri Aug 13 2021 - 18:09:14 EST


On Fri, 13 Aug 2021 11:34:51 -0700
Russ Weight <russell.h.weight@xxxxxxxxx> wrote:

> Bug Description:
>
> A bug in the vfio_pci driver was reported in conjunction with work on FPGA

This looks like the documented behavior of an IRQ index that reports the
VFIO_IRQ_INFO_NORESIZE flag. We can certainly work toward removing the flag
from this index, but it seems the userspace driver is currently ignoring the
flag and expecting exactly the behavior that the flag indicates is not
available. Thanks,

Alex
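
For reference, a minimal sketch of what honoring that flag looks like on the
userspace side, using only the standard definitions from <linux/vfio.h> (the
device_fd variable and the helper name are placeholders for the example, not
anything from the OPAE code):

    #include <linux/vfio.h>
    #include <string.h>
    #include <sys/ioctl.h>

    /* Returns 1 if the MSI-X index reports NORESIZE, 0 if not, -1 on error. */
    static int msix_is_noresize(int device_fd)
    {
            struct vfio_irq_info info;

            memset(&info, 0, sizeof(info));
            info.argsz = sizeof(info);
            info.index = VFIO_PCI_MSIX_IRQ_INDEX;

            if (ioctl(device_fd, VFIO_DEVICE_GET_IRQ_INFO, &info) < 0)
                    return -1;

            /* With NORESIZE set, vectors cannot be added to an already-enabled
             * index one at a time; the whole set has to be configured in one
             * VFIO_DEVICE_SET_IRQS call (or the index disabled and re-enabled
             * with the larger set). */
            return !!(info.flags & VFIO_IRQ_INFO_NORESIZE);
    }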

> cards. We were able to reproduce and root-cause the bug using systemtap.
> The original bug description is below. An understanding of the referenced
> dfl and opae tools is not required - it is the sequence of ioctl calls and
> IRQ vectors that matters:
>
> > I’m trying to get an example AFU working that uses 2 IRQs, active at the same
> > time. I’m hitting what looks to be a dfl_pci driver bug.
> >
> > The code tries to allocate two IRQ vectors: 0 and 1. I see opaevfio.c doing the
> > right thing, picking the MSIX index. Allocating either IRQ 0 or IRQ 1 by itself
> > works fine, and I confirmed that the VFIO_DEVICE_SET_IRQS call looks reasonable:
> > MSIX index, a start of 0 or 1, and count 1.
> >
> > Note that opaevfio.c always passes count 1, so it will make separate calls for
> > each IRQ vector.
> >
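> > For illustration, each of those per-vector calls boils down to roughly the
> > following (simplified; the helper name and fd variables are made up for the
> > example, while the struct and flags are the standard ones from <linux/vfio.h>):
> >
> >     #include <linux/vfio.h>
> >     #include <stdint.h>
> >     #include <string.h>
> >     #include <sys/ioctl.h>
> >
> >     /* Bind one eventfd to MSI-X vector 'start'; count is always 1. */
> >     static int set_one_msix_vector(int device_fd, uint32_t start, int32_t evtfd)
> >     {
> >             char buf[sizeof(struct vfio_irq_set) + sizeof(int32_t)];
> >             struct vfio_irq_set *irq_set = (struct vfio_irq_set *)buf;
> >
> >             memset(buf, 0, sizeof(buf));
> >             irq_set->argsz = sizeof(buf);
> >             irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> >                              VFIO_IRQ_SET_ACTION_TRIGGER;
> >             irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> >             irq_set->start = start;
> >             irq_set->count = 1;
> >             memcpy(irq_set->data, &evtfd, sizeof(evtfd));
> >
> >             return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
> >     }
> >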
> > When I try to allocate both, I see the following:
> >
> > * If the VFIO_DEVICE_SET_IRQS ioctl is called first with start 0 and then with
> > start 1 (always count 1), the second ioctl (start 1) returns EINVAL.
> > * If I set up the vectors in decreasing order, so start 1 followed by start 0,
> > the program works!
> > * I ruled out OPAE SDK user space problems by setting up my program to
> > allocate in increasing order, which would normally fail. I changed only the
> > ioctl call in user space opaevfio.c, inverting bit 0 of start so that the
> > driver is called in decreasing index order. Of course this binds the wrong
> > vectors to the fds, but I don’t care about that for now. This works! From
> > this, I conclude that it can’t be a user space problem since the difference
> > between working and failing is solely the order in which IRQ vectors are
> > bound in ioctl calls.
>
> The EINVAL is coming from vfio_msi_set_block() here:
> https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/vfio_pci_intrs.c#L373
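>
> The check there is essentially this (condensed; the links above point at
> master, so the exact line numbers may drift, but the logic is what matters):
>
>     /* From vfio_msi_set_block().  Every call here uses count == 1, so the
>      * test effectively reduces to start >= vdev->num_ctx. */
>     if (start >= vdev->num_ctx || start + count > vdev->num_ctx)
>             return -EINVAL;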
>
> vfio_msi_set_block() is being called from vfio_pci_set_msi_trigger() here on
> the second IRQ request:
> https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/vfio_pci_intrs.c#L530
>
> We believe the bug is in vfio_pci_set_msi_trigger(), in the 2nd parameter to the call
> to vfio_msi_enable() here:
> https://github.com/torvalds/linux/blob/master/drivers/vfio/pci/vfio_pci_intrs.c#L533
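>
> Condensed, the eventfd path of vfio_pci_set_msi_trigger() looks roughly like
> this (paraphrased, not a verbatim copy of the source):
>
>     if (vdev->irq_type == index)
>             /* MSI/MSI-X already enabled: just (re)bind the requested block */
>             return vfio_msi_set_block(vdev, start, count, fds, msix);
>
>     /* First request for this index: enable, sizing to nvec = start + count */
>     ret = vfio_msi_enable(vdev, start + count, msix);
>     if (ret)
>             return ret;
>
>     ret = vfio_msi_set_block(vdev, start, count, fds, msix);
>     if (ret)
>             vfio_msi_disable(vdev, msix);
>
>     return ret;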
>
> In both the passing and failing cases, the first IRQ request results in a call
> to vfio_msi_enable() at line 533 and the second IRQ request results in the
> call to vfio_msi_set_block() at line 530. It is during the first IRQ request
> that vfio_msi_enable() sets vdev->num_ctx based on the 2nd parameter (nvec).
> vdev->num_ctx is part of the conditional that results in the EINVAL for the
> failing case.
>
> In the passing case, vdev->num_ctx is 2. In the failing case, it is 1.
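>
> Putting the two orderings side by side (count is 1 in every call):
>
>     failing:  enable(nvec = 0 + 1) -> num_ctx = 1;
>               set_block(start = 1): 1 + 1 > 1 -> -EINVAL
>     passing:  enable(nvec = 1 + 1) -> num_ctx = 2;
>               set_block(start = 0): 0 + 1 <= 2 -> success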
>
> I am attaching two text files containing trace information from systemtap: one for
> the failing case and one for the passing case. They contain a lot more information
> than is needed, but if you search for vfio_pci_set_msi_trigger and vfio_msi_set_block,
> you will see values for some of the call parameters.
>
> - Russ
>