Re: [PATCH] vfio/pci: Support error recovery

From: Michael S. Tsirkin
Date: Mon Dec 12 2016 - 17:29:48 EST


On Mon, Dec 12, 2016 at 12:12:16PM -0700, Alex Williamson wrote:
> On Mon, 12 Dec 2016 21:49:01 +0800
> Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:
>
> > Hi,
> > I have two solutions (high-level design) in mind; please see whether
> > they are acceptable, or which one is acceptable. I also have some
> > questions.
> >
> > 1. Block guest access during host recovery
> >
> > Add a new field "error_recovering" in struct vfio_pci_device to
> > indicate host recovery status. The AER driver in the host will still
> > do the link reset.
> >
> > - Set error_recovering in the vfio-pci driver's error_detected
> > callback, used to block all kinds of user access (config space, MMIO).
> > - To resolve the race between device reset and user access, check the
> > device state[*] in the vfio-pci driver's resume callback to see
> > whether the device reset is done. If it is, clear "error_recovering";
> > otherwise start a timer and check the device state periodically until
> > the reset is done. (What if the device reset doesn't complete for a
> > long time?)
> > - In QEMU, translate a guest link reset into a host link reset.
> > A question here: we already have a link reset in the host, so is a
> > second link reset necessary? Why?
> >
> > [*] How to check device state: read a certain config space register
> > and check whether the returned value is valid or all F's.
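
If I read solution 1 right, the flow would be something like the
minimal sketch below. To be clear about what is and isn't real here:
the error_recovering and recovery_timer fields come from the proposal
above, while the callback names, the vendor-ID presence check and the
poll interval are my assumptions, and locking is omitted entirely.

#include <linux/pci.h>
#include <linux/timer.h>

static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
						  pci_channel_state_t state)
{
	struct vfio_pci_device *vdev = pci_get_drvdata(pdev);

	/* Block all user access (config space, mmio) until recovery ends */
	vdev->error_recovering = true;
	return PCI_ERS_RESULT_CAN_RECOVER;
}

/* [*] Check device state by reading a config register: all F's means
 * the device is still unreachable, i.e. the reset is not complete. */
static bool vfio_pci_device_ready(struct pci_dev *pdev)
{
	u32 val;

	pci_read_config_dword(pdev, PCI_VENDOR_ID, &val);
	return val != 0xffffffff;
}

static void vfio_pci_aer_resume(struct pci_dev *pdev)
{
	struct vfio_pci_device *vdev = pci_get_drvdata(pdev);

	if (vfio_pci_device_ready(pdev))
		vdev->error_recovering = false;	/* unblock user access */
	else
		/* re-check from a timer; your open question remains:
		 * what if the reset never completes? */
		mod_timer(&vdev->recovery_timer, jiffies + HZ);
}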
>
> Isn't this exactly the path we were on previously? There might be an
> optimization that we could skip back-to-back resets, but how can you
> necessarily infer that the resets are for the same thing? If the user
> accesses the device between resets, can you still guarantee the guest
> directed reset is unnecessary? If time passes between resets, do you
> know they're for the same event? How much time can pass between the
> host and guest reset to know they're for the same event? In the
> process of error handling, which is more important, speed or
> correctness?
>
> > 2. Skip the link reset in the AER driver of the host kernel for
> > vfio-pci, and let the user decide how to do serious recovery.
> >
> > Add a new field "user_driver" in struct pci_dev, used to skip the
> > link reset for vfio-pci; add a new field "link_reset" in struct
> > vfio_pci_device to indicate whether the link has been reset during
> > recovery.
> >
> > - Set user_driver in vfio_pci_probe(), to skip the link reset for
> > vfio-pci in the host.
> > - Block user access (config, MMIO) with a flag during host recovery.
> > (Not sure whether this step is necessary.)
> > - In QEMU, translate a guest link reset into a host link reset.
> > - In the vfio-pci driver, set link_reset after
> > VFIO_DEVICE_PCI_HOT_RESET is executed.
> > - In the vfio-pci driver's resume callback, start a timer and check
> > the "link_reset" field periodically. If it is set within a
> > reasonable time, clear it and delete the timer; otherwise the
> > vfio-pci driver will do the link reset itself!
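
For concreteness, solution 2 might look roughly like the sketch below.
The user_driver and link_reset fields are the ones proposed above; the
timer callback, the deadline field and vfio_pci_do_link_reset() are
names I made up, and the AER-core change is shown only as a comment.

#include <linux/pci.h>
#include <linux/timer.h>

/* In vfio_pci_probe():		pdev->user_driver = true;
 * In the host AER core:	if (!dev->user_driver) reset_link(dev);
 * In VFIO_DEVICE_PCI_HOT_RESET: vdev->link_reset = true;
 */

/* Timer armed from the vfio-pci resume callback: if the user resets
 * the link within the deadline we just clean up; otherwise vfio-pci
 * falls back to resetting the link itself. */
static void vfio_pci_recovery_check(unsigned long data)
{
	struct vfio_pci_device *vdev = (struct vfio_pci_device *)data;

	if (vdev->link_reset) {			/* user did the reset */
		vdev->link_reset = false;
		return;
	}
	if (time_after(jiffies, vdev->recovery_deadline)) {
		vfio_pci_do_link_reset(vdev);	/* hypothetical fallback */
		return;
	}
	mod_timer(&vdev->recovery_timer, jiffies + HZ);	/* poll again */
}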
>
> What happens in the case of a multifunction device where each function
> is part of a separate IOMMU group and one function is hot-removed from
> the user?

So just don't do it then. Topology must match between host and guest,
except maybe for the case of devices with a host driver (e.g. a PF),
which we might be able to synchronize against.

> We can't do a link reset on that function since the other function is
> still in use. We have no choice but to release a device in an unknown
> state back to the host. As previously discussed, we don't
> expect that any sort of function-level FLR will necessarily reset the
> device to the same state. I also don't really like vfio-pci taking
> over error handling capabilities from the PCI-core. That's redundant
> code and extra maintenance overhead.
>
> > A quick question:
> > I don't know how devices are divided into IOMMU groups; is it
> > possible for functions in a multi-function device to be split into
> > different groups?
>
> Yes, if a multifunction device supports ACS or if we have quirks to
> expose that the functions do not perform internal peer-to-peer, then
> they may be in separate IOMMU groups, depending on the rest of the PCI
> topology. See:
>
> http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
>
> Thanks,
> Alex
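
For reference, the multifunction part of that grouping heuristic boils
down to roughly the check below (loosely abbreviated from
pci_device_group() in drivers/iommu/iommu.c; the helper name is mine,
and the real code also walks the upstream topology):

#include <linux/pci.h>

#define REQ_ACS_FLAGS	(PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CF | PCI_ACS_UF)

static bool functions_must_share_group(struct pci_dev *pdev)
{
	/*
	 * Without ACS (or an ACS quirk), a multifunction device may
	 * route peer-to-peer DMA between its own functions internally,
	 * bypassing the IOMMU, so all of its functions must share one
	 * IOMMU group.
	 */
	return pdev->multifunction && !pci_acs_enabled(pdev, REQ_ACS_FLAGS);
}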