Re: [PATCH] vfio/pci: Support error recovery

From: Alex Williamson
Date: Mon Dec 12 2016 - 22:40:00 EST


On Tue, 13 Dec 2016 05:15:13 +0200
"Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:

> On Mon, Dec 12, 2016 at 03:43:13PM -0700, Alex Williamson wrote:
> > > So just don't do it then. Topology must match between host and guest,
> > > except maybe for the case of devices with host driver (e.g. PF)
> > > which we might be able to synchronize against.
> >
> > We're talking about host kernel level handling here. The host kernel
> > cannot defer the link reset to the user under the assumption that the
> > user is handling the devices in a very specific way. The moment we do
> > that, we've lost.
>
> The way is same as baremetal though, so why not?

How do we know this? What if the user is dpdk? The kernel is
responsible for maintaining the integrity of the system and devices,
not the user.

> And if user doesn't do what's expected, we can
> do the full link reset on close.

That's exactly my point: if we're talking about multiple devices,
there's no guarantee that the close() for each is simultaneous. If one
function is released before the other, we cannot do a bus reset. If
that device is then opened by another user before its sibling is
released, then we once again cannot perform a link reset. I don't
think it would be reasonable to mark the released device quarantined
until the sibling is released; that would be a terrible user experience.
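
To make the ordering problem concrete, here is a minimal sketch in plain
C. It is a toy model, not vfio-pci code: struct bus_model, model_open(),
model_release() and bus_reset_allowed() are all hypothetical names, and
the rule "a reset at release time is only allowed when no function on the
bus is still held" is a simplifying assumption standing in for the real
ownership checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_OWNER  0   /* hypothetical marker: function not opened by anyone */
#define MAX_FUNCS 8

/* Hypothetical model of the functions sharing one bus/link. */
struct bus_model {
    size_t nfuncs;
    int owner[MAX_FUNCS];         /* user id holding each function, or NO_OWNER */
    bool needs_reset[MAX_FUNCS];  /* function was released without a reset */
};

/*
 * Assumption: a bus/link reset hits every function on the bus, so at
 * release time it is only safe when no function is still held by a user.
 */
static bool bus_reset_allowed(const struct bus_model *bus)
{
    for (size_t i = 0; i < bus->nfuncs; i++)
        if (bus->owner[i] != NO_OWNER)
            return false;
    return true;
}

static void model_open(struct bus_model *bus, size_t idx, int user)
{
    assert(idx < bus->nfuncs && user != NO_OWNER);
    assert(bus->owner[idx] == NO_OWNER);
    bus->owner[idx] = user;
}

/* Returns true if the deferred reset could be performed at this release. */
static bool model_release(struct bus_model *bus, size_t idx)
{
    assert(idx < bus->nfuncs);
    bus->owner[idx] = NO_OWNER;
    bus->needs_reset[idx] = true;

    if (!bus_reset_allowed(bus))
        return false;             /* a sibling is still in use */

    for (size_t i = 0; i < bus->nfuncs; i++)
        bus->needs_reset[i] = false;
    return true;
}
```

With two functions opened by user 1, releasing function 0 cannot reset
because function 1 is still held. If user 2 then opens function 0 before
function 1 is released, releasing function 1 cannot reset either, and
function 0 has already been handed to the new user in an un-reset state.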