Re: [PATCH] vfio/pci: Support error recovery
From: Cao jin
Date: Tue Dec 06 2016 - 21:46:00 EST
On 12/06/2016 11:35 PM, Alex Williamson wrote:
> On Tue, 6 Dec 2016 18:46:04 +0800
> Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:
>
>> On 12/06/2016 12:59 PM, Alex Williamson wrote:
>>> On Tue, 6 Dec 2016 05:55:28 +0200
>>> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
>>>
>>>> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:
>>>>> If you're going to take the lead for these AER patches, I would
>>>>> certainly suggest that understanding the reasoning behind the bus reset
>>>>> behavior is a central aspect to this series. This effort has dragged
>>>>> out for nearly two years and I apologize, but I don't really have a lot
>>>>> of patience for rehashing some of these issues if you're not going to
>>>>> read the previous discussions or consult with your colleagues to
>>>>> understand how we got to this point. If you want to challenge some of
>>>>> the design points, that's great, it could use some new eyes, but please
>>>>> understand how we got here first.
>>>>
>>>> Well I'm guessing Cao jin here isn't the only one not
>>>> willing to plough through all historical versions of the patchset
>>>> just to figure out the motivation for some code.
>>>>
>>>> Including a summary of a high level architecture couldn't hurt.
>>>>
>>>> Any chance of writing such? Alternatively, we can try to build it as
>>>> part of this thread. Shouldn't be hard as it seems somewhat
>>>> straight-forward on the surface:
>>>>
>>>> - detect link error on the host, don't reset link as we would normally do
>>>
>>> This is actually a new approach that I'm not sure I agree with. By
>>> skipping the host directed link reset, vfio is taking responsibility
>>> for doing this, but then we just assume the user will do it. I have
>>> issues with this.
>>>
>>> The previous approach was to use the error detected notifier to block
>>> access to the device, allowing the host to perform the link reset. A
>>> subsequent notification in the AER process released the user access
>>> which allowed the user AER process to proceed. This did result in both
>>> a host directed and a guest directed link reset, but other than
>>> coordinating the blocking of the user process during host reset, that
>>> hasn't been brought up as an issue previously.
>>>
>>
>> Tests on previous versions didn't bring up issues as I find, I think
>> that is because we didn't test it completely. As I know, before August
>> of this year, we didn't have cable connected to NIC, let alone
>> connecting NIC to gateway.
>
> Lack of testing has been a significant issue throughout the development
> of this series.
>
>> Even if I fixed the guest oops issue in igb driver that Alex found in
>> v9, v9 still cannot work in my test. And in my test, disable link
>> reset(in host) in aer core for vfio-pci is the most significant step to
>> get my test passed.
>
> But is it the correct step? I'm not convinced. Why did blocking guest
> access not work? How do you plan to manage vfio taking the
> responsibility to perform a bus reset when you don't know whether QEMU
> is the user of the device or whether the user supports AER recovery?
>
>>>> - report link error to guest
>>>> - detect link reset request from guest
>>>> - reset link on host
>>>>
>>>> Since link reset will reset all devices behind it, for this to work we
>>>> need same set of devices behind the link in host and guest. Enforcing
>>>> this would be nice to have.
>>>
>>> This is a pretty significant complication and I think it's a
>>> requirement. This is the heart of why we have an AER vfio-pci device
>>> option and why we require that QEMU should fail to initialize the
>>> device if AER is enabled in an incompatible configuration. If a user
>>> cannot be sure that AER is enabled on a device, it's pointless to
>>> bother implementing it, though honestly I question the value of it in
>>> the VM altogether given configuration requirements (ie. are users
>>> going to accept the reason that all the ports of a device need to be
>>> assigned to a single guest for guest-based AER recovery when they were
>>> previously able to assign each port to a separate VM?).
>>>
>>>> - as link now might end up in bad state, reset
>>>> it when device is unassigned
>>>
>>> This is also a new aspect for the approach here, previously we allowed
>>> the host directed link reset so we could assume the device was
>>> sufficiently recovered. In the proposal here, the AER core skips any
>>> devices bound to vfio-pci, but vfio can't guarantee that we can do a
>>> bus reset on them. PCI bus isolation is not accounted for in DMA
>>> isolation, which is the basis for IOMMU groups. A bus can host
>>> multiple IOMMU groups, each of which may have a different user. Only
>>> in a very specific configuration can vfio do a bus reset.
>>>
>>>> Any details I missed?
>>>
>>> AIUI, the critical feature is that the guest needs to be able to reset
>>> the device link, all the other design elements play out from the
>>> logical expansion of that feature. It means that a guest bus reset
>>> needs to translate to a host bus reset, which means that all of the
>>> affected host devices need to be accounted for and those that are
>>> assigned to the guest need to be affected in the guest in the same
>>> way. QEMU must enforce this configuration or else a user cannot know
>>> the result of a AER fault, ie. will it cause a VMSTOP condition or is
>>> the fault forwarded to the guest. The required configuration
>>> restrictions are quite involved, therefore we can't simply require this
>>> of all configurations, so a vfio-pci device option is added. The
>>> coordination of the host directed reset with the guest directed reset
>>> was only a complications discovered within the last few revisions of
>>> the series. As noted above, the previous solution to this was to
>>> attempt to block access to the device while the host reset proceeds.
>>>
>>> Clearly it's a little disconcerting if we throw all of that away and
>>> simply assume that an FLR is sufficient to reset the device when it
>>> seems like link issues might be a nontrivial source of AER faults. If
>>> FLR is sufficient, then why does the core AER handling code in the
>>> kernel not simply do this? Thanks,
>>>
>>
>> I agree with the points here. Now I understand why translate a guest
>> link reset to host link reset[*], and FLR shouldn't be equivalent to
>> link reset, but the PITY is, we can't trigger a real fatal error to see
>> if a FLR is sufficient to reset the device.
>
> In this case, it's not a matter of finding a scenario where it works.
> Using FLR to recover from a fatal error would need to be supported by
> verbiage in a specification. I'm not interested in cases where it
> happens to work on single device for a certain type of error. The link
> reset is the bare metal mechanism for recovering from a fatal error and
> it should be our mechanism as well unless there's spec wording to
> support another approach. Thanks,
>
Ok, I understand your points here, will drag that dropped patch back and
test.
--
Sincerely,
Cao jin