Re: [PATCH] vfio/pci: Support error recovery
From: Cao jin
Date: Thu Dec 08 2016 - 09:43:11 EST
On 12/06/2016 11:35 PM, Alex Williamson wrote:
> On Tue, 6 Dec 2016 18:46:04 +0800
> Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:
>
>> On 12/06/2016 12:59 PM, Alex Williamson wrote:
>>> On Tue, 6 Dec 2016 05:55:28 +0200
>>> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
>>>
>>>> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:
>>>>> If you're going to take the lead for these AER patches, I would
>>>>> certainly suggest that understanding the reasoning behind the bus reset
>>>>> behavior is a central aspect to this series. This effort has dragged
>>>>> out for nearly two years and I apologize, but I don't really have a lot
>>>>> of patience for rehashing some of these issues if you're not going to
>>>>> read the previous discussions or consult with your colleagues to
>>>>> understand how we got to this point. If you want to challenge some of
>>>>> the design points, that's great, it could use some new eyes, but please
>>>>> understand how we got here first.
>>>>
>>>> Well I'm guessing Cao jin here isn't the only one not
>>>> willing to plough through all historical versions of the patchset
>>>> just to figure out the motivation for some code.
>>>>
>>>> Including a summary of a high level architecture couldn't hurt.
>>>>
>>>> Any chance of writing such? Alternatively, we can try to build it as
>>>> part of this thread. Shouldn't be hard as it seems somewhat
>>>> straight-forward on the surface:
>>>>
>>>> - detect link error on the host, don't reset link as we would normally do
>>>
>>> This is actually a new approach that I'm not sure I agree with. By
>>> skipping the host directed link reset, vfio is taking responsibility
>>> for doing this, but then we just assume the user will do it. I have
>>> issues with this.
>>>
>>> The previous approach was to use the error detected notifier to block
>>> access to the device, allowing the host to perform the link reset. A
>>> subsequent notification in the AER process released the user access
>>> which allowed the user AER process to proceed. This did result in both
>>> a host directed and a guest directed link reset, but other than
>>> coordinating the blocking of the user process during host reset, that
>>> hasn't been brought up as an issue previously.
>>>
>>
>> Tests on previous versions didn't bring up issues as I find, I think
>> that is because we didn't test it completely. As I know, before August
>> of this year, we didn't have cable connected to NIC, let alone
>> connecting NIC to gateway.
>
> Lack of testing has been a significant issue throughout the development
> of this series.
>
>> Even if I fixed the guest oops issue in igb driver that Alex found in
>> v9, v9 still cannot work in my test. And in my test, disable link
>> reset(in host) in aer core for vfio-pci is the most significant step to
>> get my test passed.
>
> But is it the correct step? I'm not convinced. Why did blocking guest
> access not work? How do you plan to manage vfio taking the
> responsibility to perform a bus reset when you don't know whether QEMU
> is the user of the device or whether the user supports AER recovery?
>
Maybe currently we don't have enough proof to prove the correctness, but
I think I did find some facts to prove that link reset in host is a big
trouble, and can answer part of questions above.
1st, some thoughts:
In pci-error-recovery.txt and do_recovery() of kernel tree, we can see,
a recovery consists of several steps(callbacks), link reset is one of
them, and except link reset, the others are seems kind of device
specific. In our case, both host & guest will do recovery, I think the
host recovery actually is some kind of fake recovery, see vfio-pci
driver's error_detected & resume callback, they don't do anything
special, mainly signal error to user, but the link reset in host "fake
reset" does some serious work, in other words, I think host does the
recovery incompletely, so I was thinking, why not just drop incompletely
host recovery(drop link reset) for vfio-pci, and let the guest take care
of the whole serious recovery. This is part of the reason of why my
version looks like this. But yes, I admit the issue Alex mentioned,
vfio can't guarantee that user will do a bus reset, this is an issue I
will keep looking for a solution.
2nd, some facts and analyzation from test:
In host, the relationship between time and behviour in each component
roughly looks as following:
+ HW + host kernel + qemu + guest kernel +
| |(error recovery)| | |
| | | | |
| | vfio-pci's | | |
| | error_detected | | |
| | + | | |
| | | | | |
| | | error notify | | |
| | | via eventfd | | |
| | +---------------> +----------+ | |
| | | +vfio_err_ | | |
| | | |notifier_ | | |
| +---- +<---+link reset | |handler | | |
| | HW | | | | | | |
| | | | | | | |
| |r | | vfio-pci's | |pass aer | | |
| |e.. | | resume | |to guest | | |
| |s. | | (blocking end) | | | | |
| |e | | | | *2* | | |
| |t | | | +----+-----+ | |
| |i | | | | | |
| |n | | | +--------> +----------+ |
| |g | | | | | guest | |
| | | | | | | recovery | |
| | | | | | | process | |
| | | | | | |(include | |
| | | | | | |register | |
| | *1* | | | | |access) | |
| | | | | | | | |
| | | | | | | *3* | |
| | | | +----------+ |
Time | | | | |
| | | | |
| | | | |
| | | | |
v | | | |
Now let me try to answer: Why did blocking guest access not work?
Some important factor:
1. host recovery doesn't do anything special except error notifying, so
it may be executed very fast.
2. Hardware resetting time is not sure, from pcie spec 6.6.1, guessing
it need many ms, pretty long?
some facts found in v9(block config write, not read, during host
recovery) test:
1. reading uncor error register in vfio_err_notifier_handler sometimes
returns correct value, sometimes return invalid value(All F's)
So, I am thinking, if host blocking on host end early, and *2*, *3* is
parallel with *1*, the way used in v9 to blocking guest access, may not
work.
--
Sincerely,
Cao jin