Re: [PATCH v6] vfio error recovery: kernel support

From: Cao jin
Date: Thu Apr 06 2017 - 05:34:20 EST




On 04/06/2017 05:56 AM, Michael S. Tsirkin wrote:
> On Wed, Apr 05, 2017 at 04:54:33PM +0800, Cao jin wrote:
>> Apparently, I don't have experience to induce non-fatal error, device
>> error is more of a chance related with the environment(temperature,
>> humidity, etc) as I understand.
>
> I'm not sure how to interpret this statement. I think what Alex is
> saying is simply that patches should include some justification. They
> make changes but what are they improving?
> For example:
>
> I tested device ABC in conditions DEF. Without a patch VM
> stops. With the patches applied VM recovers and proceeds to
> use the device normally.
>
> is one reasonable justification imho.
>

Got it. But unfortunately, until now, I haven't seen a VM stop caused by
a real device non-fatal error during device assignment(Only saw real
fatal errors after start VM).
On one side, AER error could occur theoretically; on the other side,
seldom people have seen a VM stop caused by AER. Now I am asked that do
I have a real evidence or scenario to prove that this patchset is really
useful? I don't, and we all know it is hard to trigger a real hardware
error, so, seems I am pushed into the corner. I guess these questions
also apply for AER driver's author, if the scenario is easy to
reproduce, there is no need to write aer_inject to fake errors.

--
Sincerely,
Cao jin