Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

From: James Morse
Date: Tue Mar 28 2017 - 09:29:25 EST


Hi Peter,

On 28/03/17 12:33, Peter Maydell wrote:
> On 28 March 2017 at 12:23, Christoffer Dall <cdall@xxxxxxxxxx> wrote:
>> On Tue, Mar 28, 2017 at 11:48:08AM +0100, James Morse wrote:
>>> On the host, part of UEFI is involved to generate the CPER records.
>>> In a guest?, I don't know.
>>> Qemu could generate the records, or drive some other component to do it.
>>
>> I think I am beginning to understand this a bit. Since the guet UEFI
>> instance is specifically built for the machine it runs on, QEMU's virt
>> machine in this case, they could simply agree (by some contract) to
>> place the records at some specific location in memory, and if the guest
>> kernel asks its guest UEFI for that location, things should just work by
>> having logic in QEMU to process error reports and populate guest memory.
>
> Is "write direct to guest memory" the best ABI here or would
> it be preferable to use the fw_cfg interface for the guest UEFI
> to retrieve the data items on demand?

As far as I understand the interaction between Qemu and UEFI isn't defined. The
eventual aim is to emulate ACPI's firmware first error handling for a guest.
This way the RAS behaviour for a host and the guest is the same.

The ABI is the guest OS gets a 'notification' (there is a list in acpi: 18.3.2.9
Hardware Error Notification), and then finds a pointer to the CPER records
(defined in the UEFI Spec Appendix N) at the address advertised by one of the
Generic Hardware Error Source (GHES) entries of the ACPI Hardware Error Source
Table (HEST).

How Qemu and UEFI conspire to make this happen is up for discussion.
My suggestion would be to try and closely mirror whatever happens on a physical
system so that UEFI only needs to test one set of code.


> Is there a pre-existing "this is how it works on x86" implementation?

I found [0], which on page 16 talks about Qemu injecting a Pseudo 'Software
Recoverable Action Required', which I assume is a flavour of NMI.

The ACPI/CPER stuff is arch agnostic and given Qemu is only ever likely to have
to handle memory errors it should be possible to use the same code for both x86
and arm64.


Thanks,

James

[0] https://events.linuxfoundation.org/images/stories/slides/lfcs2013_tanino.pdf