Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

From: James Morse
Date: Thu Jul 06 2017 - 12:40:33 EST

Hi gengdongjiu,

On 05/07/17 09:14, gengdongjiu wrote:
> On 2017/7/4 18:14, James Morse wrote:
>> Can you give us a specific example of an error you are trying to handle?

> For example:
> guest OS user space accesses device type memory, and an SError happens. Because the
> SError is an asynchronous fault, it is not taken immediately. When the guest OS calls "SVC" to enter guest OS
> kernel space, the ESB (Error Synchronization Barrier) instruction will defer this SError, so the SError happens immediately.

Ah, this isn't necessarily a 'RAS notification' SError/SEI, it may be a
'vanilla', v8.0 SError.

You've given a guest access to a physical device (how?), the guest has done
something, which caused the device to respond with SError.

Do you have a specific use-case for this? What is the ESR? What kinds of CPER
records does firmware generate? (if any)

We have to be careful here as devices can still generate asynchronous-interrupts
using SError, these aren't contained by ESB barriers. For these we should fall
back to KVM's v8.0 SError behaviour. KVM can tell them apart as the APEI code
doesn't claim the SError as an SEI notification, and with the RAS extensions the
ESR has the 'IDS' bit set.
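
As a rough illustration of the ESR half of that check (the APEI 'claim' half is
not shown), here is a minimal stand-alone sketch. The EC and IDS field positions
follow the ARMv8 ESR_ELx layout; the names classify_serror() and enum
serror_kind are made up for this example and are not KVM's.

```c
#include <stdint.h>

/* Field positions from the ARMv8 ESR_ELx layout. */
#define ESR_EC_SHIFT   26
#define ESR_EC_MASK    UINT32_C(0x3f)
#define ESR_EC_SERROR  UINT32_C(0x2f)        /* exception class: SError interrupt */
#define ESR_IDS        (UINT32_C(1) << 24)   /* ISS is IMPLEMENTATION DEFINED */

/* Hypothetical classification, for illustration only. */
enum serror_kind {
    NOT_SERROR,      /* some other exception class */
    SERROR_IMPDEF,   /* IDS set: syndrome is implementation defined */
    SERROR_ARCH      /* IDS clear: architected syndrome encoding */
};

static enum serror_kind classify_serror(uint32_t esr)
{
    uint32_t ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;

    if (ec != ESR_EC_SERROR)
        return NOT_SERROR;

    /* With IDS set, the remaining ISS bits cannot be decoded generically. */
    return (esr & ESR_IDS) ? SERROR_IMPDEF : SERROR_ARCH;
}
```

An SERROR_IMPDEF result that no firmware-first handler has claimed would be the
case where falling back to the v8.0 behaviour makes sense.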

>> How would a non-KVM user space process handle the error?

> Indeed, non-KVM user space cannot get the notification from the hypervisor or host kernel. Thanks for pointing that out.
> Do you mean we should still signal SIGBUS from memory_failure?

No, I was assuming this was a RAS notification SEI (because patch 1/3 of your
series touched the RAS cpu-features) being given to user space to handle.

Instead, can I ask how the host should handle this SError if it had accessed the
device itself?

I agree device pass-through is going to be a special case for KVM, but before
the host can deliver a device RAS error into the guest that was using the
device, it needs to fully understand what the error means:

The error may mean that the careful configuration that makes device-passthrough
safe no longer works, letting the guest continue to access the device may let it
damage another guest or the hypervisor.

We may need a way for the host RAS code to identify the driver responsible, to
handle the device error, or delegate it if that's appropriate.


>> So (a): a physical-CPU hardware error occurs, and then (c) we tell Qemu/kvmtool
>> via a KVM-specific API.
>> Don't do this, it doesn't work for non-KVM users. You are exposing host-specific
>> implementation details to user space. What if I discover the same error via a
>> Polling GHES, or one of the IRQ flavours?

> James, your main concern is with "tell Qemu/kvmtool via a KVM-specific API", right?
> So how about still delivering SIGBUS, the same as for SEA (Synchronous External Abort)?

> By the way, what did you mean by the words below?
> >"What if I discover the same error via a Polling GHES, or one of the IRQ flavours?"

This was my mistaken assumption that you were passing an APEI RAS SEI
notification to user space via a KVM specific API. This wouldn't work for
applications not using KVM, or notifications not using SEI.

Here I was asking what happens if the notification used NOTIFY_POLL or
NOTIFY_IRQ (instead of NOTIFY_SEI) in the GHES, but this isn't relevant as it
doesn't look like this is an APEI RAS notification.


>> If there is another type of CPER record where we should notify userspace, please
>> do it from mm/memory-failure.c, drivers/acpi/apei/ghes.c or
>> drivers/firmware/efi/cper.c. These should consider all user-space applications,
>> not just users of KVM, and not just on arm64.
> Here I have a question: the "drivers/acpi/apei/ghes.c" code only handles the memory section of CPER.

Yes, we are certainly missing processing for the other record types.

> If the CPER section type is processor, it will not notify user-space. So is it reasonable to only let userspace handle the memory section?

I think the only errors that user-space can know more about than the kernel are memory
errors. These are the only RAS errors we should expect user space to handle. All
the others fall into either 'corrected by the kernel' or 'fatal for userspace -

>> For memory errors we already have BUS_MCEERR_AR - action-required, and
>> BUS_MCEERR_AO - action-optional.
>> For a TLB error, (Table 250 of UEFI 2.6), what is Qemu expected to do? Linux has
>> to classify the error and handle it as far as possible. In most cases the error
>> is either handled (no notification required), or fatal. Memory errors are the
>> only example I've found so far where an application can do additional work to
>> handle the error.

> James, only memory errors need the application to do additional work. Does the UEFI spec mention that?

No, it's my observation based on the record types. Memory is the only thing an
application can change. Everything else belongs to the kernel.
For a corrupt page of anonymous memory, there is nothing the kernel can do but
report the data as lost. The application (e.g. web browser) may know what the
corrupt data was, and if/how it can retrieve it again. This isn't true of
Processor/Cache/TLB/PCIe errors, which cover the other CPER records.