Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

From: Peter Maydell
Date: Wed Dec 19 2018 - 14:03:14 EST


On Mon, 17 Dec 2018 at 15:56, James Morse <james.morse@xxxxxxx> wrote:
> I think the root issue here is the name of the cpufeature 'RAS Extensions', this
> doesn't mean RAS is new, or even requires these features. It's just standardised
> records, classification and a barrier.
> Not only is it possible to build a platform that supports RAS without this
> extensions: there are at least three platforms out there that do!
>
>
> On 15/12/2018 00:12, gengdongjiu wrote:
> >> On Fri, 14 Dec 2018 at 13:56, James Morse <james.morse@xxxxxxx> wrote:
> >>> On 14/12/2018 10:15, Dongjiu Geng wrote:
> >>>> When user space do memory recovery, it will check whether KVM and
> >>>> guest support the error recovery, only when both of them support,
> >>>> user space will do the error recovery. This patch exports this
> >>>> capability of KVM to user space.
> >>>
> >>> I can understand user-space only wanting to do the work if host and
> >>> guest support the feature. But 'error recovery' isn't a KVM feature,
> >>> its a Linux kernel feature.
>
> [...]
>
> > Thanks Peter's explanation. Frankly speaking, I agree Peter's suggestion.
> >
> > To James, I explain more to you, as peter said QEMU needs to check whether
> > the guest CPU is a type which can handle the error though guest ACPI table.
>
> I don't think this really matters. Its only the NMIlike notifications that the
> guest doesn't have to register or poll. The ones we support today extend the
> architectures existing behaviour: you would have taken an external-abort on a
> real system, whether you know about the additional metadata doesn't matter to Qemu.

Consider the case where we booted the guest using a DTB and no ACPI
table at all -- we certainly can't just call QEMU code that tries to
add entries to a nonexistent table. My main point is that there
needs to be logic in Dongjiu's QEMU patches that checks more than
just "does this KVM feature exist". I'm not sufficiently familiar
with all this RAS stuff to be certain what those checks should
be and what the right choices are; I just know we need to check
*something*...

> > Let us see the X86's QEMU logic:
> > 1. Before the vCPU created, it will set a default env->mcg_cap value with
>
> > MCE_CAP_DEF flag, MCG_SER_P means it expected the guest CPU model supports
> > RAS error recovery.[1] 2. when the vCPU initialize, it will check whether host
> > kernel support this feature[2]. Only when host kernel and default env->mcg_cap
> > value all expected this feature, then it will setup vCPU support RAS error
> > recovery[3].
>
> This looks like KVM exposing a CPU capability to Qemu, which then configures the
> behaviour KVM gives to the guest. This doesn't tell you anything about what the
> guest supports.

It tells you what the *guest CPU* supports, which for x86 is a combination
of (a) what did the user/machine model ask for and (b) what can KVM
actually implement. I don't much care whether the guest OS supports
anything or not, that's its business... but it does seem odd to me
that the equivalent Arm code is not similarly saying "what were we
asked for, and what can we do?".

I think one question here which it would be good to answer is:
if we are modelling a guest and we haven't specifically provided
it an ACPI table to tell it about memory errors, what do we do
when we get a sigbus from the host? We have basically two choices:
(1) send the guest an SError (aka asynchronous external abort)
anyway (with no further info about what the memory error is)
(2) just stop QEMU (as we would for a memory error in QEMU's
own memory)

thanks
-- PMM