Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery capability to user space

From: James Morse
Date: Fri Dec 14 2018 - 08:56:03 EST

Hi Dongjiu Geng,

On 14/12/2018 10:15, Dongjiu Geng wrote:
> When user space do memory recovery, it will check whether KVM and
> guest support the error recovery, only when both of them support,
> user space will do the error recovery. This patch exports this
> capability of KVM to user space.

I can understand user-space only wanting to do the work if host and guest
support the feature. But 'error recovery' isn't a KVM feature, it's a Linux
kernel feature.

KVM will send its user-space a SIGBUS with an MCEERR code whenever it tries to
map a page at stage2 and the kernel-mm code refuses because the page is poisoned.
(e.g. check_user_page_hwpoison(), get_user_pages() returns -EHWPOISON)

This is exactly the same as happens to a normal user-space process.

I think you really want to know if the host kernel was built with
CONFIG_MEMORY_FAILURE. The not-at-all-portable way to tell this from user-space
is the presence of /proc/sys/vm/memory_failure_* files.
(It looks like the prctl():PR_MCE_KILL/PR_MCE_KILL_GET options silently update
an ignored policy if the kernel isn't built with CONFIG_MEMORY_FAILURE, so they
aren't helpful)
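
A minimal sketch of that file-presence check, assuming /proc is mounted in the
usual place. The two sysctl paths are the ones a CONFIG_MEMORY_FAILURE kernel
provides; the helper names are hypothetical:

```c
#include <stdbool.h>
#include <unistd.h>

/* True if the path exists (F_OK tests for existence only). */
bool path_exists(const char *path)
{
	return access(path, F_OK) == 0;
}

/*
 * Not-at-all-portable guess at whether the host kernel was built with
 * CONFIG_MEMORY_FAILURE: the memory_failure_* sysctls only exist then.
 */
bool host_has_memory_failure(void)
{
	return path_exists("/proc/sys/vm/memory_failure_recovery") ||
	       path_exists("/proc/sys/vm/memory_failure_early_kill");
}
```

A false result can also mean /proc isn't visible to the process (e.g. in a
restricted container), so it is only a hint.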

> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index cd209f7..241e2e2 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4895,3 +4895,12 @@ Architectures: x86
> This capability indicates that KVM supports paravirtualized Hyper-V IPI send
> hypercalls:
> HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
> +
> +
> +Architectures: arm, arm64
> +
> +This capability indicates that guest memory error can be detected by the KVM which
> +supports the error recovery.

KVM doesn't detect these errors.
The hardware detects them and notifies the OS via one of a number of mechanisms.
This gets plumbed into memory_failure(), which sets a flag that the mm code uses
to prevent the page being used again.

KVM is only involved when it tries to map a page at stage2 and the mm code
rejects it with -EHWPOISON. This is the same as the architecture's
do_page_fault() checking for (fault & VM_FAULT_HWPOISON) out of
handle_mm_fault(). We don't have a KVM cap for this, nor do we need one.

> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index b72a3dd..90d1d9a 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -82,6 +82,7 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> r = kvm_arm_support_pmu_v3();
> break;
> r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
> break;

The CPU RAS Extensions are not at all relevant here. It is perfectly possible to
support memory-failure without them: AMD-Seattle and APM-X-Gene do this. These
systems would report not-supported here, but the kernel does support this stuff.
Conversely, just because the CPU supports this doesn't mean the kernel was built
with CONFIG_MEMORY_FAILURE. The CPU reports may be ignored, or upgraded to SIGKILL.