[PATCH 00/12] x86/mce, KVM: X86: KVM memory poison and MCE injector support

From: isaku . yamahata
Date: Tue Oct 10 2023 - 04:35:41 EST


From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>

Background
==========
TDX is a VM-based confidential computing technology. It encrypts guest
memory to protect it from software outside of the TCB (host OS, VMM, firmware,
etc.). Software in the host can still write to protected guest memory and
thereby corrupt it, and the protected guest can then consume the corrupted
memory. TDX uses machine checks to report that a TDX vcpu consumed corrupted
memory.

For VM-based confidential computing (AMD SEV-SNP and Intel TDX), the KVM guest
memfd effort [1] is ongoing. It allows guests to use file descriptors as the
protected guest memory without user-space virtual address mapping. KVM handles
machine checks that occur while a guest vcpu is running specially: it sets up
the guest so that the vcpu exits to the host on a machine check, checks the
exit reason, and then raises the machine check manually by calling
do_machine_check().

Although Linux supports the hwpoison and MCE injection frameworks, there are
gaps in testing hwpoison or MCE for TDX KVM [2] with KVM guest memfd [1]:

a) The hwpoison framework (debugfs
/sys/kernel/debug/hwpoison/{corrupt-pfn, unpoison-pfn}) uses a physical
address (as a page frame number), and MADV_{HWPOISON, UNPOISON} uses a virtual
address of the user process. However, KVM guest memfd requires a file
descriptor and offset.

b) The x86 MCE injection framework, /dev/mcelog (legacy, deprecated device
driver interface) or debugfs /sys/kernel/debug/mce-inject/addr, also uses a
physical address.

c) The x86 MCE injection framework injects machine checks in the context of
the injector process. KVM wants to inject a machine check on behalf of a
running vcpu.
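
To illustrate gap a), here is a minimal sketch of how a test might drive the
existing hwpoison debugfs interface. The caller has to know the physical page
frame number up front, which a guest memfd user (holding only a file
descriptor and an offset) does not have. The helper names paddr_to_pfn() and
hwpoison_write_pfn() are made up for this sketch, and the target path is
passed in by the caller; on a real system it would presumably be
/sys/kernel/debug/hwpoison/corrupt-pfn, written with root privileges.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: convert a physical address to a page frame number. */
static uint64_t paddr_to_pfn(uint64_t paddr, uint64_t page_size)
{
	/* page_size must be a non-zero power of two. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return paddr / page_size;
}

/*
 * Hypothetical helper: write a pfn to a debugfs-style attribute file.
 * Returns 0 on success, -1 on failure.
 */
static int hwpoison_write_pfn(const char *path, uint64_t pfn)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "0x%" PRIx64 "\n", pfn) > 0;
	/* fclose() flushes the buffered write, so check its result too. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would compute the pfn from a known physical address, for example
paddr_to_pfn(paddr, 4096), and pass it to hwpoison_write_pfn(). Nothing in
this flow accepts a file descriptor and offset.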


Proposed solution
=================
This patch series fills those gaps so that KVM can be tested with injected
machine checks. The proposed solution is:

a) Introduce new posix_fadvise() flags, FADVISE_{HWPOISON, UNPOISON}, to
hwpoison memory by file descriptor and offset:
The possible options are:
a1) Add new flags for posix_fadvise(), because hwpoison with a file descriptor
and offset is a generic operation, not specific to KVM. (This patch series.)
a2) Add a KVM guest memfd specific ioctl to inject hwpoison/trigger MCE. It can
use the same values as MADV_{HWPOISON, UNPOISON}.
a3) Add a KVM-specific debugfs entry for guest memfd, say,
/sys/kernel/debug/kvm/<pid>-<vm-fd>/guest-memfd<fd>/hwpoison/{corrupt-offset,
unpoison-offset}.

- fadvise(FADVISE_{HWPOISON, UNPOISON}): this patch series.
  Generic interface, not specific to KVM.
- KVM ioctl
  KVM specific. The KVM debugfs option is better.
- KVM debugfs
  Debugfs is a natural fit because this feature is for debug/test.
  Specific to KVM guest_memfd.
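
For illustration, option a1) could look roughly like this from user space. This
is only a sketch: the numeric values given to FADVISE_HWPOISON and
FADVISE_UNPOISON below are placeholders assumed for the example (borrowed from
MADV_{HWPOISON, UNPOISON}), and the helper names are made up. On a kernel
without this series the new advice values are expected to be rejected, most
likely with EINVAL.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* ASSUMPTION: placeholder values, not taken from the patch series. */
#ifndef FADVISE_HWPOISON
#define FADVISE_HWPOISON 100
#endif
#ifndef FADVISE_UNPOISON
#define FADVISE_UNPOISON 101
#endif

/*
 * Hypothetical helper: apply an advice value to [offset, offset + len) of fd.
 * posix_fadvise() returns 0 on success or an error number directly; it does
 * not set errno.
 */
static int fd_range_advise(int fd, off_t offset, off_t len, int advice)
{
	return posix_fadvise(fd, offset, len, advice);
}

/*
 * Hypothetical helper: EINVAL is what an unknown advice value would
 * typically produce (it can also indicate other invalid arguments).
 */
static int advice_unsupported(int err)
{
	return err == EINVAL;
}
```

A test would call fd_range_advise(gmem_fd, offset, page_size,
FADVISE_HWPOISON) on the guest memfd, run the vcpu so that it touches the
poisoned range, and later undo the poison with FADVISE_UNPOISON. Here gmem_fd,
offset, and page_size stand for values the test already has.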

b) Enhance the x86 MCE injector:
Add debugfs entries under /sys/kernel/debug/mce-inject/ to set the additional
parameters that are needed: mcgstatus and notrigger. mcgstatus is for LMCE_S.
notrigger suppresses triggering the machine check handler so that KVM can
trigger it instead.
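
A rough sketch of how a test might fill in these parameters follows. The
MCG_STATUS bit positions follow the IA32_MCG_STATUS layout (LMCE_S is bit 3).
The helper names and the "0x..." write format are assumptions of this sketch,
not something taken from the series, and the directory is passed in by the
caller (presumably /sys/kernel/debug/mce-inject on a real system).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* IA32_MCG_STATUS bits. */
#define MCG_STATUS_RIPV		(1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_EIPV		(1ULL << 1)	/* error IP valid */
#define MCG_STATUS_MCIP		(1ULL << 2)	/* machine check in progress */
#define MCG_STATUS_LMCES	(1ULL << 3)	/* local machine check signaled */

/* Hypothetical helper: mark an MCG_STATUS value as a local MCE. */
static uint64_t mcg_status_with_lmce(uint64_t mcgstatus)
{
	return mcgstatus | MCG_STATUS_LMCES;
}

/*
 * Hypothetical helper: write one parameter file under dir.
 * Returns 0 on success, -1 on failure.
 */
static int mce_inject_set(const char *dir, const char *name, uint64_t val)
{
	char path[4096];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "0x%" PRIx64 "\n", val) > 0;
	/* fclose() flushes the buffered write, so check its result too. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

With these, a test could write 1 to notrigger first, then write the value
returned by mcg_status_with_lmce() to mcgstatus alongside the existing
parameters, and leave the actual triggering to KVM.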

c) Add a debugfs entry for KVM vcpu to trigger MCE injection:
Because setting parameters for MCE is not specific to KVM, reuse the existing
mce-inject debugfs for that. The new KVM debugfs entry only causes KVM to
trigger the MCE. An alternative is to add an interface that sets the parameters
without b), similar to /dev/mcelog.

- KVM debugfs: this patch series.
  Debugfs is a natural fit because this feature is for debug/test.
- New KVM ioctl
  KVM debugfs seems better.
- Enhance /dev/mcelog
  Because this is legacy, the debugfs MCE injector is better.
- Enhance /sys/kernel/debug/mce-inject/
  Because the feature is KVM specific, adding the feature to the KVM
  interface is better than adding it to the x86 MCE injector.

[1] https://lore.kernel.org/all/20230914015531.1419405-1-seanjc@xxxxxxxxxx/
KVM guest_memfd() and per-page attributes
https://lore.kernel.org/all/20230921203331.3746712-1-seanjc@xxxxxxxxxx/
[PATCH 00/13] KVM: guest_memfd fixes
[2] https://lore.kernel.org/all/cover.1690322424.git.isaku.yamahata@xxxxxxxxx/
v15 KVM TDX basic feature support

Isaku Yamahata (12):
x86/mce: Fix hw MCE injection feature detection
X86/mce/inject: Add mcgstatus for mce-inject debugfs
x86/mce/inject: Add notrigger entry to suppress MCE injection
x86/mce: Move and export inject_mce() from inject.c to core.c
mm/fadvise: Add flags to inject hwpoison for posix_fadvise()
mm/fadvise: Add FADV_MCE_INJECT flag for posix_fadvise()
x86/mce/inject: Wire up the x86 MCE injector to FADV_MCE_INJECT
x86/mce: Define a notifier chain for mce injector
KVM: X86: Add debugfs to inject machine check on VM exit
KVM: selftests: Allow mapping guest memory without host alias
KVM: selftests: lib: Add src memory type for hwpoison test
KVM: selftests: hwpoison/mce failure injection

arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/mce.h | 16 +
arch/x86/kernel/cpu/mce/core.c | 56 ++
arch/x86/kernel/cpu/mce/inject.c | 91 ++-
arch/x86/kvm/debugfs.c | 22 +
arch/x86/kvm/x86.c | 14 +
include/linux/fs.h | 8 +
include/uapi/linux/fadvise.h | 5 +
mm/fadvise.c | 120 ++-
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/kvm_util_base.h | 4 +
.../testing/selftests/kvm/include/test_util.h | 2 +
tools/testing/selftests/kvm/lib/kvm_util.c | 30 +-
tools/testing/selftests/kvm/lib/test_util.c | 8 +
.../testing/selftests/kvm/mem_hwpoison_test.c | 721 ++++++++++++++++++
15 files changed, 1066 insertions(+), 33 deletions(-)
create mode 100644 tools/testing/selftests/kvm/mem_hwpoison_test.c


base-commit: 6465e260f48790807eef06b583b38ca9789b6072
--
2.25.1