[RFC PATCH 0/7] support for mm-local memory allocations and use it

From: Fares Mehanna
Date: Wed Sep 11 2024 - 10:38:17 EST


In a series posted a few years ago [1], a proposal was put forward to allow the
kernel to allocate memory local to an mm and thus put it out of reach of
current and future speculation-based cross-process attacks. We still believe
this is a nice thing to have.

However, in the time since that post, Linux mm has gained quite a few new
facilities, so we'd like to explore implementing this functionality with less
effort and churn by leveraging what is now available.

An RFC was posted a few months back [2] to show a proof of concept and a simple
test driver.

In this RFC, we use the same approach: mm-local allocations are implemented by
piggy-backing on memfd_secret(), using regular user addresses but pinning the
pages and flipping the user/supervisor flag on the respective PTEs to make them
directly accessible from the kernel.

In addition, we are submitting 5 patches that use the secret memory to hide
the vCPU gp-regs and fp-regs on arm64 VHE systems.
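
To illustrate the user/supervisor flip described above, here is a minimal
userspace sketch. It is a toy model, not code from this series: the toy_*
names and the two-bit PTE layout are hypothetical, and the kernel-mode rule
assumes a PAN/SMAP-like restriction that blocks direct kernel access to
user-accessible pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE bits for illustration; real layouts are arch-specific. */
#define TOY_PTE_PRESENT (UINT64_C(1) << 0)
#define TOY_PTE_USER    (UINT64_C(1) << 1)

typedef uint64_t toy_pte_t;

enum toy_mode { TOY_MODE_USER, TOY_MODE_KERNEL };

/* Map a page at a user address the way ordinary user memory is mapped. */
static inline toy_pte_t toy_map_user_page(void)
{
	return TOY_PTE_PRESENT | TOY_PTE_USER;
}

/* Turn the mapping into an mm-local kernel mapping: clear the user bit. */
static inline toy_pte_t toy_make_kernel_only(toy_pte_t pte)
{
	assert(pte & TOY_PTE_PRESENT);
	return pte & ~TOY_PTE_USER;
}

/*
 * Direct load/store permission check, as the MMU would apply it.
 * Assumption: a PAN/SMAP-like feature is enabled, so kernel mode may
 * only touch pages whose user bit is clear.
 */
static inline bool toy_direct_access_ok(toy_pte_t pte, enum toy_mode mode)
{
	if (!(pte & TOY_PTE_PRESENT))
		return false;
	if (mode == TOY_MODE_USER)
		return (pte & TOY_PTE_USER) != 0;
	return (pte & TOY_PTE_USER) == 0;
}
```

In this model, a page mapped with toy_map_user_page() is directly accessible
only in user mode; after toy_make_kernel_only() it is directly accessible only
in kernel mode, while still living at a user virtual address.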

The generic drawbacks of using user virtual addresses mentioned in the previous
RFC [2] still hold, in addition to a more specific one:

- While the user virtual addresses allocated for kernel secret memory are not
  directly accessible by userspace, because the PTEs restrict that,
  copy_from_user() and copy_to_user() can still operate on those ranges. For
  example, userspace can guess the address and pass it as the target buffer
  for read(), making the kernel overwrite it with user-controlled content.
  As a result, the secret memory in the current implementation lacks
  confidentiality and integrity guarantees.

In the specific case of vCPU registers, this is fine because the owner process
can read and write to them using KVM IOCTLs anyway. But in the general case this
represents a security concern and needs to be addressed.

A possible way forward for the arch-agnostic implementation is to limit the user
virtual addresses used for the kernel to a specific range that can be checked
against in copy_from_user() and copy_to_user().
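
The range check suggested above could look roughly like the following
userspace sketch. It is only a model under stated assumptions: the address
space is a small byte array, the reserved window TOY_SECRET_START..TOY_SECRET_END
and all toy_* helpers are hypothetical, and the only thing borrowed from the
real copy_to_user() is the convention of returning the number of bytes that
could not be copied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_AS_SIZE      4096u	/* size of the simulated user address space */
#define TOY_SECRET_START 3072u	/* hypothetical window [start, end) that is */
#define TOY_SECRET_END   4096u	/* reserved for kernel secret allocations   */

static uint8_t toy_user_as[TOY_AS_SIZE];

/* True if [addr, addr + len) lies inside the simulated address space. */
static inline bool toy_range_valid(size_t addr, size_t len)
{
	return addr <= TOY_AS_SIZE && len <= TOY_AS_SIZE - addr;
}

/* True if [addr, addr + len) overlaps the reserved secret window. */
static inline bool toy_range_hits_secret(size_t addr, size_t len)
{
	assert(toy_range_valid(addr, len));
	return len != 0 && addr < TOY_SECRET_END &&
	       addr + len > TOY_SECRET_START;
}

/* Models the current behaviour: any valid user address is accepted. */
static inline size_t toy_copy_to_user_unchecked(size_t dst, const void *src,
						size_t len)
{
	if (!toy_range_valid(dst, len))
		return len;
	memcpy(&toy_user_as[dst], src, len);
	return 0;
}

/* Models the proposed fix: refuse copies that touch the secret window. */
static inline size_t toy_copy_to_user_checked(size_t dst, const void *src,
					      size_t len)
{
	if (!toy_range_valid(dst, len) || toy_range_hits_secret(dst, len))
		return len;
	memcpy(&toy_user_as[dst], src, len);
	return 0;
}
```

With the unchecked variant, a destination inside the reserved window is
overwritten with caller-supplied bytes, which mirrors the read() scenario
described earlier. The checked variant rejects any copy that overlaps the
window, including one that merely straddles its start.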

For an arch-specific implementation, using a separate PGD is the way to go.

[1] https://lore.kernel.org/lkml/20190612170834.14855-1-mhillenb@xxxxxxxxx/
[2] https://lore.kernel.org/lkml/20240621201501.1059948-1-rkagan@xxxxxxxxx/

Fares Mehanna / Roman Kagan (2):
  mseal: expose interface to seal / unseal user memory ranges
  mm/secretmem: implement mm-local kernel allocations

Fares Mehanna (5):
  arm64: KVM: Refactor C-code to access vCPU gp-registers through macros
  KVM: Refactor Assembly-code to access vCPU gp-registers through a
    macro
  arm64: KVM: Allocate vCPU gp-regs dynamically on VHE and
    KERNEL_SECRETMEM enabled systems
  arm64: KVM: Refactor C-code to access vCPU fp-registers through macros
  arm64: KVM: Allocate vCPU fp-regs dynamically on VHE and
    KERNEL_SECRETMEM enabled systems

arch/arm64/include/asm/kvm_asm.h | 50 ++--
arch/arm64/include/asm/kvm_emulate.h | 2 +-
arch/arm64/include/asm/kvm_host.h | 41 +++-
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/image-vars.h | 2 +
arch/arm64/kvm/arm.c | 90 +++++++-
arch/arm64/kvm/fpsimd.c | 2 +-
arch/arm64/kvm/guest.c | 14 +-
arch/arm64/kvm/hyp/entry.S | 15 ++
arch/arm64/kvm/hyp/include/hyp/switch.h | 6 +-
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 10 +-
.../arm64/kvm/hyp/include/nvhe/trap_handler.h | 2 +-
arch/arm64/kvm/hyp/nvhe/host.S | 20 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 4 +-
arch/arm64/kvm/reset.c | 2 +-
arch/arm64/kvm/va_layout.c | 38 ++++
include/linux/secretmem.h | 29 +++
mm/Kconfig | 10 +
mm/gup.c | 4 +-
mm/internal.h | 7 +
mm/mseal.c | 81 ++++---
mm/secretmem.c | 213 ++++++++++++++++++
22 files changed, 559 insertions(+), 84 deletions(-)

--
2.40.1



