Re: [PATCH v14 29/44] arm64: RMI: Runtime faulting of memory

From: Gavin Shan

Date: Fri Jun 05 2026 - 07:30:25 EST

Hi Steve,

On 5/13/26 11:17 PM, Steven Price wrote:

At runtime if the realm guest accesses memory which hasn't yet been
mapped then KVM needs to either populate the region or fault the guest.

For memory in the lower (protected) region of IPA a fresh page is
provided to the RMM which will zero the contents. For memory in the
upper (shared) region of IPA, the memory from the memslot is mapped
into the realm VM as non-secure.

Signed-off-by: Steven Price <steven.price@xxxxxxx>
---
Changes since v13:
* Numerous changes due to rebasing.
* Fix addr_range_desc() to encode the correct block size.
Changes since v12:
* Switch to RMM v2.0 range based APIs.
Changes since v11:
* Adapt to upstream changes.
Changes since v10:
* RME->RMI renaming.
* Adapt to upstream gmem changes.
Changes since v9:
* Fix call to kvm_stage2_unmap_range() in kvm_free_stage2_pgd() to set
may_block to avoid stall warnings.
* Minor coding style fixes.
Changes since v8:
* Propagate the may_block flag.
* Minor comments and coding style changes.
Changes since v7:
* Remove redundant WARN_ONs for realm_create_rtt_levels() - it will
internally WARN when necessary.
Changes since v6:
* Handle PAGE_SIZE being larger than RMM granule size.
* Some minor renaming following review comments.
Changes since v5:
* Reduce use of struct page in preparation for supporting the RMM
having a different page size to the host.
* Handle a race when delegating a page where another CPU has faulted on
the same page (and already delegated the physical page) but not yet
mapped it. In this case simply return to the guest to either use the
mapping from the other CPU (or refault if the race is lost).
* The changes to populate_par_region() are moved into the previous
patch where they belong.
Changes since v4:
* Code cleanup following review feedback.
* Drop the PTE_SHARED bit when creating unprotected page table entries.
This is now set by the RMM and the host has no control of it and the
spec requires the bit to be set to zero.
Changes since v2:
* Avoid leaking memory if failing to map it in the realm.
* Correctly mask RTT based on LPA2 flag (see rtt_get_phys()).
* Adapt to changes in previous patches.
---
arch/arm64/include/asm/kvm_emulate.h | 8 ++
arch/arm64/include/asm/kvm_rmi.h | 12 ++
arch/arm64/kvm/mmu.c | 128 ++++++++++++++++----
arch/arm64/kvm/rmi.c | 173 +++++++++++++++++++++++++++
4 files changed, 301 insertions(+), 20 deletions(-)

[...]

@@ -1604,27 +1641,52 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
bool write_fault, exec_fault;
enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
- struct kvm_pgtable *pgt = s2fd->vcpu->arch.hw_mmu->pgt;
+ struct kvm_vcpu *vcpu = s2fd->vcpu;
+ struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+ gpa_t gpa = kvm_gpa_from_fault(vcpu->kvm, s2fd->fault_ipa);
unsigned long mmu_seq;
struct page *page;
- struct kvm *kvm = s2fd->vcpu->kvm;
+ struct kvm *kvm = vcpu->kvm;
void *memcache;
kvm_pfn_t pfn;
gfn_t gfn;
int ret;
- memcache = get_mmu_memcache(s2fd->vcpu);
- ret = topup_mmu_memcache(s2fd->vcpu, memcache);
+ if (kvm_is_realm(vcpu->kvm)) {
+ /* check for memory attribute mismatch */
+ bool is_priv_gfn = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
+ /*
+ * For Realms, the shared address is an alias of the private
+ * PA with the top bit set. Thus if the fault address matches
+ * the GPA then it is the private alias.
+ */
+ bool is_priv_fault = (gpa == s2fd->fault_ipa);
+
+ if (is_priv_gfn != is_priv_fault) {
+ kvm_prepare_memory_fault_exit(vcpu, gpa, PAGE_SIZE,
+ kvm_is_write_fault(vcpu),
+ false,
+ is_priv_fault);
+ /*
+ * KVM_EXIT_MEMORY_FAULT requires a return code of
+ * -EFAULT, see the API documentation
+ */
+ return -EFAULT;
+ }
+ }
+

For a Realm, gmem_abort() is called by kvm_handle_guest_abort() only when
we're faulting in the private (protected) space.

	if (kvm_slot_has_gmem(memslot) && !shared_ipa_fault(vcpu->kvm, fault_ipa))
		ret = gmem_abort(&s2fd);
	else
		ret = user_mem_abort(&s2fd);

With this condition, the fault address seen by gmem_abort() is always the
private alias, so this block of code can be simplified to handle only the
(shared -> private) conversion instead of both directions.

	/* Convert the shared address to the private address for a Realm */
	if (kvm_is_realm(vcpu->kvm) &&
	    !kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT)) {
		/*
		 * KVM_EXIT_MEMORY_FAULT requires a return code of
		 * -EFAULT, see the API documentation
		 */
		kvm_prepare_memory_fault_exit(vcpu, gpa, PAGE_SIZE,
					      kvm_is_write_fault(vcpu),
					      false, true);
		return -EFAULT;
	}

[...]

@@ -2396,7 +2475,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
!write_fault &&
!kvm_vcpu_trap_is_exec_fault(vcpu));
- if (kvm_slot_has_gmem(memslot))
+ if (kvm_slot_has_gmem(memslot) && !shared_ipa_fault(vcpu->kvm, fault_ipa))
ret = gmem_abort(&s2fd);
else
ret = user_mem_abort(&s2fd);

gmem_abort() is only called for faults in the protected (private) space.

Thanks,
Gavin