Re: [PATCH v14 29/44] arm64: RMI: Runtime faulting of memory

From: Steven Price

Date: Mon Jun 08 2026 - 07:09:50 EST

On 05/06/2026 12:20, Gavin Shan wrote:
> Hi Steve,
>
> On 5/13/26 11:17 PM, Steven Price wrote:
>> At runtime if the realm guest accesses memory which hasn't yet been
>> mapped then KVM needs to either populate the region or fault the guest.
>>
>> For memory in the lower (protected) region of IPA, a fresh page is
>> provided to the RMM, which will zero the contents. For memory in the
>> upper (shared) region of IPA, the memory from the memslot is mapped
>> into the realm VM as non-secure.
>>
>> Signed-off-by: Steven Price <steven.price@xxxxxxx>
>> ---
>> Changes since v13:
>> * Numerous changes due to rebasing.
>> * Fix addr_range_desc() to encode the correct block size.
>> Changes since v12:
>> * Switch to RMM v2.0 range based APIs.
>> Changes since v11:
>> * Adapt to upstream changes.
>> Changes since v10:
>> * RME->RMI renaming.
>> * Adapt to upstream gmem changes.
>> Changes since v9:
>> * Fix call to kvm_stage2_unmap_range() in kvm_free_stage2_pgd() to set
>>     may_block to avoid stall warnings.
>> * Minor coding style fixes.
>> Changes since v8:
>> * Propagate the may_block flag.
>> * Minor comments and coding style changes.
>> Changes since v7:
>> * Remove redundant WARN_ONs for realm_create_rtt_levels() - it will
>>     internally WARN when necessary.
>> Changes since v6:
>> * Handle PAGE_SIZE being larger than RMM granule size.
>> * Some minor renaming following review comments.
>> Changes since v5:
>> * Reduce use of struct page in preparation for supporting the RMM
>>     having a different page size to the host.
>> * Handle a race when delegating a page where another CPU has faulted on
>>     the same page (and already delegated the physical page) but not yet
>>     mapped it. In this case simply return to the guest to either use the
>>     mapping from the other CPU (or refault if the race is lost).
>> * The changes to populate_par_region() are moved into the previous
>>     patch where they belong.
>> Changes since v4:
>> * Code cleanup following review feedback.
>> * Drop the PTE_SHARED bit when creating unprotected page table entries.
>>     This is now set by the RMM, the host has no control over it, and the
>>     spec requires the bit to be set to zero.
>> Changes since v2:
>> * Avoid leaking memory if failing to map it in the realm.
>> * Correctly mask RTT based on LPA2 flag (see rtt_get_phys()).
>> * Adapt to changes in previous patches.
>> ---
>> arch/arm64/include/asm/kvm_emulate.h |   8 ++
>> arch/arm64/include/asm/kvm_rmi.h     | 12 ++
>> arch/arm64/kvm/mmu.c                 | 128 ++++++++++++++++----
>> arch/arm64/kvm/rmi.c                 | 173 +++++++++++++++++++++++++++
>> 4 files changed, 301 insertions(+), 20 deletions(-)
>>
>
> [...]
>
>> @@ -1604,27 +1641,52 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
>>       bool write_fault, exec_fault;
>>       enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
>>       enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>> -    struct kvm_pgtable *pgt = s2fd->vcpu->arch.hw_mmu->pgt;
>> +    struct kvm_vcpu *vcpu = s2fd->vcpu;
>> +    struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
>> +    gpa_t gpa = kvm_gpa_from_fault(vcpu->kvm, s2fd->fault_ipa);
>>       unsigned long mmu_seq;
>>       struct page *page;
>> -    struct kvm *kvm = s2fd->vcpu->kvm;
>> +    struct kvm *kvm = vcpu->kvm;
>>       void *memcache;
>>       kvm_pfn_t pfn;
>>       gfn_t gfn;
>>       int ret;
>> -    memcache = get_mmu_memcache(s2fd->vcpu);
>> -    ret = topup_mmu_memcache(s2fd->vcpu, memcache);
>> +    if (kvm_is_realm(vcpu->kvm)) {
>> +        /* check for memory attribute mismatch */
>> +        bool is_priv_gfn = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
>> +        /*
>> +         * For Realms, the shared address is an alias of the private
>> +         * PA with the top bit set. Thus if the fault address matches
>> +         * the GPA then it is the private alias.
>> +         */
>> +        bool is_priv_fault = (gpa == s2fd->fault_ipa);
>> +
>> +        if (is_priv_gfn != is_priv_fault) {
>> +            kvm_prepare_memory_fault_exit(vcpu, gpa, PAGE_SIZE,
>> +                              kvm_is_write_fault(vcpu),
>> +                              false,
>> +                              is_priv_fault);
>> +            /*
>> +             * KVM_EXIT_MEMORY_FAULT requires a return code of
>> +             * -EFAULT, see the API documentation
>> +             */
>> +            return -EFAULT;
>> +        }
>> +    }
>> +
>
> For a Realm, gmem_abort() is called by kvm_handle_guest_abort() only when
> we're faulting in the private (protected) space.
>
>     if (kvm_slot_has_gmem(memslot) && !shared_ipa_fault(vcpu->kvm, fault_ipa))
>         ret = gmem_abort(&s2fd);
>     else
>         ret = user_mem_abort(&s2fd);
>
> Given that condition, this block of code can be simplified to handle only
> the shared -> private conversion instead of both directions.
>
>     /* Realm: the GFN is shared, so ask userspace to convert it to private */
>     if (kvm_is_realm(vcpu->kvm) &&
>         !kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT)) {
>         /*
>          * KVM_EXIT_MEMORY_FAULT requires a return code of
>          * -EFAULT, see the API documentation
>          */
>         kvm_prepare_memory_fault_exit(vcpu, gpa, PAGE_SIZE,
>                                       kvm_is_write_fault(vcpu),
>                                       false, true);
>         return -EFAULT;
>     }
>
>
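The dispatch in kvm_handle_guest_abort() relies on the alias described in
the patch comment: the shared IPA is the private IPA with the top IPA bit
set. A minimal sketch of that relationship, assuming an IPA space that is
ipa_bits wide; the sketch_* helpers are hypothetical stand-ins for
shared_ipa_fault() and kvm_gpa_from_fault(), not the real implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration, not the KVM implementation:
 * assume a realm IPA space that is ipa_bits wide, where the top IPA bit
 * selects the shared (unprotected) alias of a protected address.
 */
static uint64_t sketch_shared_bit(unsigned int ipa_bits)
{
	return UINT64_C(1) << (ipa_bits - 1);
}

/* True if the faulting IPA lies in the upper (shared) half of the IPA space. */
static bool sketch_shared_ipa_fault(unsigned int ipa_bits, uint64_t fault_ipa)
{
	return (fault_ipa & sketch_shared_bit(ipa_bits)) != 0;
}

/* Clear the shared bit, giving the GPA used for memslot and attribute lookups. */
static uint64_t sketch_gpa_from_fault(unsigned int ipa_bits, uint64_t fault_ipa)
{
	return fault_ipa & ~sketch_shared_bit(ipa_bits);
}

/* As in the patch: the fault is on the private alias iff stripping changed nothing. */
static bool sketch_is_priv_fault(unsigned int ipa_bits, uint64_t fault_ipa)
{
	return sketch_gpa_from_fault(ipa_bits, fault_ipa) == fault_ipa;
}

/*
 * Both aliases of one GPA resolve to that GPA, and only the lower one is
 * private. Precondition: gpa lies in the lower (protected) half.
 */
static void sketch_check_aliases(unsigned int ipa_bits, uint64_t gpa)
{
	uint64_t shared_ipa = gpa | sketch_shared_bit(ipa_bits);

	assert(sketch_gpa_from_fault(ipa_bits, gpa) == gpa);
	assert(sketch_gpa_from_fault(ipa_bits, shared_ipa) == gpa);
	assert(sketch_is_priv_fault(ipa_bits, gpa));
	assert(!sketch_is_priv_fault(ipa_bits, shared_ipa));
	assert(sketch_is_priv_fault(ipa_bits, gpa) ==
	       !sketch_shared_ipa_fault(ipa_bits, gpa));
}
```

With that relationship, any fault that gets past !shared_ipa_fault() into
gmem_abort() has is_priv_fault == true, which is what makes the
simplification above possible.
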
> [...]
>
>> @@ -2396,7 +2475,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>>                   !write_fault &&
>>                   !kvm_vcpu_trap_is_exec_fault(vcpu));
>> -        if (kvm_slot_has_gmem(memslot))
>> +        if (kvm_slot_has_gmem(memslot) && !shared_ipa_fault(vcpu->kvm, fault_ipa))
>>               ret = gmem_abort(&s2fd);
>>           else
>>               ret = user_mem_abort(&s2fd);
> gmem_abort() is only called for faults in the protected (private) space.

You're absolutely correct - that's a nice simplification!
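
To spell out why the check collapses: gmem_abort() is only reached when
!shared_ipa_fault(), so is_priv_fault is always true there and
"is_priv_gfn != is_priv_fault" reduces to "!is_priv_gfn". A small
exhaustive check of that reduction (the function names are made up for
this illustration and do not exist in the tree):

```c
#include <assert.h>
#include <stdbool.h>

/* Condition from v14 of the patch: exit on a mismatch in either direction. */
static bool v14_needs_exit(bool is_priv_gfn, bool is_priv_fault)
{
	return is_priv_gfn != is_priv_fault;
}

/* Suggested condition: only the shared -> private conversion can occur here. */
static bool simplified_needs_exit(bool is_priv_gfn)
{
	return !is_priv_gfn;
}

/*
 * gmem_abort() is only called for faults on the private alias, so
 * is_priv_fault is pinned to true; check both values of is_priv_gfn.
 */
static void check_simplification(void)
{
	const bool is_priv_fault = true;

	for (int g = 0; g <= 1; g++) {
		bool is_priv_gfn = g;

		assert(v14_needs_exit(is_priv_gfn, is_priv_fault) ==
		       simplified_needs_exit(is_priv_gfn));
	}
}
```

The last argument to kvm_prepare_memory_fault_exit() likewise becomes a
constant true, as in your version.
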

Thanks,
Steve