Re: [PATCH v14 29/44] arm64: RMI: Runtime faulting of memory
From: Steven Price
Date: Mon Jun 08 2026 - 07:09:06 EST
On 08/06/2026 10:30, Suzuki K Poulose wrote:
> On 05/06/2026 07:23, Gavin Shan wrote:
>> Hi Steve,
>>
>> On 5/13/26 11:17 PM, Steven Price wrote:
>>> At runtime if the realm guest accesses memory which hasn't yet been
>>> mapped then KVM needs to either populate the region or fault the guest.
>>>
>>> For memory in the lower (protected) region of IPA a fresh page is
>>> provided to the RMM which will zero the contents. For memory in the
>>> upper (shared) region of IPA, the memory from the memslot is mapped
>>> into the realm VM non secure.
>>>
>>> Signed-off-by: Steven Price <steven.price@xxxxxxx>
>>> ---
>>> Changes since v13:
>>> * Numerous changes due to rebasing.
>>> * Fix addr_range_desc() to encode the correct block size.
>>> Changes since v12:
>>> * Switch to RMM v2.0 range based APIs.
>>> Changes since v11:
>>> * Adapt to upstream changes.
>>> Changes since v10:
>>> * RME->RMI renaming.
>>> * Adapt to upstream gmem changes.
>>> Changes since v9:
>>> * Fix call to kvm_stage2_unmap_range() in kvm_free_stage2_pgd() to set
>>> may_block to avoid stall warnings.
>>> * Minor coding style fixes.
>>> Changes since v8:
>>> * Propagate the may_block flag.
>>> * Minor comments and coding style changes.
>>> Changes since v7:
>>> * Remove redundant WARN_ONs for realm_create_rtt_levels() - it will
>>> internally WARN when necessary.
>>> Changes since v6:
>>> * Handle PAGE_SIZE being larger than RMM granule size.
>>> * Some minor renaming following review comments.
>>> Changes since v5:
>>> * Reduce use of struct page in preparation for supporting the RMM
>>> having a different page size to the host.
>>> * Handle a race when delegating a page where another CPU has faulted on
>>> the same page (and already delegated the physical page) but not yet
>>> mapped it. In this case simply return to the guest to either use the
>>> mapping from the other CPU (or refault if the race is lost).
>>> * The changes to populate_par_region() are moved into the previous
>>> patch where they belong.
>>> Changes since v4:
>>> * Code cleanup following review feedback.
>>> * Drop the PTE_SHARED bit when creating unprotected page table entries.
>>> This is now set by the RMM and the host has no control of it and the
>>> spec requires the bit to be set to zero.
>>> Changes since v2:
>>> * Avoid leaking memory if failing to map it in the realm.
>>> * Correctly mask RTT based on LPA2 flag (see rtt_get_phys()).
>>> * Adapt to changes in previous patches.
>>> ---
>>> arch/arm64/include/asm/kvm_emulate.h | 8 ++
>>> arch/arm64/include/asm/kvm_rmi.h | 12 ++
>>> arch/arm64/kvm/mmu.c | 128 ++++++++++++++++----
>>> arch/arm64/kvm/rmi.c | 173 +++++++++++++++++++++++++++
>>> 4 files changed, 301 insertions(+), 20 deletions(-)
>>>
[...]
>>> diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
>>> index cae29fd3353c..761b38a4071c 100644
>>> --- a/arch/arm64/kvm/rmi.c
>>> +++ b/arch/arm64/kvm/rmi.c
>>> @@ -597,6 +597,179 @@ static int realm_data_map_init(struct kvm *kvm, unsigned long ipa,
>>> return ret;
>>> }
>>> +static unsigned long addr_range_desc(unsigned long phys, unsigned long size)
>>> +{
>>> + unsigned long out = 0;
>>> +
>>> + switch (size) {
>>> + case P4D_SIZE:
>>> + out = 3 | (1 << 2);
>>> + break;
>>> + case PUD_SIZE:
>>> + out = 2 | (1 << 2);
>>> + break;
>>> + case PMD_SIZE:
>>> + out = 1 | (1 << 2);
>>> + break;
>>> + case PAGE_SIZE:
>>> + out = 0 | (1 << 2);
>>> + break;
>>> + default:
>>> + /*
>>> + * Only support mapping at the page level granularity when
>>> + * it's an unusual length. This should get us back onto a larger
>>> + * block size for the subsequent mappings.
>>> + */
>>> + out = 0 | ((MIN(size >> PAGE_SHIFT, PTRS_PER_PTE - 1)) << 2);
>>> + break;
>>> + }
>>> +
>>> + WARN_ON(phys & ~PAGE_MASK);
>>> +
>>> + out |= phys & PAGE_MASK;
>>> +
>>> + return out;
>>> +}
>>> +
>>> +int realm_map_protected(struct kvm *kvm,
>>> + unsigned long ipa,
>>> + kvm_pfn_t pfn,
>>> + unsigned long map_size,
>>> + struct kvm_mmu_memory_cache *memcache)
>>> +{
>>> + struct realm *realm = &kvm->arch.realm;
>>> + phys_addr_t phys = __pfn_to_phys(pfn);
>>> + phys_addr_t base_phys = phys;
>>> + phys_addr_t rd = virt_to_phys(realm->rd);
>>> + unsigned long base_ipa = ipa;
>>> + unsigned long ipa_top = ipa + map_size;
>>> + int ret = 0;
>>> +
>>> + if (WARN_ON(!IS_ALIGNED(map_size, PAGE_SIZE) ||
>>> + !IS_ALIGNED(ipa, map_size)))
>>> + return -EINVAL;
>>> +
>>> + if (rmi_delegate_range(phys, map_size)) {
>>> + /*
>>> + * It's likely we raced with another VCPU on the same
>>> + * fault. Assume the other VCPU has handled the fault
>>> + * and return to the guest.
>>> + */
>>> + return 0;
>>> + }
>>> +
>>> + while (ipa < ipa_top) {
>>> + unsigned long flags = RMI_ADDR_TYPE_SINGLE;
>>> + unsigned long range_desc = addr_range_desc(phys, ipa_top - ipa);
>>> + unsigned long out_top;
>>> +
>>> + ret = rmi_rtt_data_map(rd, ipa, ipa_top, flags, range_desc,
>>> + &out_top);
>>> +
>>> + if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
>>> + /* Create missing RTTs and retry */
>>> + int level = RMI_RETURN_INDEX(ret);
>>> +
>>> + WARN_ON(level == KVM_PGTABLE_LAST_LEVEL);
>>> + ret = realm_create_rtt_levels(realm, ipa, level,
>>> + KVM_PGTABLE_LAST_LEVEL,
>>> + memcache);
>
> Could we give the RMM a chance to make use of block mappings by
> creating the missing RTTs only down to the level that may work for the
> current range_desc? I.e., if the range_desc is a 2M block size, we could
> create tables up to L2 in the first go, and if the RMM still needs an RTT,
> we could go further down to KVM_PGTABLE_LAST_LEVEL. I understand this is
> kind of an optimisation, so maybe we could defer it. (The same applies to
> the non_secure map below.)
A simple change would be to create just one level at a time, like this:

diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index b79b96f7dffb..3f3ade1d3895 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -767,15 +767,15 @@ static int realm_map_protected(struct kvm *kvm,
 			/* Create missing RTTs and retry */
 			int level = RMI_RETURN_INDEX(ret);
-			WARN_ON(level == KVM_PGTABLE_LAST_LEVEL);
+			if (WARN_ON(level >= KVM_PGTABLE_LAST_LEVEL))
+				goto err_undelegate;
 			ret = realm_create_rtt_levels(realm, ipa, level,
-						      KVM_PGTABLE_LAST_LEVEL,
+						      level + 1,
 						      memcache);
 			if (ret)
 				goto err_undelegate;
-			ret = rmi_rtt_data_map(rd, ipa, ipa_top, flags,
-					       range_desc, &out_top);
+			continue;
 		}
 		if (WARN_ON(ret))
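
To illustrate the effect of retrying one level at a time, here is a small user-space toy model of the loop (not kernel code). Everything in it is hypothetical: struct rtt_model, mock_data_map(), mock_create_rtt_levels() and map_with_retry() only stand in for rmi_rtt_data_map() and realm_create_rtt_levels(), and the model simply assumes a mapping succeeds once a table exists at or below the level the range descriptor asks for.

```c
#include <stdbool.h>

#define LAST_LEVEL 3	/* stands in for KVM_PGTABLE_LAST_LEVEL */

/* Hypothetical model of the RTT state for a single IPA. */
struct rtt_model {
	int deepest_table;	/* deepest level at which an RTT exists */
	int tables_created;	/* RTTs the host had to create */
};

/*
 * Mock of the map call. Returns 0 on success. Otherwise returns 1 and
 * reports the level at which the (modelled) walk stopped.
 */
static int mock_data_map(const struct rtt_model *m, int want_level,
			 int *fault_level)
{
	if (m->deepest_table >= want_level)
		return 0;

	*fault_level = m->deepest_table;
	return 1;
}

/* Mock of RTT creation: creates the tables for levels from + 1 .. to. */
static void mock_create_rtt_levels(struct rtt_model *m, int from, int to)
{
	for (int level = from + 1; level <= to; level++) {
		m->deepest_table = level;
		m->tables_created++;
	}
}

/*
 * Retry loop. With one_level_at_a_time set, only the next missing level
 * is created before retrying; otherwise tables are created all the way
 * down to LAST_LEVEL on the first failure.
 *
 * Returns the deepest table level once the map succeeds, or -1 if the
 * walk already stopped at the last level.
 */
static int map_with_retry(struct rtt_model *m, int want_level,
			  bool one_level_at_a_time)
{
	for (;;) {
		int level;

		if (!mock_data_map(m, want_level, &level))
			return m->deepest_table;

		if (level >= LAST_LEVEL)
			return -1;

		mock_create_rtt_levels(m, level,
				       one_level_at_a_time ? level + 1
							   : LAST_LEVEL);
	}
}
```

In this model, starting with tables down to level 1 and asking for a level 2 (block) mapping, the one-level-at-a-time variant stops after creating a single table at level 2, whereas creating down to LAST_LEVEL up front always ends with a level 3 table and so rules out the block mapping.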
Thanks,
Steve