Re: [PATCH v14 29/44] arm64: RMI: Runtime faulting of memory

From: Suzuki K Poulose

Date: Mon Jun 08 2026 - 08:59:53 EST


On 08/06/2026 11:56, Steven Price wrote:
On 08/06/2026 10:30, Suzuki K Poulose wrote:
On 05/06/2026 07:23, Gavin Shan wrote:
Hi Steve,

On 5/13/26 11:17 PM, Steven Price wrote:
At runtime if the realm guest accesses memory which hasn't yet been
mapped then KVM needs to either populate the region or fault the guest.

For memory in the lower (protected) region of IPA a fresh page is
provided to the RMM which will zero the contents. For memory in the
upper (shared) region of IPA, the memory from the memslot is mapped
into the realm VM non secure.

Signed-off-by: Steven Price <steven.price@xxxxxxx>
---
Changes since v13:
  * Numerous changes due to rebasing.
  * Fix addr_range_desc() to encode the correct block size.
Changes since v12:
  * Switch to RMM v2.0 range based APIs.
Changes since v11:
  * Adapt to upstream changes.
Changes since v10:
  * RME->RMI renaming.
  * Adapt to upstream gmem changes.
Changes since v9:
  * Fix call to kvm_stage2_unmap_range() in kvm_free_stage2_pgd() to set
    may_block to avoid stall warnings.
  * Minor coding style fixes.
Changes since v8:
  * Propagate the may_block flag.
  * Minor comments and coding style changes.
Changes since v7:
  * Remove redundant WARN_ONs for realm_create_rtt_levels() - it will
    internally WARN when necessary.
Changes since v6:
  * Handle PAGE_SIZE being larger than RMM granule size.
  * Some minor renaming following review comments.
Changes since v5:
  * Reduce use of struct page in preparation for supporting the RMM
    having a different page size to the host.
  * Handle a race when delegating a page where another CPU has faulted on
    the same page (and already delegated the physical page) but not yet
    mapped it. In this case simply return to the guest to either use the
    mapping from the other CPU (or refault if the race is lost).
  * The changes to populate_par_region() are moved into the previous
    patch where they belong.
Changes since v4:
  * Code cleanup following review feedback.
  * Drop the PTE_SHARED bit when creating unprotected page table entries.
    This is now set by the RMM and the host has no control of it and the
    spec requires the bit to be set to zero.
Changes since v2:
  * Avoid leaking memory if failing to map it in the realm.
  * Correctly mask RTT based on LPA2 flag (see rtt_get_phys()).
  * Adapt to changes in previous patches.
---
  arch/arm64/include/asm/kvm_emulate.h |   8 ++
  arch/arm64/include/asm/kvm_rmi.h     |  12 ++
  arch/arm64/kvm/mmu.c                 | 128 ++++++++++++++++----
  arch/arm64/kvm/rmi.c                 | 173 +++++++++++++++++++++++++++
  4 files changed, 301 insertions(+), 20 deletions(-)


[...]

diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index cae29fd3353c..761b38a4071c 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -597,6 +597,179 @@ static int realm_data_map_init(struct kvm *kvm, unsigned long ipa,
      return ret;
  }
+static unsigned long addr_range_desc(unsigned long phys, unsigned long size)
+{
+    unsigned long out = 0;
+
+    switch (size) {
+    case P4D_SIZE:
+        out = 3 | (1 << 2);
+        break;
+    case PUD_SIZE:
+        out = 2 | (1 << 2);
+        break;
+    case PMD_SIZE:
+        out = 1 | (1 << 2);
+        break;
+    case PAGE_SIZE:
+        out = 0 | (1 << 2);
+        break;
+    default:
+        /*
+         * Only support mapping at the page level granularity when
+         * it's an unusual length. This should get us back onto a larger
+         * block size for the subsequent mappings.
+         */
+        out = 0 | ((MIN(size >> PAGE_SHIFT, PTRS_PER_PTE - 1)) << 2);
+        break;
+    }
+
+    WARN_ON(phys & ~PAGE_MASK);
+
+    out |= phys & PAGE_MASK;
+
+    return out;
+}
+
+int realm_map_protected(struct kvm *kvm,
+            unsigned long ipa,
+            kvm_pfn_t pfn,
+            unsigned long map_size,
+            struct kvm_mmu_memory_cache *memcache)
+{
+    struct realm *realm = &kvm->arch.realm;
+    phys_addr_t phys = __pfn_to_phys(pfn);
+    phys_addr_t base_phys = phys;
+    phys_addr_t rd = virt_to_phys(realm->rd);
+    unsigned long base_ipa = ipa;
+    unsigned long ipa_top = ipa + map_size;
+    int ret = 0;
+
+    if (WARN_ON(!IS_ALIGNED(map_size, PAGE_SIZE) ||
+            !IS_ALIGNED(ipa, map_size)))
+        return -EINVAL;
+
+    if (rmi_delegate_range(phys, map_size)) {
+        /*
+         * It's likely we raced with another VCPU on the same
+         * fault. Assume the other VCPU has handled the fault
+         * and return to the guest.
+         */
+        return 0;
+    }
+
+    while (ipa < ipa_top) {
+        unsigned long flags = RMI_ADDR_TYPE_SINGLE;
+        unsigned long range_desc = addr_range_desc(phys, ipa_top - ipa);
+        unsigned long out_top;
+
+        ret = rmi_rtt_data_map(rd, ipa, ipa_top, flags, range_desc,
+                       &out_top);
+
+        if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
+            /* Create missing RTTs and retry */
+            int level = RMI_RETURN_INDEX(ret);
+
+            WARN_ON(level == KVM_PGTABLE_LAST_LEVEL);
+            ret = realm_create_rtt_levels(realm, ipa, level,
+                              KVM_PGTABLE_LAST_LEVEL,
+                              memcache);

Could we give the RMM a chance to use block mappings by creating the
missing RTTs only down to the level that suits the current range_desc?
For example, if the range_desc is a 2M block size, we could create
tables up to L2 on the first attempt, and if the RMM still needs an RTT,
we could go further down to KVM_PGTABLE_LAST_LEVEL. I understand this is
an optimisation, so maybe we could defer it. (The same applies to
the non_secure map below.)
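
To illustrate the suggestion above, here is a minimal userspace sketch of
picking the RTT level to create tables down to, based on the size still to
be mapped. It is not code from the patch: the sketch_* names are
hypothetical, and it assumes a 4K granule with 9 bits resolved per level
(so level 3 maps 4K, level 2 maps 2M, level 1 maps 1G). Whether the RMM
accepts a block at a given level is also an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4K-granule geometry; hypothetical names, not from the patch. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_LEVEL_BITS	9
#define SKETCH_LAST_LEVEL	3

/* Size of the region mapped by one entry at the given RTT level. */
static uint64_t sketch_level_size(int level)
{
	assert(level >= 0 && level <= SKETCH_LAST_LEVEL);
	return 1ULL << (SKETCH_PAGE_SHIFT +
			SKETCH_LEVEL_BITS * (SKETCH_LAST_LEVEL - level));
}

/*
 * Return the shallowest level (starting from level 1) whose block size
 * fits in the remaining range and to which the IPA is aligned. Fall back
 * to the last level when no block mapping is possible.
 */
static int sketch_target_level(uint64_t ipa, uint64_t remaining)
{
	for (int level = 1; level < SKETCH_LAST_LEVEL; level++) {
		uint64_t size = sketch_level_size(level);

		if (remaining >= size && (ipa & (size - 1)) == 0)
			return level;
	}
	return SKETCH_LAST_LEVEL;
}
```

With these assumptions a 2M-aligned IPA with 2M left to map yields level 2,
so tables would only be created down to L2 on the first attempt. A real
implementation would also need to check the alignment of the physical
address, which this sketch leaves out.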

A simple change would be just to create one level at a time like this:

diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index b79b96f7dffb..3f3ade1d3895 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -767,15 +767,15 @@ static int realm_map_protected(struct kvm *kvm,
             /* Create missing RTTs and retry */
             int level = RMI_RETURN_INDEX(ret);
-            WARN_ON(level == KVM_PGTABLE_LAST_LEVEL);
+            if (WARN_ON(level >= KVM_PGTABLE_LAST_LEVEL))
+                goto err_undelegate;
             ret = realm_create_rtt_levels(realm, ipa, level,
-                              KVM_PGTABLE_LAST_LEVEL,
+                              level + 1,
                               memcache);
             if (ret)
                 goto err_undelegate;
-            ret = rmi_rtt_data_map(rd, ipa, ipa_top, flags,
-                           range_desc, &out_top);
+            continue;
         }
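
The effect of creating one level at a time and retrying can be shown with
a toy userspace model. This is only a sketch: struct sketch_rtt,
sketch_try_map() and sketch_map_one_level_at_a_time() are hypothetical
stand-ins for the RTT state, rmi_rtt_data_map() and the retry loop, and
the model assumes the map call reports the level at which the walk
stopped.

```c
#include <assert.h>

#define SKETCH_LAST_LEVEL	3	/* assumed deepest RTT level */

/* Toy model of the RTT state for a single IPA. */
struct sketch_rtt {
	int deepest_level;	/* deepest level at which a table exists */
	int tables_created;	/* number of tables created by the host */
};

/*
 * Toy stand-in for the map call: returns -1 on success, otherwise the
 * level at which the walk stopped because a table is missing.
 */
static int sketch_try_map(const struct sketch_rtt *rtt, int needed_level)
{
	return rtt->deepest_level >= needed_level ? -1 : rtt->deepest_level;
}

/* Create one missing level per failed attempt, then retry the map. */
static int sketch_map_one_level_at_a_time(struct sketch_rtt *rtt,
					  int needed_level)
{
	assert(needed_level <= SKETCH_LAST_LEVEL);

	for (;;) {
		int level = sketch_try_map(rtt, needed_level);

		if (level < 0)
			return 0;	/* mapped */
		if (level >= SKETCH_LAST_LEVEL)
			return -1;	/* nothing deeper to create */

		rtt->deepest_level = level + 1;
		rtt->tables_created++;
	}
}
```

In this model, a mapping that can be satisfied at level 2 with tables
already present down to level 1 creates exactly one table, rather than
always building tables down to the last level.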

That looks good to me.

Cheers
Suzuki


if (WARN_ON(ret))

Thanks,
Steve