Re: [PATCH v8 15/43] arm64: RME: Allow VMM to set RIPAS

From: Gavin Shan
Date: Thu May 01 2025 - 19:59:50 EST


On 5/2/25 2:00 AM, Steven Price wrote:
On 30/04/2025 12:38, Gavin Shan wrote:
On 4/16/25 11:41 PM, Steven Price wrote:
Each page within the protected region of the realm guest can be marked
as either RAM or EMPTY. Allow the VMM to control this before the guest
has started and provide the equivalent functions to change this (with
the guest's approval) at runtime.

When transitioning from RIPAS RAM (1) to RIPAS EMPTY (0) the memory is
unmapped from the guest and undelegated allowing the memory to be reused
by the host. When transitioning to RIPAS RAM the actual population of
the leaf RTTs is done later on stage 2 fault, however it may be
necessary to allocate additional RTTs to allow the RMM track the RIPAS
for the requested range.

When freeing a block mapping it is necessary to temporarily unfold the
RTT which requires delegating an extra page to the RMM, this page can
then be recovered once the contents of the block mapping have been
freed.

Signed-off-by: Steven Price <steven.price@xxxxxxx>
---
Changes from v7:
  * Replace use of "only_shared" with the upstream "attr_filter" field
    of struct kvm_gfn_range.
  * Clean up the logic in alloc_delegated_granule() for when to call
    kvm_account_pgtable_pages().
  * Rename realm_destroy_protected_granule() to
    realm_destroy_private_granule() to match the naming elsewhere. Also
    fix the return codes in the function to be descriptive.
  * Several other minor changes to names/return codes.
Changes from v6:
  * Split the code dealing with the guest triggering a RIPAS change into
    a separate patch, so this patch is purely for the VMM setting up the
    RIPAS before the guest first runs.
  * Drop the useless flags argument from alloc_delegated_granule().
  * Account RTTs allocated for a guest using kvm_account_pgtable_pages().
  * Deal with the RMM granule size potentially being smaller than the
    host's PAGE_SIZE. Although note alloc_delegated_granule() currently
    still allocates an entire host page for every RMM granule (so wasting
    memory when PAGE_SIZE>4k).
Changes from v5:
  * Adapt to rebasing.
  * Introduce find_map_level()
  * Rename some functions to be clearer.
  * Drop the "spare page" functionality.
Changes from v2:
  * {alloc,free}_delegated_page() moved from previous patch to this one.
  * alloc_delegated_page() now takes a gfp_t flags parameter.
  * Fix the reference counting of guestmem pages to avoid leaking memory.
  * Several misc code improvements and extra comments.
---
  arch/arm64/include/asm/kvm_rme.h |   5 +
  arch/arm64/kvm/mmu.c             |   8 +-
  arch/arm64/kvm/rme.c             | 384 +++++++++++++++++++++++++++++++
  3 files changed, 394 insertions(+), 3 deletions(-)


.../...

+static int kvm_init_ipa_range_realm(struct kvm *kvm,
+                    struct arm_rme_init_ripas *args)
+{
+    gpa_t addr, end;
+    struct realm *realm = &kvm->arch.realm;
+
+    addr = args->base;
+    end = addr + args->size;
+
+    if (end < addr)
+        return -EINVAL;

The check needs to cover 'end <= addr'. RMI_ERROR_INPUT is returned from
RMM::smc_rtt_init_ripas()
if 'end' is equal to 'addr', but we're returning 0, inconsistent to that.

I agree we're different to smc_rtt_init_ripas(), but I don't really see
why we should prevent args->size==0. Calling the low level SMC in that
case would clearly be wrong (the kernel should be validating and that
would show a lack of validation), but we handle that with the while loop
in realm_init_ipa_state().

Do you think it's important to define this uAPI to disallow size==0?


No, it's not a big deal since it's just a nitpick. The current implementation
isn't wrong because 0 will be returned when size is 0, meaning it succeeds
but no work to do there. Please leave the code as of being :)

    if (end <= addr)
        return -EINVAL;

+
+    if (kvm_realm_state(kvm) != REALM_STATE_NEW)
+        return -EPERM;

To keep the consistency, kvm_realm_is_created() can be used here.

    if (kvm_realm_is_created(kvm))
        return -EPERM;

This isn't the same - kvm_realm_is_create() is checking for
REALM_STATE_NONE, but we want to check for REALM_STATE_NEW.


Hmm, sorry for the noise. The current code is good enough then :)

Thanks,
Gavin