Re: [PATCH v7 12/45] arm64: RME: Allocate/free RECs to match vCPUs

From: Gavin Shan
Date: Tue Apr 08 2025 - 00:56:15 EST


Hi Steven,

On 3/8/25 1:43 AM, Steven Price wrote:
On 03/03/2025 07:08, Gavin Shan wrote:
On 2/14/25 2:13 AM, Steven Price wrote:
The RMM maintains a data structure known as the Realm Execution Context
(or REC). It is similar to struct kvm_vcpu and tracks the state of the
virtual CPUs. KVM must delegate memory and request the structures are
created when vCPUs are created, and suitably tear down on destruction.

RECs must also be supplied with addition pages - auxiliary (or AUX)
granules - for storing the larger registers state (e.g. for SVE). The
number of AUX granules for a REC depends on the parameters with which
the Realm was created - the RMM makes this information available via the
RMI_REC_AUX_COUNT call performed after creating the Realm Descriptor
(RD).

Note that only some of register state for the REC can be set by KVM, the
rest is defined by the RMM (zeroed). The register state then cannot be
changed by KVM after the REC is created (except when the guest
explicitly requests this e.g. by performing a PSCI call). The RMM also
requires that the VMM creates RECs in ascending order of the MPIDR.

See Realm Management Monitor specification (DEN0137) for more
information:
https://developer.arm.com/documentation/den0137/

Signed-off-by: Steven Price <steven.price@xxxxxxx>
---
Changes since v6:
  * Avoid reporting the KVM_ARM_VCPU_REC feature if the guest isn't a
    realm guest.
  * Support host page size being larger than RMM's granule size when
    allocating/freeing aux granules.
Changes since v5:
  * Separate the concept of vcpu_is_rec() and
    kvm_arm_vcpu_rec_finalized() by using the KVM_ARM_VCPU_REC feature as
    the indication that the VCPU is a REC.
Changes since v2:
  * Free rec->run earlier in kvm_destroy_realm() and adapt to previous
patches.
---
  arch/arm64/include/asm/kvm_emulate.h |   7 ++
  arch/arm64/include/asm/kvm_host.h    |   3 +
  arch/arm64/include/asm/kvm_rme.h     |  18 +++
  arch/arm64/kvm/arm.c                 |  13 +-
  arch/arm64/kvm/reset.c               |  11 ++
  arch/arm64/kvm/rme.c                 | 179 +++++++++++++++++++++++++++
  6 files changed, 229 insertions(+), 2 deletions(-)


With the following one comment addressed:

Reviewed-by: Gavin Shan <gshan@xxxxxxxxxx>

[...]

    /*
diff --git a/arch/arm64/include/asm/kvm_rme.h b/arch/arm64/include/
asm/kvm_rme.h
index 698bb48a8ae1..5db377943db4 100644
--- a/arch/arm64/include/asm/kvm_rme.h
+++ b/arch/arm64/include/asm/kvm_rme.h
@@ -6,6 +6,7 @@
  #ifndef __ASM_KVM_RME_H
  #define __ASM_KVM_RME_H
  +#include <asm/rmi_smc.h>
  #include <uapi/linux/kvm.h>
    /**
@@ -65,6 +66,21 @@ struct realm {
      unsigned int ia_bits;
  };
  +/**
+ * struct realm_rec - Additional per VCPU data for a Realm
+ *
+ * @mpidr: MPIDR (Multiprocessor Affinity Register) value to identify
this VCPU
+ * @rec_page: Kernel VA of the RMM's private page for this REC
+ * @aux_pages: Additional pages private to the RMM for this REC
+ * @run: Kernel VA of the RmiRecRun structure shared with the RMM
+ */
+struct realm_rec {
+    unsigned long mpidr;
+    void *rec_page;
+    struct page *aux_pages[REC_PARAMS_AUX_GRANULES];
+    struct rec_run *run;
+};
+

REC_PARAMS_AUX_GRANULES represents the maximal number of the auxiliary
granules.
Since the base page size is always larger than or equal to granule size
(4KB).
The capacity of array @aux_pages[] needs to be REC_PARAMS_AUX_GRANULES.
Ideally,
the array's size can be computed dynamically and it's allocated in
kvm_create_rec().

This is indeed another example of where pages and granules have got
mixed. The RMM specification provides a maximum number of granules (and
there's a similar array in struct rec_params). Here the use of
REC_PARAMS_AUX_GRANULES is just to keep the structure more simple (no
dynamic allocation) with the cost of ~128bytes. Obviously if
PAGE_SIZE>4k then this array could be smaller.


Exactly. The confusion is caused by the mismatched size between page and granule.
We're declaring an array of pages and the array's capacity is REC_PARAMS_AUX_GRANULES.
A comment is needed to explain for now. In long run, I think it's nice to allocate
them dynamically. The buddy allocator allocates 2^N pages and memory waste will be
existing without perfect alignment.

Alternatively, to keep the code simple, a comment is needed here to
explain why
the array's size has been set to REC_PARAMS_AUX_GRANULES.

Definitely a valid point - this could do with a comment explaining the
situation.


Ack.

An relevant question: Do we plan to support differentiated sizes between
page
and granule? I had the assumption this feature will be supported in the
future
after the base model (equal page and granule size) gets merged first.

Yes I do plan to support it. This series actually has the basic support
in it already, with an experimental patch at the end enabling that
support. It "works" but I haven't tested it well and I think some of the
error handling isn't quite right yet.

But there's also a bunch more work to be done (like here) to avoid
over-allocating memory when PAGE_SIZE>4k. E.g. RTTs need an
sub-allocator so that we don't waste an entire (larger) page on each RTT.


Ok. Frankly I didn't expect it when I reviewed the first patches of this
series. I thought it would be an incremental feature, meaning a separate
series to support it after the base series is accepted. Anyway, it's something
I need to review on the next respin (v8) if it's going to be supported
together with the base functionalities :-)

Thanks,
Gavin