Re: [PATCH v12 06/46] arm64: RMI: Define the user ABI
From: Suzuki K Poulose
Date: Tue Mar 03 2026 - 09:28:44 EST
On 03/03/2026 13:13, Marc Zyngier wrote:
On Mon, 02 Mar 2026 17:13:41 +0000,
Suzuki K Poulose <suzuki.poulose@xxxxxxx> wrote:
On 02/03/2026 15:23, Steven Price wrote:
Hi Marc,
On 02/03/2026 14:25, Marc Zyngier wrote:
On Wed, 17 Dec 2025 10:10:43 +0000,
Steven Price <steven.price@xxxxxxx> wrote:
There is one CAP which identifies the presence of CCA, and two ioctls.
One ioctl is used to populate memory, and the other is used, when user
space is providing the PSCI implementation, to identify the target of the
operation.
Signed-off-by: Steven Price <steven.price@xxxxxxx>
---
Changes since v11:
* Completely reworked to be more implicit. Rather than having explicit
CAP operations to progress the realm construction these operations
are done when needed (on populating and on first vCPU run).
* Populate and PSCI complete are promoted to proper ioctls.
Changes since v10:
* Rename symbols from RME to RMI.
Changes since v9:
* Improvements to documentation.
* Bump the magic number for KVM_CAP_ARM_RME to avoid conflicts.
Changes since v8:
* Minor improvements to documentation following review.
* Bump the magic numbers to avoid conflicts.
Changes since v7:
* Add documentation of new ioctls
* Bump the magic numbers to avoid conflicts
Changes since v6:
* Rename some of the symbols to make their usage clearer and avoid
repetition.
Changes from v5:
* Actually expose the new VCPU capability (KVM_ARM_VCPU_REC) by bumping
KVM_VCPU_MAX_FEATURES - note this also exposes KVM_ARM_VCPU_HAS_EL2!
---
Documentation/virt/kvm/api.rst | 57 ++++++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 23 ++++++++++++++
2 files changed, 80 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 01a3abef8abb..2d5dc7e48954 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6517,6 +6517,54 @@ the capability to be present.
`flags` must currently be zero.
+4.144 KVM_ARM_VCPU_RMI_PSCI_COMPLETE
+------------------------------------
+
+:Capability: KVM_CAP_ARM_RMI
+:Architectures: arm64
+:Type: vcpu ioctl
+:Parameters: struct kvm_arm_rmi_psci_complete (in)
+:Returns: 0 if successful, < 0 on error
+
+::
+
+ struct kvm_arm_rmi_psci_complete {
+ __u64 target_mpidr;
+ __u32 psci_status;
+ __u32 padding[3];
+ };
+
+Where PSCI functions are handled by user space, the RMM needs to be informed of
+the target of the operation using `target_mpidr`, along with the status
+(`psci_status`). The RMM v1.0 specification defines two functions that require
+this call: PSCI_CPU_ON and PSCI_AFFINITY_INFO.
+
+If the kernel is handling PSCI then this is done automatically and the VMM
+doesn't need to call this ioctl.
Shouldn't we make handling of PSCI mandatory for VMMs that deal with
CCA? I suspect it would simplify the implementation significantly.
What do you mean by making it "mandatory for VMMs"? If you mean PSCI is
always forwarded to user space then I don't think it's going to make
much difference. Patch 27 handles the PSCI changes (72 lines added), and
some of that is adding this uAPI for the VMM to handle it.
Removing the functionality to allow the VMM to handle it would obviously
simplify things a bit (we can drop this uAPI), but I think the desire is
to push this onto user space.
What vcpu fd does this apply to? The vcpu calling the PSCI function?
Or the target? This is pretty important for PSCI_CPU_ON. My guess is that
this is setting the return value for the caller?
Yes, the fd is for the vCPU calling PSCI. As you say, this is so the
return value can be set correctly.
Assuming this is indeed for the caller, why do we have a different
flow from anything else that returns a result from a hypercall?
I'm not entirely sure what you are suggesting. Do you mean: why are we
not just writing to the GPRs that would contain the result? The issue
here is that the RMM needs to know the PA of the target REC structure -
this isn't a return value to the guest, but information for the RMM
itself to complete the PSCI call.
Ultimately even in the case where the VMM is handling PSCI, it's
actually a combination of the VMM and the RMM - with the RMM validating
the responses.
More importantly, we have to make sure that RMI_PSCI_COMPLETE is
invoked before either of the following happens:
1. The "source" vCPU is run again.
2. More importantly, the "target" vCPU is run.
I don't understand why (1) is required. Once the VMM gets the request,
the target vcpu can run, and can itself do the completion, without any
additional userspace involvement.

M.

The underlying issue is that the RMM doesn't have the vCPU object for
the "target" vCPU to do its bookkeeping. Also, please note that for a
Realm, PSCI is emulated by the RMM. The host is notified of the PSCI
changes via EXIT_PSCI (note, it is not an SMCCC exit) so that it can
stay in sync with the real state, and it does have a say in CPU_ON.
So, before we return to running the "source" vCPU, the host must
provide the target vCPU object and its consent (via psci_status) to
the RMM. This allows the RMM to emulate the PSCI request correctly
while at the same time keeping its bookkeeping intact (i.e., marking
the target vCPU as runnable or not).

When a "source" vCPU exits to the host with a PSCI exit, the RMM marks
the source vCPU as having a pending PSCI operation, and the
RMI_PSCI_COMPLETE request ticks that off, making it runnable again.

Suzuki