Re: [PATCH v12 06/46] arm64: RMI: Define the user ABI

From: Steven Price

Date: Wed Mar 04 2026 - 07:08:53 EST

On 03/03/2026 13:11, Marc Zyngier wrote:
> On Mon, 02 Mar 2026 15:23:44 +0000,
> Steven Price <steven.price@xxxxxxx> wrote:
>>
>> Hi Marc,
>>
>> On 02/03/2026 14:25, Marc Zyngier wrote:
>>> On Wed, 17 Dec 2025 10:10:43 +0000,
>>> Steven Price <steven.price@xxxxxxx> wrote:
>>>>
>>>> There is one CAP which identifies the presence of CCA, and two ioctls.
>>>> One ioctl is used to populate memory and the other is used when user
>>>> space is providing the PSCI implementation to identify the target of the
>>>> operation.
>>>>
>>>> Signed-off-by: Steven Price <steven.price@xxxxxxx>
>>>> ---
>>>> Changes since v11:
>>>> * Completely reworked to be more implicit. Rather than having explicit
>>>> CAP operations to progress the realm construction these operations
>>>> are done when needed (on populating and on first vCPU run).
>>>> * Populate and PSCI complete are promoted to proper ioctls.
>>>> Changes since v10:
>>>> * Rename symbols from RME to RMI.
>>>> Changes since v9:
>>>> * Improvements to documentation.
>>>> * Bump the magic number for KVM_CAP_ARM_RME to avoid conflicts.
>>>> Changes since v8:
>>>> * Minor improvements to documentation following review.
>>>> * Bump the magic numbers to avoid conflicts.
>>>> Changes since v7:
>>>> * Add documentation of new ioctls
>>>> * Bump the magic numbers to avoid conflicts
>>>> Changes since v6:
>>>> * Rename some of the symbols to make their usage clearer and avoid
>>>> repetition.
>>>> Changes from v5:
>>>> * Actually expose the new VCPU capability (KVM_ARM_VCPU_REC) by bumping
>>>> KVM_VCPU_MAX_FEATURES - note this also exposes KVM_ARM_VCPU_HAS_EL2!
>>>> ---
>>>> Documentation/virt/kvm/api.rst | 57 ++++++++++++++++++++++++++++++++++
>>>> include/uapi/linux/kvm.h | 23 ++++++++++++++
>>>> 2 files changed, 80 insertions(+)
>>>>
>>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>>> index 01a3abef8abb..2d5dc7e48954 100644
>>>> --- a/Documentation/virt/kvm/api.rst
>>>> +++ b/Documentation/virt/kvm/api.rst
>>>> @@ -6517,6 +6517,54 @@ the capability to be present.
>>>>
>>>> `flags` must currently be zero.
>>>>
>>>> +4.144 KVM_ARM_VCPU_RMI_PSCI_COMPLETE
>>>> +------------------------------------
>>>> +
>>>> +:Capability: KVM_CAP_ARM_RMI
>>>> +:Architectures: arm64
>>>> +:Type: vcpu ioctl
>>>> +:Parameters: struct kvm_arm_rmi_psci_complete (in)
>>>> +:Returns: 0 if successful, < 0 on error
>>>> +
>>>> +::
>>>> +
>>>> + struct kvm_arm_rmi_psci_complete {
>>>> + __u64 target_mpidr;
>>>> + __u32 psci_status;
>>>> + __u32 padding[3];
>>>> + };
>>>> +
>>>> +Where PSCI functions are handled by user space, the RMM needs to be informed of
>>>> +the target of the operation using `target_mpidr`, along with the status
>>>> +(`psci_status`). The RMM v1.0 specification defines two functions that require
>>>> +this call: PSCI_CPU_ON and PSCI_AFFINITY_INFO.
>>>> +
>>>> +If the kernel is handling PSCI then this is done automatically and the VMM
>>>> +doesn't need to call this ioctl.
>>>
>>> Shouldn't we make handling of PSCI mandatory for VMMs that deal with
>>> CCA? I suspect it would simplify the implementation significantly.
>>
>> What do you mean by making it "mandatory for VMMs"? If you mean PSCI is
>> always forwarded to user space then I don't think it's going to make
>> much difference. Patch 27 handles the PSCI changes (72 lines added), and
>> some of that is adding this uAPI for the VMM to handle it.
>>
>> Removing the functionality to allow the VMM to handle it would obviously
>> simplify things a bit (we can drop this uAPI), but I think the desire is
>> to push this onto user space.
>
> And that's what I'm asking for. I do not want this to be optional. CCA
> should imply PSCI in userspace, and that's it.
>
>>
>>> What vcpu fd does this apply to? The vcpu calling the PSCI function?
>>> Or the target? This is pretty important for PSCI_CPU_ON. My guess is that
>>> this is setting the return value for the caller?
>>
>> Yes the fd is the vcpu calling PSCI. As you say, this is for the return
>> value to be set correctly.
>
> Another question is why do we need the ioctl at all? Why can't it be
> done on the first run of the target vcpu? If no PSCI call was issued
> to run it, then the run fails.

So my concern is the ordering of operations for PSCI_CPU_ON. As things
stand the RMM needs to know the MPIDR mapping to look up the REC object
before either VCPU runs again.
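
For reference, a minimal sketch of what the VMM fills in under the
current proposal. The struct is copied from the api.rst hunk quoted
above so that this builds without the patched header;
psci_complete_init() is a hypothetical helper, and zeroing the padding
is an assumption about what the kernel would check. The ioctl itself is
only shown in a comment, issued on the fd of the vCPU that made the PSCI
call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of the struct from the api.rst hunk, so the sketch
 * compiles without the patched <linux/kvm.h>.
 */
struct kvm_arm_rmi_psci_complete {
	uint64_t target_mpidr;
	uint32_t psci_status;
	uint32_t padding[3];
};

/* 8 + 4 + 3 * 4 bytes, no implicit padding expected. */
static_assert(sizeof(struct kvm_arm_rmi_psci_complete) == 24,
	      "unexpected uAPI struct layout");

/*
 * Hypothetical helper: build the ioctl argument for a PSCI_CPU_ON or
 * PSCI_AFFINITY_INFO call that user space has handled.
 */
void psci_complete_init(struct kvm_arm_rmi_psci_complete *arg,
			uint64_t target_mpidr, uint32_t psci_status)
{
	/* Assumption: padding must be zero, as is usual for KVM uAPI. */
	memset(arg, 0, sizeof(*arg));
	arg->target_mpidr = target_mpidr;
	arg->psci_status = psci_status;
}

/*
 * The VMM would then do, before running either vCPU again:
 *
 *	ioctl(caller_vcpu_fd, KVM_ARM_VCPU_RMI_PSCI_COMPLETE, &arg);
 *
 * where caller_vcpu_fd is the vCPU that issued the PSCI call, not the
 * target.
 */
```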

If we do this on the first run of the target VCPU, then how is the VMM
to tell that the target VCPU has executed "long enough" that it is safe
to do the return on the initial VCPU? Since the VCPUs are different
threads this becomes tricky. Options I can see are:

a) The VMM has to wait for the target VCPU to exit - we'd probably want
to trigger an artificial early exit in this case to unblock things.

b) The kernel blocks the initial VCPU from running until the target VCPU
has completed this "first run" logic. I think waiting in the kernel is
probably problematic, so this implies returning some sort of "retry later"
response to the VMM.

c) The kernel handles the "PSCI_COMPLETE" dance on whichever VCPU runs
first, blocking the other until the dance is complete. A disadvantage
here is that behaviour can differ (in error conditions) depending on
which VCPU thread wins the race.

All these options also involve the kernel keeping track of the PSCI
sequence, in particular:

1. Tracking that the exit was due to a PSCI_CPU_ON.

2. Treating attempting to run the target VCPU as an implicit success
return from the PSCI call.

3. Recognising the next run on the initial VCPU as containing the PSCI
result - if 2, above, has happened then the kernel will need to handle
this (by killing the guest).
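
To make the tracking in steps 1-3 concrete, here is a rough sketch of
the state the kernel would have to carry for each outstanding call.
None of these names exist in the series. Returning -EINVAL when the
target is run without a preceding PSCI_CPU_ON, and killing the guest
only when the VMM reports failure after the target has already run, are
my assumptions about how the conflict would be resolved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical tracking state for one PSCI_CPU_ON sequence. */
enum psci_on_state {
	PSCI_ON_IDLE,		/* no PSCI_CPU_ON outstanding */
	PSCI_ON_PENDING,	/* step 1: caller exited with PSCI_CPU_ON */
	PSCI_ON_TARGET_RAN,	/* step 2: target was run, implicit success */
};

struct psci_on_track {
	enum psci_on_state state;
};

/* Step 1: record that the exit to the VMM was due to PSCI_CPU_ON. */
void track_cpu_on_exit(struct psci_on_track *t)
{
	t->state = PSCI_ON_PENDING;
}

/*
 * Step 2: the VMM runs the target vCPU. This counts as an implicit
 * success return. Fails if no PSCI_CPU_ON was issued for the target.
 */
int track_target_run(struct psci_on_track *t)
{
	if (t->state == PSCI_ON_IDLE)
		return -EINVAL;
	t->state = PSCI_ON_TARGET_RAN;
	return 0;
}

/*
 * Step 3: the next run of the initial vCPU carries the PSCI result.
 * Returns true if the guest has to be killed, i.e. the VMM reports a
 * failure although the target has already been started.
 */
bool track_caller_run(struct psci_on_track *t, bool vmm_reports_success)
{
	bool kill = t->state == PSCI_ON_TARGET_RAN && !vmm_reports_success;

	t->state = PSCI_ON_IDLE;
	return kill;
}
```

Even in this reduced form the outcome depends on the order in which the
two vCPU threads reach the kernel.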

TL;DR: Yes, this is possible, but I don't think it's pretty, and I'm not
convinced it's an improved uAPI.

Of course the above all assumes that the RMM can't just track things
internally. My preference is to kill RMI_PSCI_COMPLETE altogether, but
I'm not sure how possible that is within the context of the RMM.

>>
>>> Assuming this is indeed for the caller, why do we have a different
>>> flow from anything else that returns a result from a hypercall?
>>
>> I'm not entirely sure what you are suggesting. Do you mean why are we
>> not just writing to the GPRs that would contain the result? The issue
>> here is that the RMM needs to know the PA of the target REC structure -
>> this isn't a return to the guest, but information for the RMM itself to
>> complete the PSCI call.
>
> PSCI is a SMC call. SMC calls are routed to userspace as such. For odd
> reasons, the RMM treats PSCI differently from any other SMC call.
>
> That seems a very bizarre behaviour to me.

The RMM generally treats SMC specially. We have the RSI_HOST_CALL as a
proxy for "general SMC-like" behaviour which is forwarded to the VMM. I
believe the intention here is to ensure that SMCs (from the realm guest)
are handled by a trusted agent (i.e. the RMM). PSCI is a corner case
because it requires some coordination and buy-in from the VMM.

I'm not sure I fully understand the security pros and cons of the design
here and what impact it would have if PSCI was well trusted.

Thanks,
Steve

>>
>> Ultimately even in the case where the VMM is handling PSCI, it's
>> actually a combination of the VMM and the RMM - with the RMM validating
>> the responses.
>
> I don't see why PSCI is singled out here, irrespective of the tracking
> that the RMM wants to do.
>
> M.
>