Re: [PATCH v14 28/44] arm64: RMI: Create the realm descriptor

From: Steven Price

Date: Mon Jun 08 2026 - 06:15:10 EST


On 28/05/2026 06:51, Gavin Shan wrote:
> Hi Steve,
>
> On 5/13/26 11:17 PM, Steven Price wrote:
>> Creating a realm involves first creating a realm descriptor (RD). This
>> involves passing the configuration information to the RMM. Do this as
>> part of realm_ensure_created() so that the realm is created when it is
>> first needed.
>>
>> Signed-off-by: Steven Price <steven.price@xxxxxxx>
>> ---
>> Changes since v13:
>>   * The RMM no longer uses AUX granules, so no need to ask it how many it
>>     needs.
>>   * Adapted to other changes.
>> Changes since v12:
>>   * Since RMM page size is now equal to the host's page size various
>>     calculations are simplified.
>>   * Switch to using range based APIs to delegate/undelegate.
>>   * VMID handling is now handled entirely by the RMM.
>> ---
>>   arch/arm64/kvm/rmi.c | 88 +++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 86 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
>> index fb96bcaa73ed..cae29fd3353c 100644
>> --- a/arch/arm64/kvm/rmi.c
>> +++ b/arch/arm64/kvm/rmi.c
>> @@ -418,6 +418,77 @@ static void realm_unmap_shared_range(struct kvm *kvm,
>>                    start, end);
>>   }
>>
>> +static int realm_create_rd(struct kvm *kvm)
>> +{
>> +    struct realm *realm = &kvm->arch.realm;
>> +    struct realm_params *params = realm->params;
>> +    void *rd = NULL;
>> +    phys_addr_t rd_phys, params_phys;
>> +    size_t pgd_size = kvm_pgtable_stage2_pgd_size(kvm->arch.mmu.vtcr);
>> +    int r;
>> +
>> +    realm->ia_bits = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
>> +
>> +    if (WARN_ON(realm->rd || !realm->params))
>> +        return -EEXIST;
>> +
>> +    rd = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
>> +    if (!rd)
>> +        return -ENOMEM;
>> +
>> +    rd_phys = virt_to_phys(rd);
>> +    if (rmi_delegate_page(rd_phys)) {
>> +        r = -ENXIO;
>> +        goto free_rd;
>> +    }
>> +
>> +    if (rmi_delegate_range(kvm->arch.mmu.pgd_phys, pgd_size)) {
>> +        r = -ENXIO;
>> +        goto out_undelegate_tables;
>> +    }
>> +
>> +    params->s2sz = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
>> +    params->rtt_level_start = get_start_level(realm);
>> +    params->rtt_num_start = pgd_size / PAGE_SIZE;
>> +    params->rtt_base = kvm->arch.mmu.pgd_phys;
>> +
>> +    if (kvm->arch.arm_pmu) {
>> +        params->pmu_num_ctrs = kvm->arch.nr_pmu_counters;
>> +        params->flags |= RMI_REALM_PARAM_FLAG_PMU;
>> +    }
>> +
>> +    if (kvm_lpa2_is_enabled())
>> +        params->flags |= RMI_REALM_PARAM_FLAG_LPA2;
>> +
>> +    params_phys = virt_to_phys(params);
>> +
>> +    if (rmi_realm_create(rd_phys, params_phys)) {
>> +        r = -ENXIO;
>> +        goto out_undelegate_tables;
>> +    }
>> +
>> +    realm->rd = rd;
>> +    kvm_set_realm_state(kvm, REALM_STATE_NEW);
>> +    /* The realm is up, free the parameters.  */
>> +    free_page((unsigned long)realm->params);
>> +    realm->params = NULL;
>> +
>> +    return 0;
>> +
>> +out_undelegate_tables:
>> +    if (WARN_ON(rmi_undelegate_range(kvm->arch.mmu.pgd_phys, pgd_size))) {
>> +        /* Leak the pages if they cannot be returned */
>> +        kvm->arch.mmu.pgt = NULL;
>> +    }
>
> In the latest RMM implementation (topics/rmm-v2.0-poc_2), rmi_delegate_range()
> works at granule (4KB) granularity and can fail on any granule. For example,
> if the root RTT consists of 16 granules and rmi_delegate_range() fails on the
> first one, we will try to undelegate all 16 granules, none of which were ever
> delegated to the RMM. That eventually leads to an error and a memory leak.
>
> To address this, rmi_delegate_range() could be improved to return the number
> of granules that have been delegated. The caller can then use the return value
> to handle the error case by passing the correct range to rmi_undelegate_page().

Well spotted - yes the current situation where the entire region is
leaked if the delegate only partially completes is less than ideal! I'll
add a third argument to rmi_delegate_range() to return the top of the
region that was successfully delegated. The caller can then attempt an
undelegate on just the range which was delegated.
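
To illustrate the idea, here is a minimal user-space sketch, not the actual
kernel change. The RMI calls are replaced by mocks, and delegate_range(),
undelegate_range(), delegate_or_rollback(), fail_at and the fixed
GRANULE_SIZE are all hypothetical names made up for this example. It only
shows how a "top" out-parameter lets the caller undelegate exactly the part
that was delegated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SIZE 4096UL
#define NR_GRANULES  64

typedef uint64_t phys_addr_t;

/* Mock RMM state: which granules are currently delegated. */
static bool delegated[NR_GRANULES];
/* Granule index at which the mock delegate call fails; -1 means never. */
static long fail_at = -1;

static int mock_delegate_granule(phys_addr_t phys)
{
	size_t idx = phys / GRANULE_SIZE;

	assert(idx < NR_GRANULES);
	if ((long)idx == fail_at)
		return -1;
	delegated[idx] = true;
	return 0;
}

static int mock_undelegate_granule(phys_addr_t phys)
{
	size_t idx = phys / GRANULE_SIZE;

	assert(idx < NR_GRANULES);
	/* Undelegating a granule that was never delegated is an error. */
	if (!delegated[idx])
		return -1;
	delegated[idx] = false;
	return 0;
}

/*
 * Hypothetical three-argument form: on return, *top is the end of the
 * region that was successfully delegated (== phys if nothing was).
 */
static int delegate_range(phys_addr_t phys, size_t size, phys_addr_t *top)
{
	phys_addr_t cur = phys, end = phys + size;
	int ret = 0;

	assert(phys % GRANULE_SIZE == 0 && size % GRANULE_SIZE == 0);
	while (cur < end) {
		ret = mock_delegate_granule(cur);
		if (ret)
			break;
		cur += GRANULE_SIZE;
	}
	*top = cur;
	return ret;
}

static int undelegate_range(phys_addr_t phys, size_t size)
{
	for (phys_addr_t cur = phys; cur < phys + size; cur += GRANULE_SIZE)
		if (mock_undelegate_granule(cur))
			return -1;
	return 0;
}

/*
 * Caller side: 0 on success, -1 if delegation failed and was rolled back,
 * -2 if the rollback itself failed (the pages would have to be leaked).
 */
static int delegate_or_rollback(phys_addr_t base, size_t size)
{
	phys_addr_t top;

	if (!delegate_range(base, size, &top))
		return 0;
	/* Only undelegate [base, top), the part that was really delegated. */
	if (top > base && undelegate_range(base, top - base))
		return -2;
	return -1;
}
```

With a 16-granule root RTT and a failure on the first granule, the caller
undelegates nothing; with a failure on the fourth granule, only the first
three are undelegated.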

Thanks,
Steve

>> +    if (WARN_ON(rmi_undelegate_page(rd_phys))) {
>> +        /* Leak the page if it isn't returned */
>> +        return r;
>> +    }
>> +free_rd:
>> +    free_page((unsigned long)rd);
>> +    return r;
>> +}
>> +
>>   static void realm_unmap_private_range(struct kvm *kvm,
>>                         unsigned long start,
>>                         unsigned long end,
>> @@ -647,8 +718,21 @@ static int realm_init_ipa_state(struct kvm *kvm,
>>
>>   static int realm_ensure_created(struct kvm *kvm)
>>   {
>> -    /* Provided in later patch */
>> -    return -ENXIO;
>> +    int ret;
>> +
>> +    switch (kvm_realm_state(kvm)) {
>> +    case REALM_STATE_NONE:
>> +        break;
>> +    case REALM_STATE_NEW:
>> +        return 0;
>> +    case REALM_STATE_DEAD:
>> +        return -ENXIO;
>> +    default:
>> +        return -EBUSY;
>> +    }
>> +
>> +    ret = realm_create_rd(kvm);
>> +    return ret;
>>   }
>>
>>   static int set_ripas_of_protected_regions(struct kvm *kvm)
>
> Thanks,
> Gavin
>