Re: Re: Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
From: fangyu . yu
Date: Thu Apr 02 2026 - 22:04:58 EST
>>On Thu, Apr 2, 2026 at 6:53 PM <fangyu.yu@xxxxxxxxxxxxxxxxx> wrote:
>>>
>>> From: Fangyu Yu <fangyu.yu@xxxxxxxxxxxxxxxxx>
>>>
>>> Add a VM capability that allows userspace to select the G-stage page table
>>> format by setting HGATP.MODE on a per-VM basis.
>>>
>>> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
>>> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
>>> not supported by the host, and with -EBUSY if the VM has already been
>>> committed (e.g. vCPUs have been created or any memslot is populated).
>>>
>>> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
>>> HGATP.MODE formats supported by the host.
>>>
>>> Signed-off-by: Fangyu Yu <fangyu.yu@xxxxxxxxxxxxxxxxx>
>>> Reviewed-by: Andrew Jones <andrew.jones@xxxxxxxxxxxxxxxx>
>>> Reviewed-by: Guo Ren <guoren@xxxxxxxxxx>
>>> ---
>>> Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
>>> arch/riscv/kvm/vm.c | 18 ++++++++++++++++--
>>> include/uapi/linux/kvm.h | 1 +
>>> 3 files changed, 44 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>> index 032516783e96..9d7f6958fa81 100644
>>> --- a/Documentation/virt/kvm/api.rst
>>> +++ b/Documentation/virt/kvm/api.rst
>>> @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
>>> This capability can be enabled dynamically even if VCPUs were already
>>> created and are running.
>>>
>>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
>>> +---------------------------------
>>> +
>>> +:Architectures: riscv
>>> +:Type: VM
>>> +:Parameters: args[0] contains the requested HGATP mode
>>> +:Returns:
>>> + - 0 on success.
>>> + - -EINVAL if args[0] is outside the range of HGATP modes supported by the
>>> + hardware.
>>> + - -EBUSY if vCPUs have already been created for the VM, or if the VM has any
>>> + non-empty memslots.
>>> +
>>> +This capability allows userspace to explicitly select the HGATP mode for
>>> +the VM. The selected mode must be supported by both KVM and hardware. This
>>> +capability must be enabled before creating any vCPUs or memslots.
>>> +
>>> +If this capability is not enabled, KVM will select the default HGATP mode
>>> +automatically. The default is the highest HGATP.MODE value supported by
>>> +hardware.
>>> +
>>> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
>>> +HGATP.MODE values supported by the host. A return value of 0 indicates that
>>> +the capability is not supported. The supported-mode bitmask uses HGATP.MODE
>>> +encodings as defined by the RISC-V privileged specification. For example,
>>> +Sv39x4 corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
>>> +
>>> 8. Other capabilities.
>>> ======================
>>>
>>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
>>> index 4d82a886102c..5e82a3ad3ad0 100644
>>> --- a/arch/riscv/kvm/vm.c
>>> +++ b/arch/riscv/kvm/vm.c
>>> @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>> case KVM_CAP_VM_GPA_BITS:
>>> r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
>>> break;
>>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>>> + r = kvm_riscv_get_hgatp_mode_mask();
>>> + break;
>>
>>Introducing a new RISC-V capability looks a bit complex.
>>Instead of KVM_CAP_RISCV_SET_HGATP_MODE, we can
>>simply re-use KVM_CAP_VM_GPA_BITS.
>>
>>The kvm_vm_ioctl_check_extension() for KVM_CAP_VM_GPA_BITS
>>returns the number of GPA bits, which indirectly implies the underlying
>>hgatp.MODE. As we know, if it returns 59 GPA bits then it means
>>Sv57x4 is the selected hgatp.MODE, and the Sv48x4 and Sv39x4 modes
>>are also supported, as per the RISC-V privileged specification.
>>
>>The kvm_vm_ioctl_enable_cap() for KVM_CAP_VM_GPA_BITS
>>will take the desired number of GPA bits and downsize the selected
>>hgatp.MODE. For example, if user-space asks for GPA bits <= 50 and
>>GPA bits > 41 then we select Sv48x4. If user-space asks for GPA
>>bits <= 41 then we select Sv39x4. If user-space asks for GPA bits <= 59
>>and GPA bits > 50 then we select Sv57x4.
>>
>
>Thanks, that makes sense.
>
>In v8 I’ll drop KVM_CAP_RISCV_SET_HGATP_MODE and re-use KVM_CAP_VM_GPA_BITS
>for both discovery and selection.
>
Hi Anup,

While working on the respin that reuses KVM_CAP_VM_GPA_BITS, I noticed a
potential ambiguity in the CHECK_EXTENSION semantics and wanted to confirm the
intended ABI before posting v8.

Today, KVM_CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on a VM fd may be interpreted
as “the GPA bits for this VM” (or at least what this VM can use). If we also
use KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) to downsize the selected HGATP.MODE for
a particular VM (e.g. to Sv48x4 => 50 bits), then a subsequent
CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on the same VM fd would return 50.
Userspace might then assume 50 is the maximum supported by that VM/host and lose
the information that the host actually supports 59 (Sv57x4).
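
To make the concern concrete, here is a minimal userspace-only sketch in C.
It is not kernel code and not what v8 will contain: struct sketch_vm and the
sketch_* helpers are hypothetical names used only for illustration, and the
selection ranges (<= 41 selects Sv39x4, <= 50 selects Sv48x4, <= 59 selects
Sv57x4) are taken from your proposal above.

```c
#include <assert.h>

/* hgatp.MODE encodings from the RISC-V privileged specification. */
enum {
	HGATP_MODE_SV39X4 = 8,
	HGATP_MODE_SV48X4 = 9,
	HGATP_MODE_SV57X4 = 10,
};

/* Hypothetical stand-in for per-VM state; this is NOT the real struct kvm. */
struct sketch_vm {
	int host_max_levels;	/* deepest G-stage format the host supports (5 = Sv57x4) */
	int pgd_levels;		/* G-stage levels currently selected for this VM */
};

/* GPA width of an SvNNx4 format: 12 offset bits + 9 bits per level + 2 extra bits. */
static int sketch_gpa_bits(int levels)
{
	assert(levels >= 3 && levels <= 5);
	return 12 + 9 * levels + 2;
}

/* Models ENABLE_CAP: pick the smallest mode that covers the requested GPA bits. */
static int sketch_enable_gpa_bits(struct sketch_vm *vm, int requested)
{
	int levels;

	for (levels = 3; levels <= vm->host_max_levels; levels++) {
		if (requested <= sketch_gpa_bits(levels)) {
			vm->pgd_levels = levels;
			return 0;
		}
	}
	return -1;	/* request exceeds what the host supports */
}

/* Models CHECK_EXTENSION on the VM fd: reports the VM's current width only. */
static int sketch_check_gpa_bits(const struct sketch_vm *vm)
{
	return sketch_gpa_bits(vm->pgd_levels);
}

/* hgatp.MODE implied by the selected number of levels. */
static int sketch_hgatp_mode(const struct sketch_vm *vm)
{
	return HGATP_MODE_SV39X4 + (vm->pgd_levels - 3);
}
```

In this model, a VM on an Sv57x4-capable host starts with
sketch_check_gpa_bits() reporting 59. After sketch_enable_gpa_bits(&vm, 48)
the same query reports 50, and nothing visible through that query still says
the host could have provided 59.
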
Thanks,
Fangyu
>Thanks,
>Fangyu
>
>>> default:
>>> r = 0;
>>> break;
>>> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>
>>> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>>> {
>>> + if (cap->flags)
>>> + return -EINVAL;
>>> +
>>> switch (cap->cap) {
>>> case KVM_CAP_RISCV_MP_STATE_RESET:
>>> - if (cap->flags)
>>> - return -EINVAL;
>>> kvm->arch.mp_state_reset = true;
>>> return 0;
>>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>>> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
>>> + return -EINVAL;
>>> +
>>> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
>>> + return -EBUSY;
>>> +#ifdef CONFIG_64BIT
>>> + kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
>>> +#endif
>>> + return 0;
>>> default:
>>> return -EINVAL;
>>> }
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 80364d4dbebb..a74a80fd4046 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
>>> #define KVM_CAP_ARM_SEA_TO_USER 245
>>> #define KVM_CAP_S390_USER_OPEREXEC 246
>>> #define KVM_CAP_S390_KEYOP 247
>>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 248
>>>
>>> struct kvm_irq_routing_irqchip {
>>> __u32 irqchip;
>>> --
>>> 2.50.1
>>>
>>
>>Regards,
>>Anup