Re: Re: Re: Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE

From: fangyu . yu

Date: Fri Apr 03 2026 - 03:07:48 EST


>>
>> >>On Thu, Apr 2, 2026 at 6:53 PM <fangyu.yu@xxxxxxxxxxxxxxxxx> wrote:
>> >>>
>> >>> From: Fangyu Yu <fangyu.yu@xxxxxxxxxxxxxxxxx>
>> >>>
>> >>> Add a VM capability that allows userspace to select the G-stage page table
>> >>> format by setting HGATP.MODE on a per-VM basis.
>> >>>
>> >>> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
>> >>> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
>> >>> not supported by the host, and with -EBUSY if the VM has already been
>> >>> committed (e.g. vCPUs have been created or any memslot is populated).
>> >>>
>> >>> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
>> >>> HGATP.MODE formats supported by the host.
>> >>>
>> >>> Signed-off-by: Fangyu Yu <fangyu.yu@xxxxxxxxxxxxxxxxx>
>> >>> Reviewed-by: Andrew Jones <andrew.jones@xxxxxxxxxxxxxxxx>
>> >>> Reviewed-by: Guo Ren <guoren@xxxxxxxxxx>
>> >>> ---
>> >>> Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
>> >>> arch/riscv/kvm/vm.c | 18 ++++++++++++++++--
>> >>> include/uapi/linux/kvm.h | 1 +
>> >>> 3 files changed, 44 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> >>> index 032516783e96..9d7f6958fa81 100644
>> >>> --- a/Documentation/virt/kvm/api.rst
>> >>> +++ b/Documentation/virt/kvm/api.rst
>> >>> @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
>> >>> This capability can be enabled dynamically even if VCPUs were already
>> >>> created and are running.
>> >>>
>> >>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
>> >>> +---------------------------------
>> >>> +
>> >>> +:Architectures: riscv
>> >>> +:Type: VM
>> >>> +:Parameters: args[0] contains the requested HGATP mode
>> >>> +:Returns:
>> >>> + - 0 on success.
>> >>> + - -EINVAL if args[0] is outside the range of HGATP modes supported by the
>> >>> + hardware.
>> >>> + - -EBUSY if vCPUs have already been created for the VM, or if the VM has any
>> >>> + non-empty memslots.
>> >>> +
>> >>> +This capability allows userspace to explicitly select the HGATP mode for
>> >>> +the VM. The selected mode must be supported by both KVM and hardware. This
>> >>> +capability must be enabled before creating any vCPUs or memslots.
>> >>> +
>> >>> +If this capability is not enabled, KVM will select the default HGATP mode
>> >>> +automatically. The default is the highest HGATP.MODE value supported by
>> >>> +hardware.
>> >>> +
>> >>> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
>> >>> +HGATP.MODE values supported by the host. A return value of 0 indicates that
>> >>> +the capability is not supported. The supported-mode bitmask uses HGATP.MODE
>> >>> +encodings as defined by the RISC-V privileged specification; for example, Sv39x4
>> >>> +corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
>> >>> +
>> >>> 8. Other capabilities.
>> >>> ======================
>> >>>
>> >>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
>> >>> index 4d82a886102c..5e82a3ad3ad0 100644
>> >>> --- a/arch/riscv/kvm/vm.c
>> >>> +++ b/arch/riscv/kvm/vm.c
>> >>> @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> >>> case KVM_CAP_VM_GPA_BITS:
>> >>> r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
>> >>> break;
>> >>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>> >>> + r = kvm_riscv_get_hgatp_mode_mask();
>> >>> + break;
>> >>
>> >>Introducing a new RISC-V capability looks a bit complex.
>> >>Instead of KVM_CAP_RISCV_SET_HGATP_MODE, we can
>> >>simply re-use KVM_CAP_VM_GPA_BITS.
>> >>
>> >>The kvm_vm_ioctl_check_extension() for KVM_CAP_VM_GPA_BITS
>> >>returns the number of GPA bits, which indirectly implies the underlying
>> >>hgatp.MODE. As we know, if it returns 59 GPA bits then it means
>> >>Sv57x4 is the selected hgatp.MODE and Sv48x4 and Sv39x4 modes
>> >>are also supported as per the RISC-V privileged specification.
>> >>
>> >>The kvm_vm_ioctl_enable_cap() for KVM_CAP_VM_GPA_BITS
>> >>will take the desired number of GPA bits and downsize the selected
>> >>hgatp.MODE. For example, if user space asks for GPA bits <= 50 and
>> >>GPA bits > 41 then we select Sv48x4. If user space asks for GPA
>> >>bits <= 41 then we select Sv39x4. If user space asks for GPA bits <= 59
>> >>and GPA bits > 50 then we select Sv57x4.
>> >>
>> >
>> >Thanks, that makes sense.
>> >
>> >In v8 I’ll drop KVM_CAP_RISCV_SET_HGATP_MODE and re-use KVM_CAP_VM_GPA_BITS
>> >for both discovery and selection.
>> >
>>
>> Hi Anup,
>>
>> While working on the respin reusing KVM_CAP_VM_GPA_BITS, I realized
>> a potential ambiguity in CHECK_EXTENSION semantics and wanted to confirm the
>> intended ABI before posting v8.
>>
>> One concern about the semantics: today KVM_CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS)
>> on a VM fd may be interpreted as “the GPA bits for this VM” (or at least what
>> this VM can use). If we also use KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) to downsize
>> the selected HGATP.MODE for a particular VM (e.g. to Sv48x4 => 50 bits), then a
>> subsequent CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on the same VM fd would return 50.
>> Userspace might then assume 50 is the maximum supported by that VM/host and lose
>> the information that the host actually supports 59 (Sv57x4).
>
>I think there is no violation of the semantics because we are providing
>a way to allow KVM user space to change "the GPA bits for this VM"
>using KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS), so a subsequent
>CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) must return the
>effective number of GPA bits visible to the VM.

Thanks, agreed.
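
For reference, the mapping discussed above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
v8 code: the helper names gpa_bits_to_hgatp_mode() and
hgatp_mode_to_gpa_bits() are hypothetical, the mode encodings (8, 9, 10)
and GPA widths (41, 50, 59) are the ones quoted in this thread, and
rejecting a request above the host maximum with -EINVAL is an assumption.

```c
#include <errno.h>

/* HGATP.MODE encodings as quoted in this thread (Sv39x4 == 8). */
#define HGATP_MODE_SV39X4 8
#define HGATP_MODE_SV48X4 9
#define HGATP_MODE_SV57X4 10

/* GPA widths quoted in this thread for each Svnnx4 mode. */
#define GPA_BITS_SV39X4 41
#define GPA_BITS_SV48X4 50
#define GPA_BITS_SV57X4 59

/*
 * Hypothetical helper: pick the smallest mode that covers the requested
 * GPA bits. Requests above the host maximum are rejected (assumption).
 */
int gpa_bits_to_hgatp_mode(unsigned long gpa_bits, unsigned long host_gpa_bits)
{
	if (gpa_bits > host_gpa_bits || gpa_bits > GPA_BITS_SV57X4)
		return -EINVAL;
	if (gpa_bits <= GPA_BITS_SV39X4)
		return HGATP_MODE_SV39X4;
	if (gpa_bits <= GPA_BITS_SV48X4)
		return HGATP_MODE_SV48X4;
	return HGATP_MODE_SV57X4;
}

/*
 * Hypothetical helper: the effective GPA bits for a mode, i.e. what a
 * later CHECK_EXTENSION on the same VM fd would be expected to report.
 */
int hgatp_mode_to_gpa_bits(int mode)
{
	switch (mode) {
	case HGATP_MODE_SV39X4:
		return GPA_BITS_SV39X4;
	case HGATP_MODE_SV48X4:
		return GPA_BITS_SV48X4;
	case HGATP_MODE_SV57X4:
		return GPA_BITS_SV57X4;
	default:
		return -EINVAL;
	}
}
```

With this sketch, a request for 48 GPA bits on a host reporting 59 would
select Sv48x4, and the effective value reported afterwards would be 50.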

>The only additional constraint I would enforce is that the
>KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) must
>return -EBUSY if any of the Guest VCPUs have
>ran_atleast_once set.
>

In my current implementation I already return -EBUSY if kvm->created_vcpus
is non-zero, i.e. the GPA bits can only be changed before any vCPU is created.
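
To make the ordering argument concrete, here is a small standalone model.
It is a sketch under stated assumptions, not kernel code: struct vm_model
and its helpers are hypothetical stand-ins, and only created_vcpus mirrors
a field named in this thread. Since a vCPU has to be created before it can
run, refusing the change once created_vcpus is non-zero also covers the
case where some vCPU has already run.

```c
#include <errno.h>

/* Hypothetical stand-in for the per-VM state relevant here. */
struct vm_model {
	unsigned int created_vcpus;	/* mirrors kvm->created_vcpus */
	unsigned int gpa_bits;		/* effective GPA bits for this VM */
};

/* Hypothetical: creating a vCPU commits the VM's GPA configuration. */
void vm_model_create_vcpu(struct vm_model *vm)
{
	vm->created_vcpus++;
}

/*
 * Hypothetical: change the GPA bits only before any vCPU exists.
 * Validation of gpa_bits itself is left out of this sketch.
 */
int vm_model_set_gpa_bits(struct vm_model *vm, unsigned int gpa_bits)
{
	if (vm->created_vcpus)
		return -EBUSY;

	vm->gpa_bits = gpa_bits;
	return 0;
}
```

In this model a call before the first vCPU is created succeeds, and any
call afterwards fails with -EBUSY and leaves gpa_bits untouched.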

Thanks,
Fangyu

>Regards,
>Anup
>
>>
>> Thanks,
>> Fangyu
>>
>> >Thanks,
>> >Fangyu
>> >
>> >>> default:
>> >>> r = 0;
>> >>> break;
>> >>> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>> >>>
>> >>> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>> >>> {
>> >>> + if (cap->flags)
>> >>> + return -EINVAL;
>> >>> +
>> >>> switch (cap->cap) {
>> >>> case KVM_CAP_RISCV_MP_STATE_RESET:
>> >>> - if (cap->flags)
>> >>> - return -EINVAL;
>> >>> kvm->arch.mp_state_reset = true;
>> >>> return 0;
>> >>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
>> >>> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
>> >>> + return -EINVAL;
>> >>> +
>> >>> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
>> >>> + return -EBUSY;
>> >>> +#ifdef CONFIG_64BIT
>> >>> + kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
>> >>> +#endif
>> >>> + return 0;
>> >>> default:
>> >>> return -EINVAL;
>> >>> }
>> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> >>> index 80364d4dbebb..a74a80fd4046 100644
>> >>> --- a/include/uapi/linux/kvm.h
>> >>> +++ b/include/uapi/linux/kvm.h
>> >>> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
>> >>> #define KVM_CAP_ARM_SEA_TO_USER 245
>> >>> #define KVM_CAP_S390_USER_OPEREXEC 246
>> >>> #define KVM_CAP_S390_KEYOP 247
>> >>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 248
>> >>>
>> >>> struct kvm_irq_routing_irqchip {
>> >>> __u32 irqchip;
>> >>> --
>> >>> 2.50.1
>> >>>
>> >>
>> >>Regards,
>> >>Anup
>