Re: Re: Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE
From: Anup Patel
Date: Fri Apr 03 2026 - 02:19:46 EST
On Fri, Apr 3, 2026 at 7:32 AM <fangyu.yu@xxxxxxxxxxxxxxxxx> wrote:
>
> >>On Thu, Apr 2, 2026 at 6:53 PM <fangyu.yu@xxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>> From: Fangyu Yu <fangyu.yu@xxxxxxxxxxxxxxxxx>
> >>>
> >>> Add a VM capability that allows userspace to select the G-stage page table
> >>> format by setting HGATP.MODE on a per-VM basis.
> >>>
> >>> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
> >>> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
> >>> not supported by the host, and with -EBUSY if the VM has already been
> >>> committed (e.g. vCPUs have been created or any memslot is populated).
> >>>
> >>> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
> >>> HGATP.MODE formats supported by the host.
> >>>
> >>> Signed-off-by: Fangyu Yu <fangyu.yu@xxxxxxxxxxxxxxxxx>
> >>> Reviewed-by: Andrew Jones <andrew.jones@xxxxxxxxxxxxxxxx>
> >>> Reviewed-by: Guo Ren <guoren@xxxxxxxxxx>
> >>> ---
> >>> Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> >>> arch/riscv/kvm/vm.c | 18 ++++++++++++++++--
> >>> include/uapi/linux/kvm.h | 1 +
> >>> 3 files changed, 44 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >>> index 032516783e96..9d7f6958fa81 100644
> >>> --- a/Documentation/virt/kvm/api.rst
> >>> +++ b/Documentation/virt/kvm/api.rst
> >>> @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
> >>> This capability can be enabled dynamically even if VCPUs were already
> >>> created and are running.
> >>>
> >>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
> >>> +---------------------------------
> >>> +
> >>> +:Architectures: riscv
> >>> +:Type: VM
> >>> +:Parameters: args[0] contains the requested HGATP mode
> >>> +:Returns:
> >>> + - 0 on success.
> >>> + - -EINVAL if args[0] is outside the range of HGATP modes supported by the
> >>> + hardware.
> >>> + - -EBUSY if vCPUs have already been created for the VM, or if the VM has any
> >>> + non-empty memslots.
> >>> +
> >>> +This capability allows userspace to explicitly select the HGATP mode for
> >>> +the VM. The selected mode must be supported by both KVM and hardware. This
> >>> +capability must be enabled before creating any vCPUs or memslots.
> >>> +
> >>> +If this capability is not enabled, KVM will select the default HGATP mode
> >>> +automatically. The default is the highest HGATP.MODE value supported by
> >>> +hardware.
> >>> +
> >>> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
> >>> +HGATP.MODE values supported by the host. A return value of 0 indicates that
> >>> +the capability is not supported. The supported-mode bitmask uses HGATP.MODE
> >>> +encodings as defined by the RISC-V privileged specification. For example,
> >>> +Sv39x4 corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
> >>> +
> >>> 8. Other capabilities.
> >>> ======================
> >>>
> >>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> >>> index 4d82a886102c..5e82a3ad3ad0 100644
> >>> --- a/arch/riscv/kvm/vm.c
> >>> +++ b/arch/riscv/kvm/vm.c
> >>> @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>> case KVM_CAP_VM_GPA_BITS:
> >>> r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
> >>> break;
> >>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> >>> + r = kvm_riscv_get_hgatp_mode_mask();
> >>> + break;
> >>
> >>Introducing a new RISC-V capability looks a bit complex.
> >>Instead of KVM_CAP_RISCV_SET_HGATP_MODE, we can
> >>simply re-use KVM_CAP_VM_GPA_BITS.
> >>
> >>The kvm_vm_ioctl_check_extension() for KVM_CAP_VM_GPA_BITS
> >>returns the number of GPA bits, which indirectly implies the underlying
> >>hgatp.MODE. If it returns 59 GPA bits, then Sv57x4 is the selected
> >>hgatp.MODE, and the Sv48x4 and Sv39x4 modes are also supported as per
> >>the RISC-V privileged specification.
> >>
> >>The kvm_vm_ioctl_enable_cap() for KVM_CAP_VM_GPA_BITS
> >>will take the desired number of GPA bits and downsize the selected
> >>hgatp.MODE accordingly. For example, if user space asks for GPA bits <= 41,
> >>we select Sv39x4. If it asks for GPA bits > 41 and <= 50, we select
> >>Sv48x4. If it asks for GPA bits > 50 and <= 59, we select Sv57x4.
> >>
> >
> >Thanks, that makes sense.
> >
> >In v8 I’ll drop KVM_CAP_RISCV_SET_HGATP_MODE and re-use KVM_CAP_VM_GPA_BITS
> >for both discovery and selection.
> >
>
> Hi Anup,
>
> While working on the respin reusing KVM_CAP_VM_GPA_BITS, I realized
> a potential ambiguity in CHECK_EXTENSION semantics and wanted to confirm the
> intended ABI before posting v8.
>
> One concern about the semantics: today KVM_CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS)
> on a VM fd may be interpreted as “the GPA bits for this VM” (or at least what
> this VM can use). If we also use KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) to downsize
> the selected HGATP.MODE for a particular VM (e.g. to Sv48x4 => 50 bits), then a
> subsequent CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on the same VM fd would return 50.
> Userspace might then assume 50 is the maximum supported by that VM/host and lose
> the information that the host actually supports 59 (Sv57x4).
I think there is no violation of the semantics because we are providing
a way for KVM user space to change "the GPA bits for this VM"
using KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS), so a subsequent
CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) must return the
effective number of GPA bits visible to the VM.

The only additional constraint I would enforce is that
KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) must
return -EBUSY if any of the Guest VCPUs have
ran_atleast_once set.
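
To make the intended semantics concrete, here is a rough, untested sketch in
plain C. It is not existing KVM code: the sketch_* structures and all helper
names are hypothetical stand-ins, and it assumes the host supports every mode
from Sv39x4 up to its maximum. The order of the -EINVAL and -EBUSY checks
simply follows the v7 patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* hgatp.MODE encodings from the RISC-V privileged specification. */
#define HGATP_MODE_SV39X4	8
#define HGATP_MODE_SV48X4	9
#define HGATP_MODE_SV57X4	10

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_vcpu {
	bool ran_atleast_once;
};

struct sketch_vm {
	int hgatp_mode;			/* currently selected hgatp.MODE */
	struct sketch_vcpu *vcpus;
	size_t nr_vcpus;
};

/* GPA width: 9 bits per level + 12 page-offset bits + 2 extra "x4" bits. */
static int mode_to_gpa_bits(int mode)
{
	int levels = 3 + mode - HGATP_MODE_SV39X4;

	return 9 * levels + 12 + 2;	/* 41, 50 or 59 */
}

/* Pick the smallest mode covering req_bits, capped at the host maximum. */
static int gpa_bits_to_hgatp_mode(unsigned long req_bits, int host_max_mode)
{
	int mode;

	for (mode = HGATP_MODE_SV39X4; mode <= host_max_mode; mode++) {
		if (req_bits <= (unsigned long)mode_to_gpa_bits(mode))
			return mode;
	}
	return -EINVAL;
}

/* What KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) could do. */
static int sketch_set_gpa_bits(struct sketch_vm *vm, unsigned long req_bits,
			       int host_max_mode)
{
	int mode = gpa_bits_to_hgatp_mode(req_bits, host_max_mode);
	size_t i;

	if (mode < 0)
		return mode;

	/* Refuse the change once any vCPU has run. */
	for (i = 0; i < vm->nr_vcpus; i++) {
		if (vm->vcpus[i].ran_atleast_once)
			return -EBUSY;
	}

	vm->hgatp_mode = mode;
	return 0;
}

/* What CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on the VM fd would report. */
static int sketch_check_gpa_bits(const struct sketch_vm *vm)
{
	return mode_to_gpa_bits(vm->hgatp_mode);
}
```

With this shape, a VM that starts at Sv57x4 and asks for 50 bits would report
50 afterwards, which is the effective value for that VM.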
Regards,
Anup
>
> Thanks,
> Fangyu
>
> >Thanks,
> >Fangyu
> >
> >>> default:
> >>> r = 0;
> >>> break;
> >>> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >>>
> >>> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> >>> {
> >>> + if (cap->flags)
> >>> + return -EINVAL;
> >>> +
> >>> switch (cap->cap) {
> >>> case KVM_CAP_RISCV_MP_STATE_RESET:
> >>> - if (cap->flags)
> >>> - return -EINVAL;
> >>> kvm->arch.mp_state_reset = true;
> >>> return 0;
> >>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> >>> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
> >>> + return -EINVAL;
> >>> +
> >>> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
> >>> + return -EBUSY;
> >>> +#ifdef CONFIG_64BIT
> >>> + kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
> >>> +#endif
> >>> + return 0;
> >>> default:
> >>> return -EINVAL;
> >>> }
> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>> index 80364d4dbebb..a74a80fd4046 100644
> >>> --- a/include/uapi/linux/kvm.h
> >>> +++ b/include/uapi/linux/kvm.h
> >>> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
> >>> #define KVM_CAP_ARM_SEA_TO_USER 245
> >>> #define KVM_CAP_S390_USER_OPEREXEC 246
> >>> #define KVM_CAP_S390_KEYOP 247
> >>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 248
> >>>
> >>> struct kvm_irq_routing_irqchip {
> >>> __u32 irqchip;
> >>> --
> >>> 2.50.1
> >>>
> >>
> >>Regards,
> >>Anup