Re: Re: Re: Re: [PATCH v7 4/4] RISC-V: KVM: add KVM_CAP_RISCV_SET_HGATP_MODE

From: Anup Patel

Date: Fri Apr 03 2026 - 04:11:30 EST


On Fri, Apr 3, 2026 at 12:37 PM <fangyu.yu@xxxxxxxxxxxxxxxxx> wrote:
>
> >>
> >> >>On Thu, Apr 2, 2026 at 6:53 PM <fangyu.yu@xxxxxxxxxxxxxxxxx> wrote:
> >> >>>
> >> >>> From: Fangyu Yu <fangyu.yu@xxxxxxxxxxxxxxxxx>
> >> >>>
> >> >>> Add a VM capability that allows userspace to select the G-stage page table
> >> >>> format by setting HGATP.MODE on a per-VM basis.
> >> >>>
> >> >>> Userspace enables the capability via KVM_ENABLE_CAP, passing the requested
> >> >>> HGATP.MODE in args[0]. The request is rejected with -EINVAL if the mode is
> >> >>> not supported by the host, and with -EBUSY if the VM has already been
> >> >>> committed (e.g. vCPUs have been created or any memslot is populated).
> >> >>>
> >> >>> KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) returns a bitmask of the
> >> >>> HGATP.MODE formats supported by the host.
> >> >>>
> >> >>> Signed-off-by: Fangyu Yu <fangyu.yu@xxxxxxxxxxxxxxxxx>
> >> >>> Reviewed-by: Andrew Jones <andrew.jones@xxxxxxxxxxxxxxxx>
> >> >>> Reviewed-by: Guo Ren <guoren@xxxxxxxxxx>
> >> >>> ---
> >> >>> Documentation/virt/kvm/api.rst | 27 +++++++++++++++++++++++++++
> >> >>> arch/riscv/kvm/vm.c | 18 ++++++++++++++++--
> >> >>> include/uapi/linux/kvm.h | 1 +
> >> >>> 3 files changed, 44 insertions(+), 2 deletions(-)
> >> >>>
> >> >>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >> >>> index 032516783e96..9d7f6958fa81 100644
> >> >>> --- a/Documentation/virt/kvm/api.rst
> >> >>> +++ b/Documentation/virt/kvm/api.rst
> >> >>> @@ -8902,6 +8902,33 @@ helpful if user space wants to emulate instructions which are not
> >> >>> This capability can be enabled dynamically even if VCPUs were already
> >> >>> created and are running.
> >> >>>
> >> >>> +7.47 KVM_CAP_RISCV_SET_HGATP_MODE
> >> >>> +---------------------------------
> >> >>> +
> >> >>> +:Architectures: riscv
> >> >>> +:Type: VM
> >> >>> +:Parameters: args[0] contains the requested HGATP mode
> >> >>> +:Returns:
> >> >>> + - 0 on success.
> >> >>> + - -EINVAL if args[0] is outside the range of HGATP modes supported by the
> >> >>> + hardware.
> >> >>> + - -EBUSY if vCPUs have already been created for the VM, or if the VM has any
> >> >>> + non-empty memslots.
> >> >>> +
> >> >>> +This capability allows userspace to explicitly select the HGATP mode for
> >> >>> +the VM. The selected mode must be supported by both KVM and hardware. This
> >> >>> +capability must be enabled before creating any vCPUs or memslots.
> >> >>> +
> >> >>> +If this capability is not enabled, KVM will select the default HGATP mode
> >> >>> +automatically. The default is the highest HGATP.MODE value supported by
> >> >>> +hardware.
> >> >>> +
> >> >>> +``KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE)`` returns a bitmask of
> >> >>> +HGATP.MODE values supported by the host. A return value of 0 indicates that
> >> >>> +the capability is not supported. The supported-mode bitmask uses HGATP.MODE
> >> >>> +encodings as defined by the RISC-V privileged specification; for example,
> >> >>> +Sv39x4 corresponds to HGATP.MODE=8, so userspace should test bitmask & BIT(8).
> >> >>> +
> >> >>> 8. Other capabilities.
> >> >>> ======================
> >> >>>
> >> >>> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> >> >>> index 4d82a886102c..5e82a3ad3ad0 100644
> >> >>> --- a/arch/riscv/kvm/vm.c
> >> >>> +++ b/arch/riscv/kvm/vm.c
> >> >>> @@ -201,6 +201,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >> >>> case KVM_CAP_VM_GPA_BITS:
> >> >>> r = kvm_riscv_gstage_gpa_bits(kvm->arch.pgd_levels);
> >> >>> break;
> >> >>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> >> >>> + r = kvm_riscv_get_hgatp_mode_mask();
> >> >>> + break;
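Sketch (illustrative only, not part of the patch): under the v7 ABI quoted above, userspace would decode the KVM_CHECK_EXTENSION(KVM_CAP_RISCV_SET_HGATP_MODE) result by testing one bit per HGATP.MODE encoding. It assumes the privileged-spec encodings Sv39x4 = 8, Sv48x4 = 9 and Sv57x4 = 10. BIT(), hgatp_mode_supported() and highest_hgatp_mode() are hypothetical local helpers, not KVM uAPI.

```c
/* HGATP.MODE encodings from the RISC-V privileged specification
 * (local copies; names mirror the kernel's HGATP_MODE_* macros). */
#define HGATP_MODE_SV39X4 8
#define HGATP_MODE_SV48X4 9
#define HGATP_MODE_SV57X4 10

/* Local stand-in for the kernel's BIT() macro (assumption: not exported to uAPI). */
#define BIT(n) (1UL << (n))

/* Returns 1 if the v7-style supported-mode bitmask advertises 'mode'. */
int hgatp_mode_supported(unsigned long mask, int mode)
{
	return (mask & BIT(mode)) != 0;
}

/* Hypothetical helper: highest advertised G-stage mode, or -1 if none. */
int highest_hgatp_mode(unsigned long mask)
{
	for (int mode = HGATP_MODE_SV57X4; mode >= HGATP_MODE_SV39X4; mode--)
		if (hgatp_mode_supported(mask, mode))
			return mode;
	return -1;
}
```

With mask = BIT(8) | BIT(9), highest_hgatp_mode() returns 9 (Sv48x4); a mask of 0, which the v7 text defines as "capability not supported", yields -1.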
> >> >>
> >> >>Introducing a new RISC-V capability looks a bit complex.
> >> >>Instead of KVM_CAP_RISCV_SET_HGATP_MODE, we can
> >> >>simply re-use KVM_CAP_VM_GPA_BITS.
> >> >>
> >> >>The kvm_vm_ioctl_check_extension() handler for KVM_CAP_VM_GPA_BITS
> >> >>returns the number of GPA bits, which indirectly implies the underlying
> >> >>hgatp.MODE. For example, if it returns 59 GPA bits, then Sv57x4 is
> >> >>the selected hgatp.MODE, and Sv48x4 and Sv39x4 are also supported
> >> >>as per the RISC-V privileged specification.
> >> >>
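A minimal sketch of the decoding described above, assuming the x4 G-stage formats use a 12-bit page offset, 9 index bits per level and 2 extra root-table bits, so Sv39x4/Sv48x4/Sv57x4 give 41/50/59 GPA bits. The helper names are hypothetical, and the "all smaller modes are also supported" rule is taken from the paragraph above rather than probed from hardware.

```c
#define HGATP_MODE_SV39X4 8
#define HGATP_MODE_SV57X4 10

/*
 * GPA width for a G-stage mode: 12-bit page offset, 9 bits per level,
 * plus 2 bits for the x4 root table. Levels = 3 + (mode - Sv39x4),
 * the same relation the v7 hunk uses to set pgd_levels.
 */
int gpa_bits_for_mode(int mode)
{
	int levels = 3 + (mode - HGATP_MODE_SV39X4);

	return 12 + 9 * levels + 2;
}

/* Hypothetical decode: which mode does a reported GPA width imply? -1 if none. */
int mode_for_reported_gpa_bits(int gpa_bits)
{
	for (int mode = HGATP_MODE_SV39X4; mode <= HGATP_MODE_SV57X4; mode++)
		if (gpa_bits_for_mode(mode) == gpa_bits)
			return mode;
	return -1;
}

/* Bitmask of every mode usable under the reported width (all smaller SvNNx4). */
unsigned long modes_implied_by_gpa_bits(int gpa_bits)
{
	int top = mode_for_reported_gpa_bits(gpa_bits);
	unsigned long mask = 0;

	for (int mode = HGATP_MODE_SV39X4; top >= 0 && mode <= top; mode++)
		mask |= 1UL << mode;
	return mask;
}
```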
> >> >>The kvm_vm_ioctl_enable_cap() handler for KVM_CAP_VM_GPA_BITS
> >> >>will take the desired number of GPA bits and downsize the selected
> >> >>hgatp.MODE accordingly. For example, if user space asks for
> >> >>GPA bits <= 41, we select Sv39x4; if it asks for 41 < GPA bits <= 50,
> >> >>we select Sv48x4; and if it asks for 50 < GPA bits <= 59, we select
> >> >>Sv57x4.
> >> >>
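The downsizing rule proposed above could look roughly like this. The function name and the host_max_mode parameter are hypothetical, and rejecting a request that needs a larger mode than the host supports (instead of clamping it) is an assumption this thread does not settle.

```c
#include <errno.h>

#define HGATP_MODE_SV39X4 8
#define HGATP_MODE_SV48X4 9
#define HGATP_MODE_SV57X4 10

/*
 * Hypothetical helper following the thresholds proposed above:
 *   bits <= 41        -> Sv39x4
 *   41 < bits <= 50   -> Sv48x4
 *   50 < bits <= 59   -> Sv57x4
 * Returns the selected mode, or -EINVAL if the request exceeds 59 bits or
 * the smallest fitting mode is larger than host_max_mode (assumption).
 */
int hgatp_mode_for_requested_gpa_bits(unsigned long long bits, int host_max_mode)
{
	int mode;

	if (bits <= 41)
		mode = HGATP_MODE_SV39X4;
	else if (bits <= 50)
		mode = HGATP_MODE_SV48X4;
	else if (bits <= 59)
		mode = HGATP_MODE_SV57X4;
	else
		return -EINVAL;

	return mode <= host_max_mode ? mode : -EINVAL;
}
```

For example, asking for 48 bits on an Sv57x4 host selects Sv48x4 (50 effective GPA bits), while asking for 55 bits on an Sv48x4-only host fails with -EINVAL.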
> >> >
> >> >Thanks, that makes sense.
> >> >
> >> >In v8 I’ll drop KVM_CAP_RISCV_SET_HGATP_MODE and re-use KVM_CAP_VM_GPA_BITS
> >> >for both discovery and selection.
> >> >
> >>
> >> Hi Anup,
> >>
> >> While working on the respin that reuses KVM_CAP_VM_GPA_BITS, I noticed
> >> a potential ambiguity in the CHECK_EXTENSION semantics and wanted to confirm the
> >> intended ABI before posting v8.
> >>
> >> One concern about the semantics: today KVM_CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS)
> >> on a VM fd may be interpreted as “the GPA bits for this VM” (or at least what
> >> this VM can use). If we also use KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) to downsize
> >> the selected HGATP.MODE for a particular VM (e.g. to Sv48x4 => 50 bits), then a
> >> subsequent CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on the same VM fd would return 50.
> >> Userspace might then assume 50 is the maximum supported by that VM/host and lose
> >> the information that the host actually supports 59 (Sv57x4).
> >
> >I don't think this violates the semantics, because we are providing
> >a way for KVM user space to change “the GPA bits for this VM”
> >using KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS), so a subsequent
> >CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) must return the
> >effective number of GPA bits visible to the VM.
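To make the agreed semantics concrete, here is a simplified model (not KVM code) in which ENABLE_CAP changes the per-VM level count and a later CHECK_EXTENSION on the same VM fd reports the effective width. struct vm_model and its functions are hypothetical; the real state is kvm->arch.pgd_levels, which the existing KVM_CAP_VM_GPA_BITS case passes to kvm_riscv_gstage_gpa_bits().

```c
#include <errno.h>

/* Hypothetical, heavily simplified stand-in for the per-VM state in struct kvm. */
struct vm_model {
	int host_pgd_levels;	/* levels of the largest mode the host supports */
	int pgd_levels;		/* levels currently selected for this VM */
};

/* GPA width for a number of G-stage levels (x4 root table adds 2 bits). */
int gpa_bits_for_levels(int levels)
{
	return 12 + 9 * levels + 2;
}

/* Model of CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) on a VM fd: effective width. */
int vm_check_gpa_bits(const struct vm_model *vm)
{
	return gpa_bits_for_levels(vm->pgd_levels);
}

/* Model of ENABLE_CAP(KVM_CAP_VM_GPA_BITS): pick the smallest mode that fits.
 * Sv39x4 (3 levels) is the smallest mode considered. */
int vm_enable_gpa_bits(struct vm_model *vm, unsigned long long bits)
{
	for (int levels = 3; levels <= vm->host_pgd_levels; levels++) {
		if (bits <= (unsigned long long)gpa_bits_for_levels(levels)) {
			vm->pgd_levels = levels;
			return 0;
		}
	}
	return -EINVAL;	/* request wider than the host supports; state unchanged */
}
```

In this model a VM can also be enlarged again, up to the host maximum, before it is committed; whether v8 allows that is not stated in the thread.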
>
> Thanks, agreed.
>
> >The only additional constraint I would enforce is that
> >KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) must
> >return -EBUSY if any of the guest vCPUs has
> >ran_atleast_once set.
> >
>
> In my current implementation I already return -EBUSY if kvm->created_vcpus
> is non-zero, i.e. the GPA bits can only be changed before any vCPU is created.
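The check order in the v7 hunk (validate the argument first, then refuse once the VM is committed) can be modelled as below. The struct and function are hypothetical; keeping the memslot test alongside kvm->created_vcpus is an assumption carried over from v7, since the message above only mentions created_vcpus.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of VM state relevant to the gate. */
struct vm_gate_state {
	int created_vcpus;	/* mirrors kvm->created_vcpus */
	bool memslots_empty;	/* mirrors kvm_are_all_memslots_empty(kvm), as in v7 */
};

/*
 * Sketch of the v7 ordering: an invalid argument is reported with -EINVAL
 * even on a committed VM; a valid one is refused with -EBUSY once any vCPU
 * exists or any memslot is populated.
 */
int gpa_bits_enable_precheck(const struct vm_gate_state *s, bool arg_valid)
{
	if (!arg_valid)
		return -EINVAL;
	if (s->created_vcpus || !s->memslots_empty)
		return -EBUSY;
	return 0;
}
```

Gating on created_vcpus is stricter than the ran_atleast_once check suggested earlier: it refuses the change as soon as any vCPU exists, even one that has never run.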

Checking kvm->created_vcpus is perfectly fine so no need to change this.

Regards,
Anup

>
> Thanks,
> Fangyu
>
> >Regards,
> >Anup
> >
> >>
> >> Thanks,
> >> Fangyu
> >>
> >> >Thanks,
> >> >Fangyu
> >> >
> >> >>> default:
> >> >>> r = 0;
> >> >>> break;
> >> >>> @@ -211,12 +214,23 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >> >>>
> >> >>> int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
> >> >>> {
> >> >>> + if (cap->flags)
> >> >>> + return -EINVAL;
> >> >>> +
> >> >>> switch (cap->cap) {
> >> >>> case KVM_CAP_RISCV_MP_STATE_RESET:
> >> >>> - if (cap->flags)
> >> >>> - return -EINVAL;
> >> >>> kvm->arch.mp_state_reset = true;
> >> >>> return 0;
> >> >>> + case KVM_CAP_RISCV_SET_HGATP_MODE:
> >> >>> + if (!kvm_riscv_hgatp_mode_is_valid(cap->args[0]))
> >> >>> + return -EINVAL;
> >> >>> +
> >> >>> + if (kvm->created_vcpus || !kvm_are_all_memslots_empty(kvm))
> >> >>> + return -EBUSY;
> >> >>> +#ifdef CONFIG_64BIT
> >> >>> + kvm->arch.pgd_levels = 3 + cap->args[0] - HGATP_MODE_SV39X4;
> >> >>> +#endif
> >> >>> + return 0;
> >> >>> default:
> >> >>> return -EINVAL;
> >> >>> }
> >> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >> >>> index 80364d4dbebb..a74a80fd4046 100644
> >> >>> --- a/include/uapi/linux/kvm.h
> >> >>> +++ b/include/uapi/linux/kvm.h
> >> >>> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
> >> >>> #define KVM_CAP_ARM_SEA_TO_USER 245
> >> >>> #define KVM_CAP_S390_USER_OPEREXEC 246
> >> >>> #define KVM_CAP_S390_KEYOP 247
> >> >>> +#define KVM_CAP_RISCV_SET_HGATP_MODE 248
> >> >>>
> >> >>> struct kvm_irq_routing_irqchip {
> >> >>> __u32 irqchip;
> >> >>> --
> >> >>> 2.50.1
> >> >>>
> >> >>
> >> >>Regards,
> >> >>Anup
> >