Re: [PATCH v4 3/4] KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM on PowerNV

From: Amit Machhiwal

Date: Tue Jun 23 2026 - 09:31:54 EST


Hi Vaibhav,

Thanks for reviewing this patch. Please find my response inline.

On 2026/06/19 11:42 AM, Vaibhav Jain wrote:
> Hi Amit.
>
> Thanks for the patch and incorporating V3 review comments. Further
> review comments inline below:
>
> Amit Machhiwal <amachhiw@xxxxxxxxxxxxx> writes:
>
> > Currently, when booting a compatibility-mode KVM guest (L1) on a PowerNV
> > hypervisor (L0), the guest runs with the expected processor
> > compatibility level. However, when booting a nested KVM guest (L2)
> > inside the L1, QEMU derives the CPU model from the raw host PVR and
> > attempts to run the nested guest at that level, instead of honoring the
> > compatibility mode of the L1.
> >
> > Extend host CPU compatibility capability reporting to support nested
> > virtualization on PowerNV systems (PAPR nested API v1).
> >
> > For nested API v2 (PowerVM), compatibility capabilities are obtained
> > from the hypervisor via the H_GUEST_GET_CAPABILITIES hcall. This
> > information is not available on PowerNV systems.
> >
> > For nested API v1, derive the compatibility capabilities from the L1
> > guest by reading the "cpu-version" property from the device tree, which
> > reflects the effective (logical) processor compatibility level. Map this
> > value to the corresponding compatibility capability bitmap using
> > KVM-specific constants.
> >
> > Introduce a helper to translate CPU version values into KVM_PPC_COMPAT_CAP
> > bits and integrate it into kvmppc_get_compat_caps(). The implementation
> > applies masking to ensure only supported processor modes are exposed.
> >
> > This allows userspace to query host CPU compatibility modes on both
> > PowerVM and PowerNV platforms via the KVM_PPC_GET_COMPAT_CAPS ioctl.
> >
> > Suggested-by: Vaibhav Jain <vaibhav@xxxxxxxxxxxxx>
> > Signed-off-by: Amit Machhiwal <amachhiw@xxxxxxxxxxxxx>
> > ---
> > arch/powerpc/kvm/book3s_hv.c | 37 +++++++++++++++++++++++++++++++++++-
> > 1 file changed, 36 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index f674386df62c..375e7a7fa9f8 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -6523,15 +6523,50 @@ static bool kvmppc_hash_v3_possible(void)
> >  	return true;
> >  }
> >
> > +static int kvmppc_map_compat_capabilities(const __be32 cpu_version,
> > +					  unsigned long *capabilities)
> > +{
> > +	switch (cpu_version) {
> > +	case PVR_ARCH_31_P11:
> > +		*capabilities |= KVM_PPC_COMPAT_CAP_POWER11;
> Do you need the 'break' here instead of falling through? A P11 host
> can also support the P10 and P9 compat modes.

I addressed a similar comment from Harsh in v1 of the series here:

https://lore.kernel.org/all/20260507202740.96fb259f-22-amachhiw@xxxxxxxxxxxxx/

The current implementation with break statements is intentional. This
function (kvmppc_map_compat_capabilities()) is called only when booting
a nested KVM guest (L2) on **KVM on PowerNV**.

When the L1 KVM guest is booted in a compat mode, L2 is supposed to boot
with the **same PVR version** as the L1, which the current changes
already handle. If L2 needs to boot in a different, *lower* compat mode,
it would use max-cpu-compat, which takes a different code path to set
the compat mode.

Even if I included all lower compat modes in the compat caps for
**APIv1**, I don't think we would use those lower compat bits unless we
wanted to block a specific older compat mode for a given PVR level. We
are not doing that in this series, and we may not want to impose such a
restriction for APIv1.
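
To make the difference concrete, here is a rough userspace sketch (not
kernel code) that contrasts the exact-match mapping with a fall-through
variant. The DEMO_CAP_* bits and the demo_* function names are
placeholders made up for illustration and are not the values used in
this series; the PVR_ARCH_* values are the logical PVRs as I recall them
from asm/reg.h, so treat them as assumptions too.

```c
#include <errno.h>
#include <stdint.h>

/* Logical PVR values, assumed to match arch/powerpc/include/asm/reg.h. */
#define PVR_ARCH_300	0x0f000005
#define PVR_ARCH_31	0x0f000006
#define PVR_ARCH_31_P11	0x0f000007

/* Placeholder capability bits for illustration only. */
#define DEMO_CAP_POWER9		(1UL << 0)
#define DEMO_CAP_POWER10	(1UL << 1)
#define DEMO_CAP_POWER11	(1UL << 2)

/* Exact match: report only the level the L1 itself is running at. */
static int demo_map_exact(uint32_t cpu_version, unsigned long *caps)
{
	switch (cpu_version) {
	case PVR_ARCH_31_P11:
		*caps |= DEMO_CAP_POWER11;
		break;
	case PVR_ARCH_31:
		*caps |= DEMO_CAP_POWER10;
		break;
	case PVR_ARCH_300:
		*caps |= DEMO_CAP_POWER9;
		break;
	default:
		return -EINVAL;
	}

	return 0;
}

/* Cumulative: fall through so that every lower level is reported too. */
static int demo_map_cumulative(uint32_t cpu_version, unsigned long *caps)
{
	switch (cpu_version) {
	case PVR_ARCH_31_P11:
		*caps |= DEMO_CAP_POWER11;
		/* fall through */
	case PVR_ARCH_31:
		*caps |= DEMO_CAP_POWER10;
		/* fall through */
	case PVR_ARCH_300:
		*caps |= DEMO_CAP_POWER9;
		break;
	default:
		return -EINVAL;
	}

	return 0;
}
```

For a P11-level cpu_version, demo_map_exact() sets only the POWER11 bit,
while demo_map_cumulative() sets the POWER11, POWER10 and POWER9 bits.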

Please let me know if you think otherwise.

>
> > +		break;
> > +	case PVR_ARCH_31:
> > +		*capabilities |= KVM_PPC_COMPAT_CAP_POWER10;
> > +		break;
> > +	case PVR_ARCH_300:
> > +		*capabilities |= KVM_PPC_COMPAT_CAP_POWER9;
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	return 0;
> > +}
> >
> >  static int kvmppc_get_compat_caps(struct kvm_ppc_compat_caps *host_caps)
> >  {
> > +	struct device_node *np;
> >  	unsigned long capabilities = 0;
> > +	const __be32 *prop = NULL;
> >  	long rc = -EINVAL;
> > +	u32 cpu_version;
> >
> >  	if (kvmhv_on_pseries()) {
> > -		if (kvmhv_is_nestedv2())
> > +		if (kvmhv_is_nestedv2()) {
> >  			rc = plpar_guest_get_capabilities(0, &capabilities);
> > +		} else {
> > +			for_each_node_by_type(np, "cpu") {
> > +				prop = of_get_property(np, "cpu-version", NULL);
> > +				if (prop) {
> > +					cpu_version = be32_to_cpup(prop);
> > +					break;
> > +				}
> > +			}
> > +			if (!prop)
> > +				return -EINVAL;
> > +			rc = kvmppc_map_compat_capabilities(cpu_version,
> > +							    &capabilities);
> > +		}
> Should you check 'rc' for an error here before assigning 'capabilities'
> to 'host_caps->compat_capabilities'? I understand it will be set to '0'
> due to its initialization at the top of the function, but it would be
> better to make this more explicit.

Sure. The caller already checks the return value rc, but more error
checking is always good, I guess. :)

I'll add a check for rc along these lines:

	if (rc) {
		return -EINVAL;
	}

	host_caps->compat_capabilities = capabilities &
					 KVM_PPC_COMPAT_BITMASK;
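
As a rough userspace illustration of that early-return idea (not the
actual kernel change), the sketch below bails out before touching the
output structure when the lookup fails, and masks the result otherwise.
demo_lookup(), struct demo_compat_caps and DEMO_COMPAT_BITMASK are
hypothetical stand-ins invented for this example.

```c
#include <errno.h>

/* Placeholder mask for illustration; not the real KVM_PPC_COMPAT_BITMASK. */
#define DEMO_COMPAT_BITMASK	0x7UL

/* Hypothetical stand-in for struct kvm_ppc_compat_caps. */
struct demo_compat_caps {
	unsigned long compat_capabilities;
};

/* Stand-in for the hcall or device-tree based capability lookup. */
static long demo_lookup(int fail, unsigned long *caps)
{
	if (fail)
		return -EINVAL;

	/* Deliberately set bits outside the mask to show the filtering. */
	*caps = 0xffUL;
	return 0;
}

static int demo_get_compat_caps(int fail, struct demo_compat_caps *host_caps)
{
	unsigned long capabilities = 0;
	long rc;

	rc = demo_lookup(fail, &capabilities);
	if (rc)
		return -EINVAL;	/* host_caps is left untouched on error */

	host_caps->compat_capabilities = capabilities & DEMO_COMPAT_BITMASK;
	return 0;
}
```

With the early return, a failed lookup leaves whatever the caller had in
host_caps unchanged instead of silently overwriting it with 0.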

Thanks,
Amit

>
> >  		host_caps->compat_capabilities = capabilities &
> >  						 KVM_PPC_COMPAT_BITMASK;
> >  	}
> > --
> > 2.50.1 (Apple Git-155)
> >
> >
>
> --
> Cheers
> ~ Vaibhav