Re: [PATCH for-5.18] KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT

From: Oliver Upton
Date: Fri Apr 29 2022 - 00:39:01 EST


Hi Paolo,

On Fri, Apr 22, 2022 at 12:30:13PM +0200, Paolo Bonzini wrote:
> When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
> member that at the time was unused. Unfortunately this extensibility
> mechanism has several issues:
>
> - x86 is not writing the member, so it is not possible to use it
> on x86 except for new events
>
> - the member is not aligned to 64 bits, so the definition of the
> uAPI struct is incorrect for 32-bit userspace. This is a problem
> for RISC-V, which supports CONFIG_KVM_COMPAT.
>
> Since padding has to be introduced, place a new field in there
> that tells if the flags field is valid. To allow further extensibility,
> in fact, change flags to an array of 16 values, and store how many
> of the values are valid. The availability of the new ndata field
> is tied to a system capability; all architectures are changed to
> fill in the field.
>
> For compatibility with userspace that was using the flags field,
> a union overlaps flags with data[0].
>
> Supersedes: <20220421180443.1465634-1-pbonzini@xxxxxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: Peter Gonda <pgonda@xxxxxxxxxx>
> Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> Documentation/virt/kvm/api.rst | 24 +++++++++++++++++-------
> arch/arm64/kvm/psci.c | 3 ++-
> arch/riscv/kvm/vcpu_sbi.c | 3 ++-
> arch/x86/kvm/x86.c | 2 ++
> include/uapi/linux/kvm.h | 8 +++++++-
> virt/kvm/kvm_main.c | 1 +
> 6 files changed, 31 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 85c7abc51af5..4a900cdbc62e 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5986,16 +5986,16 @@ should put the acknowledged interrupt vector into the 'epr' field.
> #define KVM_SYSTEM_EVENT_RESET 2
> #define KVM_SYSTEM_EVENT_CRASH 3
> __u32 type;
> - __u64 flags;
> + __u32 ndata;
> + __u64 data[16];

This is out of sync with the union { flags; data; } now.

IMO, we should put a giant disclaimer on all of this to *not* use the
flags field and instead only use data. I imagine we wont want to persist
the union forever as it is quite ugly, but necessary.

[...]

> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 91a6fe4e02c0..f903ab0c8d7a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -445,7 +445,11 @@ struct kvm_run {
> #define KVM_SYSTEM_EVENT_RESET 2
> #define KVM_SYSTEM_EVENT_CRASH 3
> __u32 type;
> - __u64 flags;
> + __u32 ndata;
> + union {
> + __u64 flags;
> + __u64 data[16];
> + };
> } system_event;
> /* KVM_EXIT_S390_STSI */
> struct {
> @@ -1144,6 +1148,8 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_S390_MEM_OP_EXTENSION 211
> #define KVM_CAP_PMU_CAPABILITY 212
> #define KVM_CAP_DISABLE_QUIRKS2 213
> +/* #define KVM_CAP_VM_TSC_CONTROL 214 */

This sticks out a bit. Couldn't the VM TSC control patch just use a
different number? It seems that there will be a conflict anyway, if only to
delete this comment.

How do we go about getting CAP numbers for features coming in from other
architectures? An eager backport (such as the Android case that made us
look at a union) would wind up using the wrong capability for a feature.

--
Thanks,
Oliver