Re: [PATCH] KVM: nVMX: properly pad struct kvm_vmx_nested_state_hdr

From: Paolo Bonzini
Date: Mon Jul 27 2020 - 12:17:09 EST

On 27/07/20 17:46, Sean Christopherson wrote:
>> It might as well send it now if the code didn't attempt to zero the
>> struct before filling it in (this is another good reason to use a
>> "flags" field to say what's been filled in).
> The issue is that flags itself could hold garbage.

Quoting from there:

> Which means that userspace built for the old kernel will potentially send in
> garbage for the new 'flags' field due to it being uninitialized stack data,
> even with the layout after this patch.

Userspace should always zero everything. I don't think that the padding
between fields is any different from the other bytes padding the header
to 128 bytes.

> struct kvm_vmx_nested_state_hdr hdr = {
> .vmxon_pa = root,
> .vmcs12_pa = vmcs12,
> };
> QEMU won't see issues because it zero allocates the entire nested state.
> All the above being said, after looking at the whole picture I think padding
> the header is a moot point. The header is padded out to 120 bytes[*] when
> including in the full nested state, and KVM only ever consumes the header in
> the context of the full nested state. I.e. if there's garbage at offset 6,
> odds are there's going to be garbage at offset 18, so internally padding the
> header does nothing.

Yes, that was what I was hinting at with "it might as well send it now"
(i.e., after the patch).

(All of this is moot for userspace that just uses KVM_GET_NESTED_STATE
and passes it back to KVM_SET_NESTED_STATE).

> KVM should be checking that the unused bytes of (sizeof(pad) - sizeof(vmx/svm))
> is zero if we want to expand into the padding in the future. Right now we're
> relying on userspace to zero allocate the struct without enforcing it.

The alternative, which is almost as good, is to only use these extra
fields which could be garbage if the flags are not set, and check the
flags (see the patches I have sent earlier today).

The chance of the flags passing the check will decrease over time as
more flags are added; but the chance of having buggy userspace that
sends down garbage also will.

> [*] Amusing side note, the comment in the header is wrong. It states "pad
> the header to 128 bytes", but only pads it to 120 bytes, because union.
> struct kvm_nested_state {
> __u16 flags;
> __u16 format;
> __u32 size;
> union {
> struct kvm_vmx_nested_state_hdr vmx;
> struct kvm_svm_nested_state_hdr svm;
> /* Pad the header to 128 bytes. */
> __u8 pad[120];
> } hdr;

There are 8 bytes before the union, and it's not a coincidence. :)
"Header" refers to the stuff before the data region.

> Odds are no real VMM will have issue given the dynamic size of struct
> kvm_nested_state, but

... *suspence* ...