Re: [PATCH v3 2/3] KVM: x86: On emulation failure, convey the exit reason, etc. to userspace

From: Sean Christopherson
Date: Mon Aug 02 2021 - 12:58:12 EST


On Mon, Aug 02, 2021, David Edmondson wrote:
> On Friday, 2021-07-30 at 22:14:48 GMT, Sean Christopherson wrote:
>
> > On Thu, Jul 29, 2021, David Edmondson wrote:
> >> + __u64 exit_info1;
> >> + __u64 exit_info2;
> >> + __u32 intr_info;
> >> + __u32 error_code;
> >> + } exit_reason;
> >
> > Oooh, you're dumping all the fields in kvm_run. That took me forever to realize
> > because the struct is named "exit_reason". Unless there's a naming conflict,
> > 'data' would be the simplest, and if that's already taken, maybe 'info'?
> >
> > I'm also not sure an anonymous struct is going to be the easiest to maintain.
> > I do like that the fields all have names, but on the other hand the data should
> > be padded so that each field is in its own data[] entry when dumped to userspace.
> > IMO, the padding complexity isn't worth the naming niceness since this code
> > doesn't actually care about what each field contains.
>
> Given that this is avowedly not an ABI and that we are expecting any
> (human) consumer to be intimate with the implementation to make sense of
> it, is there really any requirement or need for padding?

My thought with the padding was to force each field into its own data[] entry.
E.g. if userspace does something like

	for (i = 0; i < ndata; i++)
		printf("\tdata[%d] = 0x%llx\n", i, data[i]);

then padding will yield

	data[0] = flags
	data[1] = exit_reason
	data[2] = exit_info1
	data[3] = exit_info2
	data[4] = intr_info
	data[5] = error_code

versus

	data[0] = flags
	data[1] = (exit_info1 << 32) | exit_reason
	data[2] = (exit_info2 << 32) | (exit_info1 >> 32)
	data[3] = (intr_info << 32) | (exit_info2 >> 32)
	data[4] = error_code

Changing exit_reason to a u64 would clean up the worst of the mangling, but until
there's actually a 64-bit exit reason to dump, that's just a more subtle way to
pad the data.
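
To make the difference concrete, here is a minimal standalone sketch (not the
kernel code) that copies a packed and a padded structure into a u64 array the
way a format-ignorant userspace would see them. The struct names, the
to_data() and dump() helpers, and the use of the GCC/Clang packed attribute
are all assumptions for illustration, and the packed result shown above
assumes a little-endian host.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical packed layout: fields sit back to back, no padding. */
struct info_packed {
	uint64_t flags;
	uint32_t exit_reason;
	uint64_t exit_info1;
	uint64_t exit_info2;
	uint32_t intr_info;
	uint32_t error_code;
} __attribute__((packed));

/* Hypothetical padded layout: every field widened to fill one u64 slot. */
struct info_padded {
	uint64_t flags;
	uint64_t exit_reason;
	uint64_t exit_info1;
	uint64_t exit_info2;
	uint64_t intr_info;
	uint64_t error_code;
};

/*
 * Copy 'len' bytes of 'src' into a zeroed u64 array of 'max' entries and
 * return the number of entries needed to hold them (rounded up).
 */
size_t to_data(uint64_t *data, size_t max, const void *src, size_t len)
{
	size_t n = (len + sizeof(*data) - 1) / sizeof(*data);

	assert(n <= max);
	memset(data, 0, max * sizeof(*data));
	memcpy(data, src, len);
	return n;
}

/* Dump the entries the way a userspace that ignores the format would. */
void dump(const uint64_t *data, size_t ndata)
{
	size_t i;

	for (i = 0; i < ndata; i++)
		printf("\tdata[%zu] = 0x%" PRIx64 "\n", i, data[i]);
}
```

Filling both structs with the same values and passing each through to_data()
and dump() gives six one-field-per-line entries for the padded layout and five
entries with fields straddling slot boundaries for the packed one.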

> In your example below (most of which I'm fine with), the padding has the
> effect of wasting space that could be used for another u64 of debug
> data.

Yes, but because it's not ABI, we can change it in the future if we get to the
point where we want to dump more info and don't have space. Until that time, I
think it makes sense to prioritize readability for a userspace that is ignorant
of the format over memory footprint.

> > 	/*
> > 	 * There's currently space for 13 entries, but 5 are used for the exit
> > 	 * reason and info. Restrict to 4 to reduce the maintenance burden
> > 	 * when expanding kvm_run.emulation_failure in the future.
> > 	 */
> > 	if (WARN_ON_ONCE(ndata > 4))
> > 		ndata = 4;
> >
> > 	if (insn_size) {
> > 		ndata_start = 3;
> > 		run->emulation_failure.flags =
> > 			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
> > 		run->emulation_failure.insn_size = insn_size;
> > 		memset(run->emulation_failure.insn_bytes, 0x90,
> > 		       sizeof(run->emulation_failure.insn_bytes));
> > 		memcpy(run->emulation_failure.insn_bytes, insn_bytes, insn_size);
> > 	} else {
> > 		/* Always include the flags as a 'data' entry. */
> > 		ndata_start = 1;
> > 		run->emulation_failure.flags = 0;
> > 	}
>
> When we add another flag (presuming that we do, because if not there was
> not much point in the flags) this will have to be restructured again. Is
> there an objection to the original style? (prime ndata=1, flags=0, OR in
> flags and adjust ndata as we go.)

No objection, though if you OR in flags then you should truly _adjust_ ndata, not
set it, e.g.

	/* Always include the flags as a 'data' entry. */
	ndata_start = 1;
	run->emulation_failure.flags = 0;

	if (insn_size) {
		ndata_start += 2;  <----------------------- Adjust, not override
		run->emulation_failure.flags |=
			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
		run->emulation_failure.insn_size = insn_size;
		memset(run->emulation_failure.insn_bytes, 0x90,
		       sizeof(run->emulation_failure.insn_bytes));
		memcpy(run->emulation_failure.insn_bytes, insn_bytes, insn_size);
	}
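
As a rough illustration of why adjusting scales better than overriding, the
following standalone sketch models the same pattern with a second flag. The
second flag, the fail_model struct, and the prepare() helper are all invented
for this example and are not part of KVM. The "+= 2" is assumed to reflect a
one-byte insn_size plus 15 insn_bytes filling two 8-byte slots.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the real flag definitions. */
#define FLAG_INSTRUCTION_BYTES	(1ULL << 0)
#define FLAG_HYPOTHETICAL_EXTRA	(1ULL << 1)	/* invented for illustration */

/* Simplified model of the payload; not the real kvm_run layout. */
struct fail_model {
	uint64_t flags;
	uint8_t insn_size;
	uint8_t insn_bytes[15];
	uint64_t extra;
};

/*
 * Prime ndata_start and flags, then OR in flags and bump ndata_start for
 * each optional block. Returns the index of the first free data[] slot.
 */
unsigned int prepare(struct fail_model *f, const uint8_t *insn,
		     uint8_t insn_size, int have_extra, uint64_t extra)
{
	unsigned int ndata_start = 1;	/* flags always take one slot */

	f->flags = 0;

	if (insn_size) {
		assert(insn_size <= sizeof(f->insn_bytes));
		ndata_start += 2;	/* adjust, not override */
		f->flags |= FLAG_INSTRUCTION_BYTES;
		f->insn_size = insn_size;
		memset(f->insn_bytes, 0x90, sizeof(f->insn_bytes));
		memcpy(f->insn_bytes, insn, insn_size);
	}

	if (have_extra) {
		ndata_start += 1;	/* a new block needs no restructuring */
		f->flags |= FLAG_HYPOTHETICAL_EXTRA;
		f->extra = extra;
	}

	return ndata_start;
}
```

Each optional block is independent of the others, so adding one does not
require touching the existing branches.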

> > 	memcpy(&run->internal.data[ndata_start], info, sizeof(info));
> > 	memcpy(&run->internal.data[ndata_start + ARRAY_SIZE(info)], data,
> > 	       ndata * sizeof(*data));
> > }