Re: [PATCH v3 2/3] KVM: x86: On emulation failure, convey the exit reason, etc. to userspace

From: David Edmondson
Date: Mon Aug 02 2021 - 13:23:37 EST


On Monday, 2021-08-02 at 16:58:03 GMT, Sean Christopherson wrote:

> On Mon, Aug 02, 2021, David Edmondson wrote:
>> On Friday, 2021-07-30 at 22:14:48 GMT, Sean Christopherson wrote:
>>
>> > On Thu, Jul 29, 2021, David Edmondson wrote:
>> >> + __u64 exit_info1;
>> >> + __u64 exit_info2;
>> >> + __u32 intr_info;
>> >> + __u32 error_code;
>> >> + } exit_reason;
>> >
>> > Oooh, you're dumping all the fields in kvm_run. That took me forever to realize
>> > because the struct is named "exit_reason". Unless there's a naming conflict,
>> > 'data' would be the simplest, and if that's already taken, maybe 'info'?
>> >
>> > I'm also not sure an anonymous struct is going to be the easiest to maintain.
>> > I do like that the fields all have names, but on the other hand the data should
>> > be padded so that each field is in its own data[] entry when dumped to userspace.
>> > IMO, the padding complexity isn't worth the naming niceness since this code
>> > doesn't actually care about what each field contains.
>>
>> Given that this is avowedly not an ABI and that we are expecting any
>> (human) consumer to be intimate with the implementation to make sense of
>> it, is there really any requirement or need for padding?
>
> My thought with the padding was to force each field into its own data[] entry.
> E.g. if userspace does something like
>
> 	for (i = 0; i < ndata; i++)
> 		printf("\tdata[%d] = 0x%llx\n", i, data[i]);
>
> then padding will yield
>
> data[0] = flags
> data[1] = exit_reason
> data[2] = exit_info1
> data[3] = exit_info2
> data[4] = intr_info
> data[5] = error_code
>
> versus
>
> data[0] = <flags>
> data[1] = (exit_info1 << 32) | exit_reason
> data[2] = (exit_info2 << 32) | (exit_info1 >> 32)
> data[3] = (intr_info << 32) | (exit_info2 >> 32)
> data[4] = error_code
>
> Changing exit_reason to a u64 would clean up the worst of the mangling, but until
> there's actually a 64-bit exit reason to dump, that's just a more subtle way to
> pad the data.

Unnecessarily extending exit_reason to u64 would be bad, I agree.
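
For reference, a minimal sketch of the two layouts compared above, as a
userspace loop over u64 entries would see them. The struct below is a
hypothetical stand-in that mirrors the field names in this thread; it is
not the real kvm_run definition. It assumes a little-endian host and a
compiler that accepts the GCC/Clang packed attribute.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the fields discussed in this thread.
 * Packed so that no compiler padding hides the effect.
 */
struct packed_info {
	uint64_t flags;
	uint32_t exit_reason;
	uint64_t exit_info1;
	uint64_t exit_info2;
	uint32_t intr_info;
	uint32_t error_code;
} __attribute__((packed));

/*
 * Unpadded case: copy the raw bytes into u64 slots, as a format-ignorant
 * dumper would read them. Returns the number of slots used, 0 if the
 * destination is too small.
 */
static size_t dump_packed(const struct packed_info *p, uint64_t *data,
			  size_t max)
{
	size_t n = (sizeof(*p) + sizeof(uint64_t) - 1) / sizeof(uint64_t);

	if (n > max)
		return 0;
	memset(data, 0, n * sizeof(uint64_t));
	memcpy(data, p, sizeof(*p));
	return n;
}

/*
 * Padded case: every field gets its own u64 slot, so data[i] is always
 * exactly one field.
 */
static size_t dump_padded(const struct packed_info *p, uint64_t *data,
			  size_t max)
{
	if (max < 6)
		return 0;
	data[0] = p->flags;
	data[1] = p->exit_reason;
	data[2] = p->exit_info1;
	data[3] = p->exit_info2;
	data[4] = p->intr_info;
	data[5] = p->error_code;
	return 6;
}
```

With the packed copy, data[1] holds exit_reason in its low half and the
low half of exit_info1 in its high half, which is the mangling described
above. The padded copy costs one extra entry here.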

>> In your example below (most of which I'm fine with), the padding has the
>> effect of wasting space that could be used for another u64 of debug
>> data.
>
> Yes, but because it's not ABI, we can change it in the future if we get to the
> point where we want to dump more info and don't have space. Until that time, I
> think it makes sense to prioritize readability with an ignorant (of the format)
> userspace over memory footprint.

This seems reasonable.

>> > 	/*
>> > 	 * There's currently space for 13 entries, but 5 are used for the exit
>> > 	 * reason and info. Restrict to 4 to reduce the maintenance burden
>> > 	 * when expanding kvm_run.emulation_failure in the future.
>> > 	 */
>> > 	if (WARN_ON_ONCE(ndata > 4))
>> > 		ndata = 4;
>> >
>> > 	if (insn_size) {
>> > 		ndata_start = 3;
>> > 		run->emulation_failure.flags =
>> > 			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
>> > 		run->emulation_failure.insn_size = insn_size;
>> > 		memset(run->emulation_failure.insn_bytes, 0x90,
>> > 		       sizeof(run->emulation_failure.insn_bytes));
>> > 		memcpy(run->emulation_failure.insn_bytes, insn_bytes, insn_size);
>> > 	} else {
>> > 		/* Always include the flags as a 'data' entry. */
>> > 		ndata_start = 1;
>> > 		run->emulation_failure.flags = 0;
>> > 	}
>>
>> When we add another flag (presuming that we do, because if not there was
>> not much point in the flags) this will have to be restructured again. Is
>> there an objection to the original style? (prime ndata=1, flags=0, OR in
>> flags and adjust ndata as we go.)
>
> No objection, though if you OR in flags then you should truly _adjust_ ndata, not
> set it, e.g.

My understanding of Aaron's intent is that this would not be the
case.

That is, if we add another flag with payload and set that flag, we would
still have space for the instruction stream in data[] even if
KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is not set.

Given that, we must *set* ndata each time we add in a flag, with the
value being the extent of data[] used by the payload corresponding to
that flag, and the flags must be considered in ascending order (or we
remember a "max" along the way).

Dumping the arbitrary debug data after the defined fields would require
adjusting ndata, of course.

If this is not the case, and the flag-indicated payloads are packed at
the head of data[], then the current structure definition is misleading
and we should perhaps revise it.
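
To make the two readings concrete, here is a minimal sketch of how ndata
would be computed under each. FLAG_OTHER and its two-entry payload are
hypothetical, invented purely for illustration; FLAG_INSN_BYTES stands in
for KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES, with its extent
taken from the ndata_start = 3 in the proposal above.

```c
#include <stdint.h>

/* Stand-in for KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES. */
#define FLAG_INSN_BYTES	(1ULL << 0)
/* Hypothetical future flag, not part of any existing interface. */
#define FLAG_OTHER	(1ULL << 1)

/* flags occupies data[0]; insn_size and insn_bytes occupy data[1..2]. */
#define INSN_BYTES_END	3
/* Assumed: the hypothetical payload is two entries, at data[3..4]. */
#define OTHER_END	5

/*
 * Reading 1: every payload lives at a fixed offset in data[], whether or
 * not earlier flags are set. ndata is *set* to the extent of the payload
 * for each flag, considering flags in ascending order so the largest
 * extent wins.
 */
static uint32_t ndata_fixed_offsets(uint64_t flags)
{
	uint32_t ndata = 1;

	if (flags & FLAG_INSN_BYTES)
		ndata = INSN_BYTES_END;
	if (flags & FLAG_OTHER)
		ndata = OTHER_END;
	return ndata;
}

/*
 * Reading 2: payloads are packed at the head of data[], so each flag
 * *adjusts* ndata by the size of its own payload.
 */
static uint32_t ndata_packed(uint64_t flags)
{
	uint32_t ndata = 1;

	if (flags & FLAG_INSN_BYTES)
		ndata += INSN_BYTES_END - 1;
	if (flags & FLAG_OTHER)
		ndata += OTHER_END - INSN_BYTES_END;
	return ndata;
}
```

The two agree unless a later flag is set without an earlier one: with only
FLAG_OTHER set, the first reading leaves data[1..2] reserved and yields 5,
while the second yields 3.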

> 	/* Always include the flags as a 'data' entry. */
> 	ndata_start = 1;
> 	run->emulation_failure.flags = 0;
>
> 	if (insn_size) {
> 		ndata_start += 2; <----------------------- Adjust, not override
> 		run->emulation_failure.flags |=
> 			KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES;
> 		run->emulation_failure.insn_size = insn_size;
> 		memset(run->emulation_failure.insn_bytes, 0x90,
> 		       sizeof(run->emulation_failure.insn_bytes));
> 		memcpy(run->emulation_failure.insn_bytes, insn_bytes, insn_size);
> 	}
>
>> > 	memcpy(&run->internal.data[ndata_start], info, ARRAY_SIZE(info));
>> > 	memcpy(&run->internal.data[ndata_start + ARRAY_SIZE(info)], data, ndata);
>> > }