Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT

From: Binbin Wu
Date: Thu May 30 2024 - 21:23:16 EST




On 5/30/2024 4:02 AM, Sean Christopherson wrote:
On Tue, May 28, 2024, Paolo Bonzini wrote:
On Mon, May 27, 2024 at 2:26 PM Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx> wrote:
It seems like TDX should be able to do something similar by limiting the
size of each KVM_HC_MAP_GPA_RANGE to TDX_MAP_GPA_MAX_LEN, and then
returning TDG_VP_VMCALL_RETRY to guest if the original size was greater
than TDX_MAP_GPA_MAX_LEN. But at that point you're effectively done with
the entire request and can return to guest, so it actually seems a little
more straightforward than the SNP case above. E.g. TDX has a 1:1 mapping
between TDG_VP_VMCALL_MAP_GPA and KVM_HC_MAP_GPA_RANGE events. (And even
similar names :))

So doesn't seem like there's a good reason to expose any of these
throttling details to userspace,
I think userspace should never be worried about throttling. I would
say it's up to the guest to split the GPA into multiple ranges,
I agree in principle, but in practice I can understand not wanting to split up
the conversion in the guest due to the additional overhead of the world switches.

but that's not how arch/x86/coco/tdx/tdx.c is implemented, so we can
do the split in KVM instead. It can be a module parameter or VM attribute,
establishing the size that will be processed in a single TDVMCALL.
Is it just interrupts that are problematic for conversions? I assume so, because
I can't think of anything else where telling the guest to retry would be appropriate
and useful.

The concern was lockup detection in the guest.


If so, KVM shouldn't need to unconditionally restrict the size for a single
TDVMCALL, KVM just needs to ensure interrupts are handled soonish. To do that,
KVM could use a much smaller chunk size, e.g. 64KiB (completely made up number),
and keep processing the TDVMCALL as long as there is no interrupt pending.
Hopefully that would obviate the need for a tunable.

Thanks for the suggestion.
This way, interrupts can be injected into the guest in time, and lockup detection should not be a problem.

As for the chunk size: if it is too small, the number of kernel/userspace context switches, and with it the overall cost, will increase.
Maybe 2MB?
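
To make the idea concrete, here is a rough userspace sketch of the loop as I understand it: process the request one chunk at a time, and only stop early and ask the guest to retry when an interrupt is pending. All names (sketch_map_gpa, SKETCH_CHUNK_SIZE, the irq_pending callback) are made up for illustration and are not existing KVM or TDX symbols; the chunk counter merely stands in for one KVM_HC_MAP_GPA_RANGE exit to userspace, and SKETCH_RETRY stands in for returning TDG_VP_VMCALL_RETRY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical chunk size: 2MB, the value floated above. */
#define SKETCH_CHUNK_SIZE (2ULL << 20)

enum sketch_status { SKETCH_DONE, SKETCH_RETRY };

struct sketch_result {
	enum sketch_status status;
	uint64_t next_gpa;	/* where the guest would resume on RETRY */
	unsigned int exits;	/* chunks handed to "userspace" so far */
};

/* Hypothetical hook: returns true if an interrupt is pending for the vCPU. */
typedef bool (*irq_pending_fn)(void *ctx);

/*
 * Convert [gpa, gpa + size) in chunks. Keep going while no interrupt is
 * pending; otherwise stop and report how far we got so the guest can retry
 * from next_gpa after the interrupt has been handled.
 */
static struct sketch_result sketch_map_gpa(uint64_t gpa, uint64_t size,
					   uint64_t chunk,
					   irq_pending_fn irq_pending,
					   void *ctx)
{
	struct sketch_result r = { SKETCH_DONE, gpa, 0 };
	uint64_t end = gpa + size;

	assert(chunk > 0);

	while (r.next_gpa < end) {
		uint64_t len = end - r.next_gpa;

		if (len > chunk)
			len = chunk;

		/* Stand-in for one round trip to userspace for this chunk. */
		r.exits++;
		r.next_gpa += len;

		/* Only bail out if there is still work left to retry. */
		if (r.next_gpa < end && irq_pending(ctx)) {
			r.status = SKETCH_RETRY;
			break;
		}
	}

	return r;
}

/*
 * Helper for experimenting: reports an interrupt on the N-th poll, where N
 * is the counter that ctx points to. A counter of 0 means "never".
 */
static bool sketch_irq_on_nth_poll(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return false;

	return --(*remaining) == 0;
}
```

With an 8MB request and 2MB chunks, an uninterrupted run takes four round trips and completes; if an interrupt shows up on the second poll, the sketch stops after 4MB and reports RETRY with next_gpa pointing at the unprocessed remainder. A smaller chunk size would reduce interrupt latency at the cost of more round trips, which is the trade-off mentioned above.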