Re: [PATCH v3 2/3] KVM: x86: Include host suspended time in steal time.
From: Sean Christopherson
Date: Wed Jan 08 2025 - 10:18:09 EST
On Wed, Jan 08, 2025, Suleiman Souhlal wrote:
> On Wed, Jan 8, 2025 at 12:37 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > On Tue, Jan 07, 2025, Suleiman Souhlal wrote:
> > > Note that the case of a suspend happening during a VM migration
> > > might not be accounted.
> >
> > And this isn't considered a bug because? I asked for documentation, not a
> > statement of fact.
>
> I guess I don't really understand what the difference between documentation
> and statements of fact is.
It's the difference between "X may be buggy" and "X has this exact caveat because
KVM doesn't support saving/restoring associated metadata". For this case, I'm
pretty sure it's possible to document exactly when suspended time will be lost,
what may happen if that accounted time is lost, what userspace could do to avoid
dropping suspend time, and finally for the changelog only, specifically why KVM
support for accounting suspended time is *intentionally* not providing APIs to
save/restore pending suspended time.
> It's not completely clear to me what the desired behavior would be when
> suspending during a VM migration.
That's fine, it's not your (or KVM's) responsibility to know the desired
behavior for every possible/theoretical use case. But as above, it *is* KVM's
responsibility to properly document any caveats with functionality.
> If we wanted to inject the suspend duration that happened after the migration
> started, but before it ended, I suppose we would need to add a way for the
> new VM instance to add to steal time, possibly through a new uAPI.
It's not just migration, and it's not just the case where the host is suspended
*after* vCPU state is saved. Pending suspended time will also be lost if vCPU
state is saved after suspend+resume without entering the guest (because the
accounting is done late in KVM_RUN).
> It is also not clear to me why we would want that.
Which is fine as well. If some crazy use case cares about accounting suspended
time across save+restore, then the onus is on that use case to implement and
justify appropriate support.