Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs

From: Marc Zyngier
Date: Fri Nov 19 2021 - 08:31:17 EST

On Fri, 19 Nov 2021 12:59:46 +0000,
Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> On Fri, Nov 19, 2021 at 12:17:00PM +0000, Marc Zyngier wrote:
> > On Fri, 19 Nov 2021 10:21:18 +0000,
> > Nicolas Saenz Julienne <nsaenzju@xxxxxxxxxx> wrote:
> > >
> > > While using cntvct as the raw clock for tracing, it's possible to
> > > synchronize host/guest traces just by knowing the virtual offset applied
> > > to the guest's virtual counter.
> > >
> > > This is also the case on x86 when TSC is available. The offset is
> > > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > > implement the same for arm64.
> >
> > How does this work with NV, where the guest hypervisor is in control
> > of the virtual offset? How does userspace know which vcpu to pick so
> > that it gets the right offset?
>
> On x86, the offsets for different vcpus are the same due to the logic at
> kvm_synchronize_tsc function:
>
> During guest vcpu creation, when the TSC-clock values are written
> in a short window of time (or the clock value is zero), the code uses
> the same TSC.
>
> This logic is problematic (since "short window of time" is a heuristic
> which can fail), and is being replaced by writing the same offset
> for each vCPU:
>
> commit 828ca89628bfcb1b8f27535025f69dd00eb55207
> Author: Oliver Upton <oupton@xxxxxxxxxx>
> Date: Thu Sep 16 18:15:38 2021 +0000
>
>     KVM: x86: Expose TSC offset controls to userspace
>
>     To date, VMM-directed TSC synchronization and migration has been a bit
>     messy. KVM has some baked-in heuristics around TSC writes to infer if
>     the VMM is attempting to synchronize. This is problematic, as it depends
>     on host userspace writing to the guest's TSC within 1 second of the last
>     write.
>
>     A much cleaner approach to configuring the guest's views of the TSC is to
>     simply migrate the TSC offset for every vCPU. Offsets are idempotent,
>     and thus not subject to change depending on when the VMM actually
>     reads/writes values from/to KVM. The VMM can then read the TSC once with
>     KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
>     the guest is paused.
>
> So with that in place, the answer to
>
>     How does userspace know which vcpu to pick so
>     that it gets the right offset?
>
> is any vcpu, since the offsets are the same.

As I just said above, this assertion doesn't hold true once you have
nested virt, because the offset is per-cpu, and is adjusted to mean
different things on different hypervisors (some hypervisors expose
stolen time through it, for example).

This patch exposes a Linux-specific behaviour and tries to derive
properties from it. That really doesn't work in general.
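
To make the "any vcpu" assumption concrete, here is a minimal sketch of
the check a trace tool would implicitly be relying on. The helper name
and the dict of per-vcpu offsets are hypothetical and only serve as an
illustration; they are not part of the patch or of any KVM interface.

```python
def shared_offset(per_vcpu_offsets):
    """Return the single offset shared by all vcpus, or None.

    per_vcpu_offsets is a hypothetical mapping {vcpu_id: offset},
    however the tool obtained those values.
    """
    distinct = set(per_vcpu_offsets.values())
    # "Pick any vcpu" is only valid when every vcpu reports the same value.
    return distinct.pop() if len(distinct) == 1 else None
```

With a guest hypervisor adjusting the offset per vcpu, such a check can
return None, and there is then no single value to synchronize against.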

> > I also wonder why we need this when userspace already has direct
> > access to that information without any extra kernel support (read the
> > CNTVCT view of the vcpu using the ONEREG API, subtract it from the
> > host view of the counter, job done).
> If the guest has access to the clock offset (between guest and host), then
> in the guest:
>
>     clockval = hostclockval - clockoffset
>
> Adding "clockoffset" to that will retrieve the host clock.
>
> Is that what you mean?

No. The *VMM* (qemu, kvmtool, crosvm, insertyourfavouriteonehere)
already has access to it. Why do we need an extra interface?
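
As a rough sketch of the arithmetic involved (function names are
hypothetical, and how the two samples are obtained is left out): the
VMM reads the vcpu's CNTVCT view through the ONEREG API, samples the
host view of the counter, and subtracts. The sketch assumes both
samples refer to the same instant; any delay between the two reads
shows up directly as error in the result.

```python
MASK64 = (1 << 64) - 1  # the counter is treated as 64 bits wide here

def virtual_offset(host_count, guest_cntvct):
    """Offset such that guest_cntvct == host_count - offset (mod 2**64)."""
    return (host_count - guest_cntvct) & MASK64

def guest_to_host(guest_ts, offset):
    """Map a guest counter timestamp onto the host counter timeline."""
    return (guest_ts + offset) & MASK64
```

A trace tool could then rebase guest timestamps with guest_to_host()
before merging them with the host trace.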


Without deviation from the norm, progress is not possible.