Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs

From: Marc Zyngier
Date: Tue Nov 23 2021 - 06:09:24 EST

On Mon, 22 Nov 2021 20:40:52 +0000,
Nicolas Saenz Julienne <nsaenzju@xxxxxxxxxx> wrote:
> Hi Marc, thanks for the review.
> On Fri, 2021-11-19 at 12:17 +0000, Marc Zyngier wrote:
> > On Fri, 19 Nov 2021 10:21:18 +0000,
> > Nicolas Saenz Julienne <nsaenzju@xxxxxxxxxx> wrote:
> > >
> > > While using cntvct as the raw clock for tracing, it's possible to
> > > synchronize host/guest traces just by knowing the virtual offset applied
> > > to the guest's virtual counter.
> > >
> > > This is also the case on x86 when TSC is available. The offset is
> > > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > > implement the same for arm64.
> >
> > How does this work with NV, where the guest hypervisor is in control
> > of the virtual offset?
> TBH I handn't thought about NV. Looking at it from that angle, I now see my
> approach doesn't work on hosts that use CNTVCT (regardless of NV). Upon
> entering into a guest, we change CNTVOFF before the host is done with tracing,
> so traces like 'kvm_entry' will have weird timestamps. I was just lucky that
> the hosts I was testing with use CNTPCT.

There are multiple things at play here:

- if the system is a host, the kernel will use CNTPCT. Userspace will
still use CNTVCT, and the offset is guaranteed to be 0 *when running

- if the system isn't a host (which doesn't necessarily means a
guest), CNTVCT is the only thing that is being used, and the offset
is unknown (Linux requires it to be constant across vcpus though).

So I doubt you'd get a bad timestamp on the host. It is just that you
have named your trace clock incorrectly (and Steven's idea of an
indirected clock could help here).

> I believe the solution would be to be able to force a 0 offset between
> guest/host. With that in mind, is there a reason why kvm_timer_vcpu_init()
> imposes a non-zero one by default? I checked out the commits that introduced
> that code, but couldn't find a compelling reason. VMMs can always change it
> through KVM_REG_ARM_TIMER_CNT afterwards.

We want to minimise the chance for an observable rollover of the
virtual counter, so time starts at 0 *in the guest*. The VMM can
change the view of that time for the purpose of migration.

If you want a 0 offset, set the counter to the physical value in the
VMM (imprecise) or have a look at Oliver Upton's patches that were
allowing an offset to be specified directly. But migration, by
definition, breaks this.

> > I also wonder why we need this when userspace already has direct access to
> > that information without any extra kernel support (read the CNTVCT view of
> > the vcpu using the ONEREG API, subtract it from the host view of the counter,
> > job done).
> Well IIUC, you're at the mercy of how long it takes to return from the ONEREG
> ioctl. The results will be skewed. For some workloads, where low latency is
> key, we really need high precision traces in the order of single digit us or
> even 100s of ns. I'm not sure you'll be able to get there with that approach.

The PTP clock does exactly that from the guest PoV, with a lot more
overhead, and this results in single digit ns precision. Why isn't
that possible from userspace?



Without deviation from the norm, progress is not possible.