Re: [patch 3/3] PTP: add kvm PTP driver

From: Marcelo Tosatti
Date: Fri Jan 20 2017 - 05:32:41 EST


On Wed, Jan 18, 2017 at 04:20:33PM +0100, Radim Krcmar wrote:
> 2017-01-18 12:53-0200, Marcelo Tosatti:
> > On Wed, Jan 18, 2017 at 12:37:25PM -0200, Marcelo Tosatti wrote:
> > > On Wed, Jan 18, 2017 at 01:46:58PM +0100, Paolo Bonzini wrote:
> > > >
> > > >
> > > > On 18/01/2017 13:24, Marcelo Tosatti wrote:
> > > > > On Wed, Jan 18, 2017 at 10:17:38AM -0200, Marcelo Tosatti wrote:
> > > > >> On Tue, Jan 17, 2017 at 04:36:21PM +0100, Radim Krcmar wrote:
> > > > >>> 2017-01-17 09:30-0200, Marcelo Tosatti:
> > > > >>>> On Tue, Jan 17, 2017 at 09:03:27AM +0100, Miroslav Lichvar wrote:
> > > > >>>>> Users of the PTP_SYS_OFFSET ioctl assume that (ts[0]+ts[2])/2
> > > > >>>>> corresponds to ts[1], (ts[2]+ts[4])/2 corresponds to ts[3], and so on.
> > > > >>>>>
> > > > > >>>>>                     ts[1]     ts[3]
> > > > > >>>>> Host time  ---------+---------+........
> > > > > >>>>>                     |         |
> > > > > >>>>>                     |         |
> > > > > >>>>> Guest time ----+---------+---------+......
> > > > > >>>>>                ts[0]     ts[2]     ts[4]
> > > > >>>
> > > > >>> KVM PTP delay moves host ts[i] to be close to guest ts[i+1] and makes
> > > > >>> the offset very consistent, so the graph would look like:
> > > > >>>
> > > > > >>>                         ts[1]     ts[3]
> > > > > >>> Host time  -------------+---------+........
> > > > > >>>                         |         |
> > > > > >>>                         |         |
> > > > > >>> Guest time ----+---------+---------+......
> > > > > >>>                ts[0]     ts[2]     ts[4]
> > > > >>>
> > > > >>> which doesn't sound good if users assume that the host reading is in the
> > > > >>> middle -- the guest time would be ahead of the host time.
> > > > >>
> > > > >> Testcase: run a guest and a loop sending SIGUSR1 to vcpu0 (emulating
> > > > >> intense interrupts). Follows results:
> > > > >>
> > > > >> Without TSC delta calculation:
> > > > >> =============================
> > > > >>
> > > > >> #* PHC0 0 3 377 2 -99ns[ +206ns] +/- 116ns
> > > > >> #* PHC0 0 3 377 8 +202ns[ +249ns] +/- 111ns
> > > > >> #* PHC0 0 3 377 8 -213ns[ +683ns] +/- 88ns
> > > > >> #* PHC0 0 3 377 6 +77ns[ +319ns] +/- 56ns
> > > > >> #* PHC0 0 3 377 4 -771ns[-1029ns] +/- 93ns
> > > > >> #* PHC0 0 3 377 10 -49ns[ -58ns] +/- 121ns
> > > > >> #* PHC0 0 3 377 9 +562ns[ +703ns] +/- 107ns
> > > > >> #* PHC0 0 3 377 6 -2ns[ -3ns] +/- 94ns
> > > > >> #* PHC0 0 3 377 4 +451ns[ +494ns] +/- 138ns
> > > > >> #* PHC0 0 3 377 11 -67ns[ -74ns] +/- 113ns
> > > > >> #* PHC0 0 3 377 8 +244ns[ +264ns] +/- 119ns
> > > > >> #* PHC0 0 3 377 7 -696ns[ -890ns] +/- 89ns
> > > > >> #* PHC0 0 3 377 4 +468ns[ +560ns] +/- 110ns
> > > > >> #* PHC0 0 3 377 11 -310ns[ -430ns] +/- 72ns
> > > > >> #* PHC0 0 3 377 9 +189ns[ +298ns] +/- 54ns
> > > > >> #* PHC0 0 3 377 7 +594ns[ +473ns] +/- 96ns
> > > > >> #* PHC0 0 3 377 5 +151ns[ +280ns] +/- 71ns
> > > > >> #* PHC0 0 3 377 10 -590ns[ -696ns] +/- 94ns
> > > > >> #* PHC0 0 3 377 8 +415ns[ +526ns] +/- 74ns
> > > > >> #* PHC0 0 3 377 6 +1381ns[+1469ns] +/- 101ns
> > > > >> #* PHC0 0 3 377 4 +571ns[+1304ns] +/- 54ns
> > > > >> #* PHC0 0 3 377 8 -5ns[ +71ns] +/- 139ns
> > > > >> #* PHC0 0 3 377 7 -247ns[ -502ns] +/- 69ns
> > > > >> #* PHC0 0 3 377 5 -283ns[ +879ns] +/- 73ns
> > > > >> #* PHC0 0 3 377 3 +148ns[ -109ns] +/- 61ns
> > > > >>
> > > > >> With TSC delta calculation:
> > > > >> ============================
> > > > >>
> > > > >> #* PHC0 0 3 377 7 +379ns[ +432ns] +/- 53ns
> > > > >> #* PHC0 0 3 377 9 +106ns[ +420ns] +/- 42ns
> > > > >> #* PHC0 0 3 377 7 -58ns[ -136ns] +/- 62ns
> > > > >> #* PHC0 0 3 377 12 +93ns[ -38ns] +/- 64ns
> > > > >> #* PHC0 0 3 377 8 +84ns[ +107ns] +/- 69ns
> > > > >> #* PHC0 0 3 377 3 -76ns[ -103ns] +/- 52ns
> > > > >> #* PHC0 0 3 377 7 +52ns[ +63ns] +/- 50ns
> > > > >> #* PHC0 0 3 377 11 +29ns[ +31ns] +/- 70ns
> > > > >> #* PHC0 0 3 377 7 -47ns[ -56ns] +/- 42ns
> > > > >> #* PHC0 0 3 377 10 -35ns[ -42ns] +/- 33ns
> > > > >> #* PHC0 0 3 377 7 -32ns[ -34ns] +/- 42ns
> > > > >> #* PHC0 0 3 377 11 -172ns[ -173ns] +/- 118ns
> > > > >> #* PHC0 0 3 377 6 +65ns[ +76ns] +/- 23ns
> > > > >> #* PHC0 0 3 377 9 +18ns[ +23ns] +/- 37ns
> > > > >> #* PHC0 0 3 377 6 +41ns[ -60ns] +/- 30ns
> > > > >> #* PHC0 0 3 377 10 +39ns[ +183ns] +/- 42ns
> > > > >> #* PHC0 0 3 377 6 +50ns[ +102ns] +/- 86ns
> > > > >> #* PHC0 0 3 377 11 +50ns[ +75ns] +/- 52ns
> > > > >> #* PHC0 0 3 377 6 +50ns[ +116ns] +/- 100ns
> > > > >> #* PHC0 0 3 377 10 +46ns[ +65ns] +/- 79ns
> > > > >> #* PHC0 0 3 377 7 -38ns[ -51ns] +/- 29ns
> > > > >> #* PHC0 0 3 377 10 -11ns[ -12ns] +/- 32ns
> > > > >> #* PHC0 0 3 377 7 -31ns[ -32ns] +/- 99ns
> > > > >> #* PHC0 0 3 377 10 +222ns[ +238ns] +/- 58ns
> > > > >> #* PHC0 0 3 377 6 +185ns[ +207ns] +/- 39ns
> > > > >> #* PHC0 0 3 377 10 -392ns[ -394ns] +/- 118ns
> > > > >> #* PHC0 0 3 377 6 -9ns[ -50ns] +/- 35ns
> > > > >> #* PHC0 0 3 377 10 -346ns[ -355ns] +/- 111ns
> > > > >>
> > > > >>
> > > > >> Do you still want to drop it in favour of simplicity?
> > > > >
> > > > > This is the output of "chronyc sources". See section "Time sources"
> > > > > of https://chrony.tuxfamily.org/doc/2.4/chronyc.html.
> > > >
> > > > It's just that it's not obvious why you get better results with biased
> > > > host timestamps. What makes the biased host timestamp more precise?
> > > >
> > > > I'd rather use PTP_SYS_OFFSET_PRECISE instead, but unfortunately chrony
> > > > does not support it---but I would still prefer you to support
> > > > PTP_SYS_OFFSET_PRECISE as well.
> > >
> > > A single TSC read could be used to implement the PRECISE ioctl, but if
> > > a timer interrupt takes place on either the host or the guest, and that
> > > timer interrupt "adds" the TSC delta to xtime.nsec/xtime.sec, then that
> > > single TSC read cannot be used.
> > >
> > > So you would have to stop timer interrupts (in guest and host) for the duration of the
> > > PRECISE ioctl in the guest to avoid that situation, which seems a bit
> > > overkill to me.
> > >
> > > Any other ideas?
> >
> > Could have a hypercall that disables host timer interrupts for
> > a specified amount of time... But that does not scale with multiple VMs.
>
> No need to disable interrupts on guest nor host as both protect the time
> by a seqlock.

Still not scalable with multiple VMs... so we need a different solution.

> Do timer interrupts always advance the time base?

No, only when a full "cycle_interval" has been accumulated.

> (We could disable guest interrupts if they were too disruptive because
> of the long hypercall, but the host is fine in any case.)

VMs can harm host interrupt processing; I don't see that as being OK
(even when rate limited, it does not scale to a high number of VMs).

> The guest protects tsc->sec/nsec by tk_core.seq, so we can make sure
> that the guest computed sec/nsec from the host tsc read from hypercall
> is consistent.

Doh, of course.
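
To spell out the retry idea, here is a minimal userspace sketch. It is not
the kernel code: struct tk_sketch and both helpers are hypothetical names,
the conversion is a simplified mult/shift, and it is only meant to be read
as a single-threaded illustration. In the kernel the same pattern would use
read_seqcount_begin()/read_seqcount_retry() on tk_core.seq.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest timekeeper state (not the kernel struct). */
struct tk_sketch {
	atomic_uint seq;     /* even: stable, odd: update in progress */
	uint64_t base_tsc;   /* TSC value at the last time-base update */
	uint64_t base_ns;    /* guest time in ns at base_tsc */
	uint32_t mult;       /* cycles -> ns multiplier */
	uint32_t shift;      /* cycles -> ns shift */
};

/* Writer: make seq odd, update the base, make seq even again. */
static void tk_update(struct tk_sketch *tk, uint64_t tsc, uint64_t ns)
{
	atomic_fetch_add_explicit(&tk->seq, 1, memory_order_acq_rel);
	tk->base_tsc = tsc;
	tk->base_ns = ns;
	atomic_fetch_add_explicit(&tk->seq, 1, memory_order_release);
}

/*
 * Reader: convert a TSC value handed over by the host into guest ns.
 * If an update raced with the computation, seq changed (or was odd),
 * so the result is discarded and recomputed.
 * Assumes host_tsc >= base_tsc and that delta * mult fits in 64 bits.
 */
static uint64_t tk_ns_at_tsc(struct tk_sketch *tk, uint64_t host_tsc)
{
	unsigned int seq;
	uint64_t ns;

	do {
		seq = atomic_load_explicit(&tk->seq, memory_order_acquire);
		ns = tk->base_ns +
		     (((host_tsc - tk->base_tsc) * tk->mult) >> tk->shift);
		atomic_thread_fence(memory_order_acquire);
	} while ((seq & 1) ||
		 seq != atomic_load_explicit(&tk->seq, memory_order_relaxed));

	return ns;
}
```

The point is that the guest never has to block its own timer interrupt: a
tick that lands in the middle of the conversion only costs one more pass
through the loop.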

> The {sec @ tsc, nsec @ tsc, tsc} triple provided by the host is also
> correct. (PTP_SYS_OFFSET_PRECISE can't have any problems with the host
> that PTP_SYS_OFFSET wouldn't also have.)

Yep, that should work, let's try it.
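
For reference, the midpoint assumption Miroslav describes at the top of the
thread can be sketched as below. This is only an illustration of how a
PTP_SYS_OFFSET consumer might reduce the interleaved samples; the function
name and the "pick the shortest interval" policy are assumptions, not taken
from chrony or from this patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * ts[] holds 2 * n_samples + 1 timestamps in ns, interleaved as returned
 * by PTP_SYS_OFFSET: even indices are system (guest) readings, odd
 * indices are PHC (host) readings.
 *
 * For each sample the PHC reading is assumed to correspond to the
 * midpoint of the two surrounding system readings. The sample with the
 * shortest system interval is kept, since it bounds the error best.
 * Returns PHC minus system time; the interval goes to *delay_out.
 */
static int64_t sys_offset_midpoint(const int64_t *ts, size_t n_samples,
				   int64_t *delay_out)
{
	int64_t best_offset = 0;
	int64_t best_delay = INT64_MAX;

	for (size_t i = 0; i < n_samples; i++) {
		int64_t t0 = ts[2 * i];
		int64_t phc = ts[2 * i + 1];
		int64_t t2 = ts[2 * i + 2];
		int64_t delay = t2 - t0;
		int64_t mid = t0 + delay / 2;

		if (delay < best_delay) {
			best_delay = delay;
			best_offset = phc - mid;
		}
	}

	if (delay_out)
		*delay_out = best_delay;
	return best_offset;
}
```

With ts = {100, 198, 200}, i.e. a host reading taken just before the second
guest reading, the computed offset is 198 - 150 = 48 even though the clocks
are nearly aligned. That is the bias Radim's second diagram shows, and it is
what a {sec, nsec, tsc} triple from the host would avoid.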