Re: [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support

From: Andy Lutomirski
Date: Wed Oct 03 2018 - 12:18:39 EST


On Wed, Oct 3, 2018 at 8:10 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Wed, 3 Oct 2018, Andy Lutomirski wrote:
> > > On Oct 3, 2018, at 5:01 AM, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
> > > Not all Hyper-V hosts support reenlightenment notifications (and, if I'm
> > > not mistaken, you need to enable nesting for the VM to get the feature -
> > > and most VMs don't have this) so I think we'll have to keep Hyper-V
> > > vclock for the time being.
> > >
> > But this does suggest that the correct way to pass a clock through to an
> > L2 guest where L0 is HV is to make L1 use the âtscâ clock and L2 use
> > kvmclock (or something newer and better). This would require adding
> > support for atomic frequency changes all the way through the timekeeping
> > and arch code.
> >
> > John, tglx, would that be okay or crazy?
>
> Not sure what you mean. I think I lost you somewhere on the way.
>

What I mean is: currently we have a clocksource called
""hyperv_clocksource_tsc_page". Reading it does:

static u64 read_hv_clock_tsc(struct clocksource *arg)
{
u64 current_tick = hv_read_tsc_page(tsc_pg);

if (current_tick == U64_MAX)
rdmsrl(HV_X64_MSR_TIME_REF_COUNT, current_tick);

return current_tick;
}

>From Vitaly's email, it sounds like, on most (all?) hyperv systems
with nesting enabled, this clock is better behaved than it appears.
It sounds like the read behavior is that current_tick will never be
U64_MAX -- instead, the clock always works and, more importantly, the
actual scaling factor and offset only change observably on *guest*
request.

So why don't we we improve the actual "tsc" clocksource to understand
this? ISTM the best model would be where the
__clocksource_update_freq_xyz() mechanism gets called so we can use it
like this:

clocksource_begin_update();
clocksource_update_mult_shift();
tell_hv_that_we_reenlightened();
clocksource_end_update();

Where clocksource_begin_update() bumps the seqcount for the vDSO and
takes all the locks, clocksource_update_mult_shift() updates
everything, and clocksource_end_update() makes the updated parameters
usable.

(AFAICT there are currently no clocksources at all in the entire
kernel that update their frequency on the fly using
__clocksource_update_xyz(). Unless I'm missing something, the x86 tsc
cpufreq hooks don't call into the core timekeeping at all, so I'm
assuming that the tsc clocksource is just unusable as a clocksource on
systems that change its frequency.)

Or we could keep the hyperv_clocksource_tsc_page clocksource but make
it use VCLOCK_TSC and a similar update mechanism.

I don't personally want to do this, because the timekeeping code is
subtle and I'm unfamiliar with it. And I don't have *that* many spare
cycles :)