Re: [RFC PATCH v3] Fix: clocksource watchdog marks TSC unstable on guest VM

From: Thomas Gleixner
Date: Tue Sep 08 2015 - 11:08:47 EST


On Tue, 8 Sep 2015, Mathieu Desnoyers wrote:
> Introduce WATCHDOG_RETRY to bound the number of retries (in the
> unlikely event of a bogus clock source for wdnow). If the
> retry limit has been reached, disable the watchdog timer.

This does not make any sense at all. Why would the clocksource be
bogus? I'd rather say that the whole idea of trying to watchdog the TSC
in a VM is bogus.

There is no guarantee that the readout of the TSC and the watchdog is
not disturbed by VM scheduling. Aside from that, the HPET emulation goes
all the way back into qemu user land and the implementation itself
does not make me more confident. Be happy that we don't support 64-bit
HPET in the kernel as that emulation code is completely broken.
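
To illustrate the scheduling problem, here is a minimal sketch of a
watchdog-style skew check. It is not the kernel's code: the names
(sketch_sample, sketch_is_unstable) and the 62.5 ms threshold are
assumptions made up for this example. The clocksource under test and
the watchdog reference are each sampled twice, and the clocksource is
flagged when the two elapsed intervals disagree by more than the
threshold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold for illustration only: one second >> 4 = 62.5 ms. */
#define SKETCH_NSEC_PER_SEC    1000000000LL
#define SKETCH_WD_THRESHOLD_NS (SKETCH_NSEC_PER_SEC >> 4)

/* One paired readout, both values already converted to nanoseconds. */
struct sketch_sample {
	int64_t cs_ns;	/* clocksource under test, e.g. the TSC */
	int64_t wd_ns;	/* watchdog reference, e.g. the HPET */
};

/*
 * Compare the interval seen by the clocksource with the interval seen
 * by the watchdog. Returns true if they differ by more than the
 * threshold, i.e. the clocksource would be flagged as unstable.
 */
static bool sketch_is_unstable(struct sketch_sample prev,
			       struct sketch_sample now)
{
	int64_t cs_delta = now.cs_ns - prev.cs_ns;
	int64_t wd_delta = now.wd_ns - prev.wd_ns;
	int64_t skew = cs_delta - wd_delta;

	if (skew < 0)
		skew = -skew;

	return skew > SKETCH_WD_THRESHOLD_NS;
}
```

If the vCPU is descheduled for, say, 100 ms between reading the
watchdog and reading the TSC, cs_delta is 100 ms larger than wd_delta
and the check trips, although nothing is wrong with the TSC itself.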

I really have to ask the question WHY we actually do this. There is
absolutely no point at all.

The TSC watchdog is there to catch a few issues with the TSC:

1) Frequency changing behind the kernel's back

2) SMM driven power save state 'features' which cause the TSC to
stop

3) SMM fiddling with the TSC

4) TSC drifting apart on multi socket systems

#1 Is completely irrelevant for KVM as all machines which have
hardware virtualization have a frequency-constant TSC.

#2 Is irrelevant for KVM as well, because the machine does not go
into deep idle states while the guest is running.

#3/#4 Those are the only relevant issues, but there is absolutely no
need to do this detection in the guest.

We already have a TSC sanity check on the host. So instead of adding
horrible hackery and magic detection, shutoff and retry mechanisms, we
can simply let the guest know that the TSC has been buggered.

On paravirt kernels we can do that today and AFAICT the
pvclock/kvmclock code has enough magic to deal with all the oddities
already.

For non-paravirt kernels which can read the TSC directly, we'd need a
way to transport that information. A simple mechanism would be to
query an emulated MSR from the watchdog which tells the guest the
state of affairs on the host side. That would be a sensible and
minimally invasive change on both host and guest.
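
A rough sketch of what the guest side of that could look like. This is
purely illustrative: the MSR index, the status bit and all sketch_*
names are invented for this example and do not exist anywhere. The MSR
read is passed in as a function pointer, so the example runs in user
space with a stub standing in for the host's emulation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical MSR index and bit layout, placeholders only. */
#define SKETCH_MSR_HOST_TSC_STATUS 0x12345678U
#define SKETCH_HOST_TSC_UNSTABLE   (1ULL << 0)

/* Stand-in for an MSR read; the host would emulate the real access. */
typedef uint64_t (*sketch_rdmsr_fn)(uint32_t msr);

enum sketch_verdict { SKETCH_TSC_OK, SKETCH_TSC_UNSTABLE };

/*
 * Guest watchdog tick: instead of comparing the TSC against HPET, ask
 * the host what its own TSC sanity check has concluded.
 */
static enum sketch_verdict sketch_guest_watchdog(sketch_rdmsr_fn rdmsr)
{
	uint64_t status = rdmsr(SKETCH_MSR_HOST_TSC_STATUS);

	if (status & SKETCH_HOST_TSC_UNSTABLE)
		return SKETCH_TSC_UNSTABLE;

	return SKETCH_TSC_OK;
}

/* Stub hosts for trying it out: one healthy, one with a bad TSC. */
static uint64_t sketch_host_stable(uint32_t msr)
{
	(void)msr;
	return 0;
}

static uint64_t sketch_host_unstable(uint32_t msr)
{
	(void)msr;
	return SKETCH_HOST_TSC_UNSTABLE;
}
```

With that, the guest needs no second clocksource readout at all, so
there is nothing for VM scheduling to disturb.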

Thoughts?

Thanks,

tglx