Re: [RFC PATCH v3] Fix: clocksource watchdog marks TSC unstable on guest VM

From: Shaohua Li
Date: Wed Sep 09 2015 - 11:43:51 EST


On Wed, Sep 09, 2015 at 11:51:43AM +0200, Thomas Gleixner wrote:
> On Tue, 8 Sep 2015, Shaohua Li wrote:
> > On Tue, Sep 08, 2015 at 05:08:03PM +0200, Thomas Gleixner wrote:
> > > For non paravirt kernels which can read the TSC directly, we'd need a
> > > way to transport that information. A simple mechanism would be to
> > > query an emulated MSR from the watchdog which tells the guest the
> > > state of affairs on the host side. That would be a sensible and
> > > minimal invasive change on both host and guests.
> >
> > This will require every hypervisor to support the MSR, so it's not a
> > solution we can expect immediately.
>
> I know.
>
> > I'm wondering why we can't just make the watchdog better to detect this
> > watchdog wrap.
>
> Again, I'm not opposed to make it better. I'm just trying to prevent
> making the watchdog a total mess for no reason.
>
> > It can happen on a physical machine as I said before, but I
> > can't find a simple way to trigger it, so it's not very convincing. But
> > the watchdog doesn't work in specific environments (for example, bogus
> > hardware that doesn't respond for some time) for sure, we shouldn't assume
> > the world is perfect.
>
> Sigh. If the damned hardware blocks long enough to wreck the
> watchdog then we have more serious problems than that.

There is a difference. If hardware blocks, we can choose to reset the
hardware, or we can just ignore it if it's, for example, a serial console
or netconsole (these are what happened on our side). These impact the
system very little. But if HPET becomes the clocksource, performance is
so poor that the whole system is effectively useless. There is no method
to reset the clocksource to TSC. If there were a reset mechanism, that
would be fine too.

> Can you please stop this handwaving and provide some proper proof for
> your arguments? I'm really tired of this.

I'm sorry I can't provide a simple way to trigger it on real hardware,
but it's not hard to trigger this issue in KVM. Just make your host busy
and keep rebooting your virtual machine, and you will see it.
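
To make the wrap case concrete, here is a rough userspace sketch (not
kernel code and not the patch itself) of why a long-delayed watchdog run
can look like TSC skew. It assumes a 32-bit watchdog counter running at
14.318 MHz (a typical HPET rate) and a 62.5 ms skew threshold; the names
wd_delta_ns() and tsc_looks_unstable() are made up for illustration.

```c
#include <stdint.h>

#define NSEC_PER_SEC  1000000000ULL
#define WD_MASK       0xffffffffULL        /* assumed: 32-bit watchdog counter */
#define WD_HZ         14318180ULL          /* assumed: typical HPET frequency */
#define THRESHOLD_NS  (NSEC_PER_SEC >> 4)  /* assumed: 62.5 ms skew threshold */

/*
 * Elapsed time as the watchdog counter reports it. Only the masked
 * cycle delta is visible, so any full wraps of the counter are lost.
 */
static uint64_t wd_delta_ns(uint64_t real_ns)
{
	/* Split into seconds and remainder to avoid 64-bit overflow. */
	uint64_t cycles = (real_ns / NSEC_PER_SEC) * WD_HZ +
			  (real_ns % NSEC_PER_SEC) * WD_HZ / NSEC_PER_SEC;

	return (cycles & WD_MASK) * NSEC_PER_SEC / WD_HZ;
}

/*
 * Model a perfectly accurate TSC: it reports real_ns exactly. The
 * check only fires when the watchdog side has lost time to a wrap.
 */
static int tsc_looks_unstable(uint64_t real_ns)
{
	int64_t diff = (int64_t)real_ns - (int64_t)wd_delta_ns(real_ns);

	if (diff < 0)
		diff = -diff;
	return (uint64_t)diff > THRESHOLD_NS;
}
```

With these assumed numbers the counter wraps after roughly 300 seconds.
A normal 0.5 second interval passes the check, but if the vcpu is not
scheduled for 400 seconds, the watchdog side sees only about 100 seconds
and a perfectly good TSC gets flagged.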

Thanks,
Shaohua