Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

From: Andi Kleen
Date: Wed Jun 21 2017 - 13:07:46 EST


On Wed, Jun 21, 2017 at 05:12:06PM +0200, Thomas Gleixner wrote:
> On Wed, 21 Jun 2017, kan.liang@xxxxxxxxx wrote:
> >
> > #ifdef CONFIG_HARDLOCKUP_DETECTOR
> > +/*
> > + * The NMI watchdog relies on PERF_COUNT_HW_CPU_CYCLES event, which
> > + * can tick faster than the measured CPU Frequency due to Turbo mode.
> > + * That can lead to spurious timeouts.
> > + * To work around the issue, extend the period by a factor of 3.
> > + */
> > u64 hw_nmi_get_sample_period(int watchdog_thresh)
> > {
> > - return (u64)(cpu_khz) * 1000 * watchdog_thresh;
> > + return (u64)(cpu_khz) * 1000 * watchdog_thresh * 3;
>
> The maximum turbo frequency of any given machine can be retrieved.

Not reliably, e.g. not under virtualization. It would also require
model-specific checks, so a new CPU model running an old kernel
could still fail randomly.
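
For a sense of scale, here is a rough sketch of the arithmetic behind
the quoted change. The helper names, the factor parameter, and the
2.0 GHz / 3.5 GHz clocks used below are illustrative assumptions, not
kernel code or measured values:

```c
#include <stdint.h>

/*
 * Cycles the perf counter has to count before the watchdog NMI fires.
 * Same formula as the quoted hw_nmi_get_sample_period(), with the
 * multiplier pulled out as a parameter (1 = before the patch,
 * 3 = after). The factor parameter is hypothetical.
 */
static uint64_t sample_period(uint64_t cpu_khz, int watchdog_thresh, int factor)
{
	return cpu_khz * 1000 * (uint64_t)watchdog_thresh * (uint64_t)factor;
}

/*
 * Wall-clock milliseconds until the counter overflows when the core
 * really runs at actual_khz (for example an assumed turbo clock).
 * 1 kHz is one cycle per millisecond, so cycles / kHz gives ms.
 */
static uint64_t overflow_ms(uint64_t period_cycles, uint64_t actual_khz)
{
	return period_cycles / actual_khz;
}
```

With cpu_khz = 2000000 and watchdog_thresh = 10, the unpatched period
is 2e10 cycles. A core that actually runs at 3.5 GHz counts that in
about 5.7 seconds instead of 10; with the factor of 3 it takes about
17 seconds.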

-Andi