Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups

From: Thomas Gleixner
Date: Wed Jun 21 2017 - 11:13:01 EST


On Wed, 21 Jun 2017, kan.liang@xxxxxxxxx wrote:
>
> #ifdef CONFIG_HARDLOCKUP_DETECTOR
> +/*
> + * The NMI watchdog relies on PERF_COUNT_HW_CPU_CYCLES event, which
> + * can tick faster than the measured CPU Frequency due to Turbo mode.
> + * That can lead to spurious timeouts.
> + * To workaround the issue, extending the period by 3 times.
> + */
> u64 hw_nmi_get_sample_period(int watchdog_thresh)
> {
> - return (u64)(cpu_khz) * 1000 * watchdog_thresh;
> + return (u64)(cpu_khz) * 1000 * watchdog_thresh * 3;

The maximum turbo frequency of any given machine can be retrieved.

So why don't you simply take that ratio into account and apply it only on
the machines which have those insane turbo loaders? That's not a huge
effort, can easily be backported, and does not inflict this on everyone
unconditionally.
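To make the trade-off concrete, here is a user-space sketch of the arithmetic. It converts a sample period (in cycles) back to wall-clock milliseconds at a given clock rate. The 2.0 GHz nominal and 3.6 GHz maximum turbo figures used with it are made-up illustration values, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Wall-clock milliseconds it takes for `cycles` CPU cycles to elapse
 * when the core runs at `khz` kHz (i.e. `khz` cycles per millisecond).
 */
static uint64_t cycles_to_msec(uint64_t cycles, uint64_t khz)
{
	assert(khz != 0);
	return cycles / khz;
}

/* Period as in the V2 patch: nominal frequency times a fixed factor of 3. */
static uint64_t period_fixed_x3(uint64_t cpu_khz, uint64_t thresh)
{
	return cpu_khz * 1000 * thresh * 3;
}

/* Period scaled by the maximum turbo frequency instead. */
static uint64_t period_max_turbo(uint64_t max_khz, uint64_t thresh)
{
	return max_khz * 1000 * thresh;
}
```

With watchdog_thresh = 10, the turbo-based period lasts exactly 10000 ms at 3.6 GHz and 18000 ms at 2.0 GHz. The fixed factor of 3 yields 30000 ms at nominal frequency on every machine, including those without turbo at all.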

So what you want is:

	return (u64)get_max_turbo_khz() * 1000 * watchdog_thresh;

where get_max_turbo_khz() returns cpu_khz by default, i.e. for non-turbo
motors.
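One possible way to obtain the maximum turbo frequency on Intel parts is to decode MSR_TURBO_RATIO_LIMIT (0x1AD). The sketch below is a user-space model that works on a raw register value. It assumes the single-core ratio sits in bits 7:0 and a 100 MHz bus clock (older generations used a different one). `max_turbo_khz_from_msr()` is a hypothetical name, not an existing kernel helper; actually reading the MSR and handling other vendors is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 100 MHz bus clock; not valid for every CPU generation. */
#define BCLK_KHZ 100000u

/*
 * Hypothetical stand-in for get_max_turbo_khz(): decode the single-core
 * turbo ratio from a raw MSR_TURBO_RATIO_LIMIT value (bits 7:0).
 * A zero ratio (no turbo information) or a ratio at or below nominal
 * falls back to cpu_khz, matching the proposed default for non-turbo parts.
 */
static uint64_t max_turbo_khz_from_msr(uint64_t msr_turbo_ratio_limit,
				       uint64_t cpu_khz)
{
	assert(cpu_khz != 0);	/* cpu_khz must already be calibrated */

	uint64_t ratio = msr_turbo_ratio_limit & 0xff;
	uint64_t khz = ratio * BCLK_KHZ;

	/* Never report less than the nominal frequency. */
	return khz > cpu_khz ? khz : cpu_khz;
}
```

For example, a raw value whose low byte is 0x24 (ratio 36) decodes to 3600000 kHz, while a zero value falls back to cpu_khz.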

And instead of silently doing this, it should emit an info message into dmesg:

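u64 hw_nmi_get_sample_period(int watchdog_thresh)
{
	u64 period, max_khz = get_max_turbo_khz();
	static int once;

	/* Cycles that elapse in watchdog_thresh seconds at maximum turbo */
	period = max_khz * 1000 * watchdog_thresh;

	if (max_khz != cpu_khz && !once) {
		/*
		 * Time the same cycle count takes at the nominal cpu_khz,
		 * in ms. div_u64() (<linux/math64.h>) keeps the 64-bit
		 * division safe on 32-bit kernels.
		 */
		unsigned int msec = div_u64(period, cpu_khz);

		once = 1;
		/* msec % 1000 is 0..999, so pad to three digits, not four */
		pr_info("Adjusted watchdog threshold to %u.%03u sec\n",
			msec / 1000, msec % 1000);
	}

	return period;
}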
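The arithmetic and formatting of that message can be checked with a user-space model. It uses snprintf() in place of pr_info() and plain 64-bit division in place of div_u64(); `format_threshold()` is a hypothetical name for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Model of the proposed dmesg line: convert the adjusted period (in
 * cycles) back to milliseconds at the nominal frequency and print it
 * as seconds with millisecond precision. Returns snprintf()'s result.
 */
static int format_threshold(char *buf, size_t len, uint64_t period,
			    uint64_t cpu_khz)
{
	assert(cpu_khz != 0);
	unsigned int msec = (unsigned int)(period / cpu_khz);

	/* msec % 1000 is 0..999, so three zero-padded digits are needed */
	return snprintf(buf, len, "Adjusted watchdog threshold to %u.%03u sec\n",
			msec / 1000, msec % 1000);
}
```

For example, a 2.4 GHz part with a 3.4 GHz maximum turbo and watchdog_thresh = 10 would log "14.166 sec". A value of 10050 ms prints as "10.050"; with a four-digit pad it would have printed as the misleading "10.0050".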

Hmm?

Thanks,

tglx