Re: [PATCH v2] NMI: fix NMI period is not correct when cpu frequency changes issue.

From: Don Zickus
Date: Tue Apr 23 2013 - 14:15:28 EST


On Mon, Apr 22, 2013 at 10:37:36PM +0200, Peter Zijlstra wrote:
> On Mon, 2013-04-22 at 00:50 +0000, Pan, Zhenjie wrote:
> > This make watchdog reset happen before hard lockup detect.
>
> Doesn't your watchdog trigger an NMI you can use to print the panic?
>
> ISTR some people (hi Don!) spending quite a lot of time to make this
> work for some other platforms.
>
> IIRC those things would fire an NMI at some point and then hard-reset
> the machine not much later.. the difficulty was detecting this
> 'unclaimed' nmi and allowing drivers to register for it.
>
> NMI_UNKNOWN and unknown_nmi_panic are the result of that.

I think you are confusing the hard lockup detector watchdog (which uses
the perf counters) with a physical hardware watchdog (which just resets
the cpu if not kicked frequently; e.g.
drivers/watchdog/intel_scu_watchdog.c).

I believe Zhenjie's problem is that the hard lockup detector (i.e.
nmi_watchdog) becomes useless because sometimes it correctly fires
before the hardware watchdog expires, and other times it does not.

In order for the hard lockup detector to be useful, it should be reliable.
Today it isn't, because its period varies inversely with cpu frequency.

I don't have a real issue with his patch. I was just concerned about the
frequency of the changes (10-15 times a second seems like a lot).

Cheers,
Don