Re: [PATCH] clocksource, prevent overflow in clocksource_cyc2ns

From: John Stultz
Date: Wed Apr 04 2012 - 21:08:53 EST


On 04/04/2012 11:33 AM, Prarit Bhargava wrote:
>> One idea might be to replace the cyc2ns w/ mult_frac in only the watchdog code.
>> I need to think on that some more (and maybe have you provide some debug output)
>> to really understand how that's solving the issue for you, but it would be able
>> to be done w/o affecting the other assumptions of the timekeeping core.
>
> Hey John,
>
> After reading the initial part of your reply I was thinking about calling
> mult_frac() directly from the watchdog code as well.
>
> Here's some debug output I cobbled together to get an idea of how quickly the
> overflow was happening.
>
> [ 5.435323] clocksource_watchdog: {0} cs tsc csfirst 227349443638728 mask 0xFFFFFFFFFFFFFFFF mult 797281036 shift 31
> [ 5.444930] clocksource_watchdog: {0} wd hpet wdfirst 78332535 mask 0xFFFFFFFF mult 292935555 shift 22
>
> These, of course, are just the basic data from the clocksources tsc and hpet.

If I'm doing the math right, these are ~2.7 GHz CPUs?
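
For anyone checking that estimate, here is a small sketch of the arithmetic. It assumes the usual conversion ns = (cycles * mult) >> shift, so a mult/shift pair implies a counter frequency of 1e9 * 2^shift / mult Hz. The helper name cs_freq_hz is made up for this example and is not a kernel function.

```python
def cs_freq_hz(mult: int, shift: int) -> float:
    """Counter frequency implied by a clocksource mult/shift pair.

    Assumes ns = (cycles * mult) >> shift, so one cycle lasts
    mult / 2**shift nanoseconds and the frequency is the reciprocal.
    """
    ns_per_cycle = mult / (1 << shift)
    return 1e9 / ns_per_cycle


# mult/shift values copied from the debug output quoted above
print("tsc : %.3f GHz" % (cs_freq_hz(797281036, 31) / 1e9))
print("hpet: %.3f MHz" % (cs_freq_hz(292935555, 22) / 1e6))
```

With the values above this comes out at roughly 2.69 GHz for the tsc and about 14.318 MHz for the hpet.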

So what kernel version are you using?

In trying to reproduce this locally against Linus' HEAD on a much smaller system (single core + HT, 1.6 GHz), I got:
[ 6.611366] clocksource_watchdog: {0} cs tsc csfirst 36177888648 mask ffffffffffffffff mult 10485747 shift 24
[ 6.611596] clocksource_watchdog: {0} wd hpet wdfirst 169168400 mask ffffffff mult 2684354560 shift 26

Note the smaller shift values. Not too long ago the shift calculation was adjusted to allow for longer periods between interrupts, so I suspect you're on an older kernel.

Further, using your debug patch on my system, it was well beyond 10 minutes before the debug overflow occurred. And similarly I couldn't trip the watchdog trigger using sysrq-t (but again, only two threads here, so not nearly as much data to print as you have).
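
To put rough numbers on why the shift matters, here is a sketch that assumes the conversion is a plain 64-bit multiply, ns = (cycles * mult) >> shift, and emulates the truncation in Python. The function names are hypothetical. cyc2ns_mult_frac mirrors how the kernel's mult_frac() macro splits the multiplication, which is only one way the idea discussed in this thread could be applied.

```python
U64 = (1 << 64) - 1  # mask used to emulate 64-bit unsigned wraparound


def cyc2ns_u64(cycles: int, mult: int, shift: int) -> int:
    """(cycles * mult) >> shift with the product truncated to 64 bits."""
    return ((cycles * mult) & U64) >> shift


def cyc2ns_mult_frac(cycles: int, mult: int, shift: int) -> int:
    """Same conversion, split like mult_frac(x, numer, denom) with denom = 2**shift."""
    quot, rem = divmod(cycles, 1 << shift)
    return quot * mult + ((rem * mult) >> shift)


def overflow_horizon_sec(mult: int, shift: int) -> float:
    """Elapsed seconds after which cycles * mult no longer fits in 64 bits."""
    max_cycles = U64 // mult  # largest delta whose product still fits
    return cyc2ns_u64(max_cycles, mult, shift) / 1e9


# tsc mult/shift pairs from the two systems in this thread
for name, mult, shift in (("shift 31", 797281036, 31), ("shift 24", 10485747, 24)):
    print("%s: product overflows after about %.1f s" % (name, overflow_horizon_sec(mult, shift)))

# Roughly ten seconds of cycles at ~2.69 GHz: past the shift-31 horizon,
# so the truncated multiply and the split multiply disagree.
cycles = 10 * 2_693_500_000
print(cyc2ns_u64(cycles, 797281036, 31), cyc2ns_mult_frac(cycles, 797281036, 31))
```

Under these assumptions the product wraps after roughly 2^(64 - shift) ns: about 8.6 seconds for shift 31 versus about 18 minutes for shift 24, which would be consistent with the difference between the two systems.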

Could you verify that the issue you're seeing is still present w/ current mainline? Please don't take this as me dismissing your problem! As I mentioned earlier, there are some known issues w/ the clocksource watchdog code. But I want to narrow down whether your problem is currently present in mainline or only in older kernels, as that will help us find the proper fix.

thanks
-john
